Vision Researcher – Multimodal Understanding & Generation in Foundation Models
| Company | Tencent |
|---|---|
| Location | Bellevue, WA, USA |
| Salary | $149,000 – $279,800 |
| Type | Full-Time |
| Degrees | Master’s, PhD |
| Experience Level | Senior, Expert, or higher |
Requirements
- Master’s or Ph.D. degree in Computer Science, Artificial Intelligence, Computer Vision, Machine Learning, or a related field
- Proven research experience in multimodal understanding or generation
- Familiarity with state-of-the-art technologies and a strong publication record in top-tier conferences or journals such as CVPR, ICCV, ECCV, NeurIPS, ICLR, or ICML
- Proficiency with mainstream open-source tools and frameworks relevant to the field
- Strong engineering skills to support research implementation
- Strong team spirit and the ability to collaborate across disciplines
- Excellent communication skills
- Intellectual curiosity and a goal-oriented, problem-solving mindset
Responsibilities
- Serve as a domain expert in computer vision and collaborate with researchers from other modalities
- Explore the design and training of large models that understand and generate representations of the physical world
- Stay up to date with the latest advancements in academia and industry; actively participate in international conferences and workshops
- Contribute impactful research outcomes to the open-source community or transfer technologies to internal product teams
Preferred Qualifications
- Influential GitHub projects or contributions to high-impact open-source communities