Vision Researcher – Multimodal Understanding & Generation in Foundation Models

Company: Tencent
Location: Bellevue, WA, USA
Salary: $149,000 – $279,800
Type: Full-Time
Degrees: Master’s, PhD
Experience Level: Senior, Expert or higher

Requirements

  • Master’s or Ph.D. degree in Computer Science, Artificial Intelligence, Computer Vision, Machine Learning, or a related field
  • Proven research experience in multimodal understanding, generation, or related areas
  • Familiarity with state-of-the-art technologies and a strong publication record in top-tier conferences or journals such as CVPR, ICCV, ECCV, NeurIPS, ICLR, or ICML
  • Proficiency with mainstream open-source tools and frameworks relevant to the field
  • Strong engineering skills to support research implementation
  • Strong team spirit and ability to collaborate across disciplines
  • Excellent communication skills
  • Intellectual curiosity and a goal-oriented, problem-solving mindset

Responsibilities

  • Serve as a domain expert in computer vision and collaborate with researchers working on other modalities
  • Explore the training and design of large models for understanding and generating representations of the physical world
  • Stay up to date with the latest advancements in academia and industry; actively participate in international conferences and workshops
  • Contribute impactful research outcomes to the open-source community or transfer technologies to internal product teams

Preferred Qualifications

  • Influential GitHub projects or contributions to high-impact open-source communities