Vision Researcher - Multimodal Understanding & Generation in Foundation Models

Master’s or Ph.D. degree in Computer Science, Artificial Intelligence, Computer Vision, Machine Learning, or a related field
Proven research experience in multimodal understanding, generation, or closely related areas
Familiarity with state-of-the-art methods and a strong publication record at top-tier venues such as CVPR, ICCV, ECCV, NeurIPS, ICLR, or ICML
Proficiency with mainstream open-source tools and frameworks relevant to the field
Strong engineering skills to support research implementation
Strong team spirit and ability to collaborate across disciplines
Excellent communication skills
Intellectual curiosity and a goal-oriented, problem-solving mindset

Serve as a domain expert in computer vision and collaborate with researchers from other modalities
Explore the training and design of large models for understanding and generating representations of the physical world
Stay up to date with the latest advancements in academia and industry; actively participate in international conferences and workshops
Contribute impactful research outcomes to the open-source community or transfer technologies to internal product teams

Candidates with influential GitHub projects or contributions to high-impact open-source communities are preferred