Staff Software Engineer – AI Platform – Michelangelo
Company | Uber |
---|---|
Location | Seattle, WA, USA; San Francisco, CA, USA; Sunnyvale, CA, USA |
Salary | $223,000 – $248,000 |
Type | Full-Time |
Degrees | Master’s, PhD |
Experience Level | Senior, Expert or higher |
Requirements
- Master’s degree in a relevant field (CS, EE, Math, Stats, etc.) AND 6 years of full-time software engineering work experience in deep learning
- Proficiency in Python and PyTorch
- Expertise in designing, debugging, and optimizing distributed deep learning systems
- Hands-on experience with distributed training in PyTorch at scale (e.g., data parallelism, model parallelism)
- Strong ability to translate complex DL requirements and problems into scalable solutions
Responsibilities
- Design and build tools to empower production teams to innovate and productionize state-of-the-art deep learning models at Uber
- Develop and maintain scalable, end-to-end deep learning training systems and frameworks
- Ensure distributed training tools are reliable, efficient, and flexible enough to support new production use cases
- Collaborate with cross-functional teams including machine learning engineers, backend engineers, data scientists, and data engineers to deliver robust ML solutions for Uber
Preferred Qualifications
- Expertise in distributed training frameworks such as DDP, DeepSpeed, FSDP, or TorchRec
- Familiarity with C++, Go, or CUDA programming
- Expertise in optimizing GPU/TPU training performance and data loading efficiency
- Familiarity with large-scale distributed infrastructure tools such as Ray, OpenAI Triton, or PyTorch Lightning
- Experience building and deploying end-to-end machine learning systems in production
- Experience training large models (10B+ parameters), such as large recommendation systems or large language models (LLMs)
- PhD in a relevant field (CS, EE, Math, Stats, etc.)