Staff Software Engineer – AI Platform – Michelangelo

Company: Uber
Location: Seattle, WA, USA; San Francisco, CA, USA; Sunnyvale, CA, USA
Salary: $223,000 – $248,000
Type: Full-Time
Degrees: Master’s, PhD
Experience Level: Senior, Expert or higher

Requirements

  • Master’s degree in a relevant field (CS, EE, Math, Stats, etc.) and 6 years of full-time software engineering work experience in deep learning
  • Proficiency in Python and PyTorch
  • Expertise in designing, debugging, and optimizing distributed deep learning systems
  • Working experience with distributed training in PyTorch at scale (e.g., data parallelism, model parallelism)
  • Strong ability to translate complex DL requirements and problems into scalable solutions

Responsibilities

  • Design and build tools to empower production teams to innovate and productionize state-of-the-art deep learning models at Uber
  • Develop and maintain scalable, end-to-end deep learning training systems and frameworks
  • Ensure distributed training tools are reliable, efficient, and flexible enough to support new production use cases
  • Collaborate with cross-functional teams including machine learning engineers, backend engineers, data scientists, and data engineers to deliver robust ML solutions for Uber

Preferred Qualifications

  • Expertise in distributed training frameworks such as DDP, DeepSpeed, FSDP, or TorchRec
  • Familiarity with C++, Go, or CUDA programming
  • Expertise in optimizing GPU/TPU training performance and data loading efficiency
  • Familiarity with large-scale distributed infrastructure tools like Ray, OpenAI Triton, or PyTorch Lightning
  • Experience building and deploying end-to-end machine learning systems in production
  • Experience training large models (10B+ parameters), such as large recommendation systems or large language models (LLMs)
  • PhD in a relevant field (CS, EE, Math, Stats, etc.)