Software Engineering Manager – AI Systems Co-Design

Company: Meta
Location: Menlo Park, CA, USA; Bellevue, WA, USA
Salary: $177,000 – $251,000
Type: Full-Time
Degrees:
Experience Level: Senior, Expert or higher

Requirements

  • Experience in leading teams working on high-performance computing (HPC) and AI/ML systems, including:
      • Communication libraries (e.g., NCCL, RCCL, UCC, MPI)
      • GPU/ASIC-based kernel development and optimization (e.g., CUDA, ROCm)
      • Distributed systems for large-scale training and serving
  • Systems Architecture + Performance
  • Large-scale distributed systems
  • Experience running a large-scale program and dealing with ambiguity

Responsibilities

  • Lead and support the communications team that works on collective libraries, and help enable performance at scale for inference and training of our GenAI (Llama) and Ranking & Retrieval (DLRM) models
  • Enable the growth of individual contributors, drive the technical roadmap together with technical leads, and expand the team's impact by developing new skill sets and capabilities
  • Lead a high-performance team of engineers to deliver new capabilities and efficient compute systems for our fleet
  • Technical management
  • Work cross-functionally across hardware and software/services teams to drive engineering efforts

Preferred Qualifications

  • Experience with collective communication, e.g., with one of these libraries: NCCL, RCCL, Gloo, UCC, MPI
  • Network architecture