Software Engineering Manager – AI Systems Co-Design
Company | Meta |
---|---|
Location | Menlo Park, CA, USA; Bellevue, WA, USA |
Salary | $177,000 – $251,000 |
Type | Full-Time |
Degrees | |
Experience Level | Senior, Expert or higher |
Requirements
- Experience leading teams working on high-performance computing (HPC) and AI/ML systems, including:
  - Communication libraries (e.g., NCCL, RCCL, UCC, MPI)
  - GPU/ASIC-based kernel development and optimization (e.g., CUDA, ROCm)
  - Distributed systems for large-scale training and serving
- Systems Architecture + Performance
- Large-scale distributed systems
- Experience running a large-scale program and dealing with ambiguity
Responsibilities
- Lead and support the communications team that works on collective libraries, and help enable performance at scale for inference and training of our GenAI (Llama) and Ranking & Retrieval (DLRM) models
- Enable the growth of individual contributors, drive the technical roadmap together with technical leads, and expand the impact of the team by growing new skill sets and capabilities
- Lead a high-performance team of engineers to deliver new capabilities and efficient compute systems for our fleet
- Technical management
- Work cross-functionally across hardware and software/services teams to drive engineering efforts
Preferred Qualifications
- Experience with collective communication libraries (e.g., NCCL, RCCL, Gloo, UCC, or MPI)
- Network architecture