Software Engineer – Accelerator Solutions & Technologies

Company: Meta
Location: Menlo Park, CA, USA; New York, NY, USA
Salary: $56.25 – $173,000
Type: Full-Time
Degrees: Bachelor’s, Master’s, PhD
Experience Level: Junior, Mid Level

Requirements

  • Currently has, or is in the process of obtaining, a Bachelor’s degree in Computer Science, Computer Engineering, a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta.
  • Master’s or PhD in Computer Science, Computer Engineering, or any other relevant technical field
  • 2+ years of experience developing C++ codebases
  • 2+ years of experience developing Python codebases
  • Understanding of performance measurement, benchmarking, and optimization of collective communications and at-scale distributed model training

Responsibilities

  • Contribute to our developer infrastructure, including simulation and HW emulation platforms, to enable performance measurement and optimization for Meta’s in-house accelerator programs.
  • Understand and contribute to the collective communications library, intended to be deployed on Meta’s AI/ML superclusters.
  • Support networking and compute hardware acceleration techniques to improve ML model inference and training performance.
  • Perform architectural analysis to ensure system designs meet performance, scalability, and reliability requirements.
  • Implement simulation models for Meta’s accelerator ASICs, and develop and analyze scenarios to evaluate data center performance and identify potential improvements.
  • Collaborate with architects and engineers to integrate simulation results into system design processes.
  • Use instruction set simulators to define performant firmware for Meta’s training/inference accelerators.
  • Collaborate with hardware and firmware teams to ensure accurate modeling and simulation of accelerator functionalities.
  • Analyze simulation results to guide firmware development and optimization efforts.

Preferred Qualifications

  • Full-stack experience and understanding of AI/HPC systems, with a focus on the application layer and performance optimizations
  • Familiarity with relevant tools, libraries, and frameworks (e.g., PyTorch, CUDA)
  • Knowledge of AI/HPC hardware requirements and specifications (e.g., configuring hardware components for AI/HPC workloads)
  • Understanding of the transport stack (e.g., RoCE) and its constraints, particularly as they pertain to interconnects and collective communications
  • Experience with SystemC