AI and ML Performance Engineer

Company: NVIDIA
Location: Redmond, WA, USA
Salary: $148,000 – $287,500
Type: Full-Time
Degrees: Master’s
Experience Level: Mid Level

Requirements

  • A Master’s degree (or equivalent experience) in Computer Science, Electrical Engineering, or a related field.
  • Strong background in computer architecture, roofline modeling, queuing theory, and statistical performance analysis techniques.
  • Solid understanding of LLM internals (attention mechanisms, FFN structures), model parallelism and inference serving techniques.
  • 3+ years of hands-on experience in system evaluation of AI/ML workloads or performance analysis, modeling and optimizations for AI.
  • Proficiency in Python (and optionally C++) for simulator design and data analysis.
  • Growth mindset and pragmatic ‘measure, iterate, deliver’ approach.
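The roofline modeling mentioned above can be sketched in a few lines of Python. This is an illustrative example only, not NVIDIA's methodology; the peak-compute and bandwidth figures are hypothetical, not those of any specific GPU:

```python
# Roofline model: attainable throughput is capped by either peak compute
# or memory bandwidth times arithmetic intensity (FLOPs per byte moved).
# Hardware numbers below are hypothetical round figures for illustration.

PEAK_FLOPS = 100e12  # 100 TFLOP/s peak compute (assumed)
PEAK_BW = 2e12       # 2 TB/s memory bandwidth (assumed)

def attainable_flops(arithmetic_intensity: float) -> float:
    """Attainable FLOP/s = min(peak compute, bandwidth * intensity)."""
    return min(PEAK_FLOPS, PEAK_BW * arithmetic_intensity)

# Ridge point: intensity above which a kernel becomes compute-bound.
ridge = PEAK_FLOPS / PEAK_BW  # 50 FLOPs/byte for these assumed numbers

# Low-intensity kernels (e.g. decode-phase attention) sit on the
# bandwidth-limited slope; high-intensity GEMMs hit the compute ceiling.
low = attainable_flops(1.0)     # bandwidth-limited: 2e12 FLOP/s
high = attainable_flops(200.0)  # compute-limited: 100e12 FLOP/s
```

Plotting `attainable_flops` against intensity on log-log axes gives the familiar roofline chart used to classify kernels as memory- or compute-bound.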

Responsibilities

  • Develop high-fidelity performance models to prototype emerging algorithmic techniques in Generative AI to drive model-hardware co-design.
  • Design targeted optimizations for inference deployment to maximize the Pareto frontier of accuracy, throughput, and interactivity.
  • Quantify performance benefit of targeted optimizations to prioritize features and guide future software and hardware roadmap.
  • Model end-to-end performance impact of emerging GenAI workflows – such as agentic pipelines and inference-time compute scaling – to guide datacenter design and optimization.
  • Keep up with the latest DL research and collaborate with diverse teams, including DL researchers, hardware architects, and software engineers.
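The throughput–interactivity tradeoff referenced in the responsibilities can be illustrated with a toy model of batched LLM decoding. This is a hypothetical sketch, not a description of any real serving system: assume each decode step costs a fixed overhead plus a per-sequence increment (both numbers invented for illustration):

```python
# Toy batched-decoding model: larger batches raise aggregate throughput
# (tokens/s across all users) but lower per-user interactivity
# (tokens/s seen by each user). All constants are hypothetical.

def step_time_ms(batch_size: int,
                 overhead_ms: float = 5.0,
                 per_seq_ms: float = 0.5) -> float:
    """Assumed time for one decode step over a batch of sequences."""
    return overhead_ms + per_seq_ms * batch_size

def throughput_tok_per_s(batch_size: int) -> float:
    """Aggregate tokens/s: one token per sequence per step."""
    return batch_size * 1000.0 / step_time_ms(batch_size)

def interactivity_tok_per_s(batch_size: int) -> float:
    """Per-user tokens/s: one token per step for each sequence."""
    return 1000.0 / step_time_ms(batch_size)
```

Sweeping `batch_size` traces out the Pareto curve: throughput rises while per-user token rate falls, so choosing a serving batch size is a point on that frontier.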

Preferred Qualifications

  • Comfortable defining metrics, designing experiments and visualizing large performance datasets to identify resource bottlenecks.
  • Proven track record of working in cross-functional teams, spanning algorithms, software and hardware architecture.
  • Ability to distill complex analyses into clear recommendations for both technical and non-technical stakeholders.
  • Experience with GPU computing (CUDA).
  • Experience with deep learning frameworks and inference engines such as PyTorch, TRT-LLM, vLLM, and SGLang.