AI and ML Performance Engineer
Company | NVIDIA
---|---
Location | Redmond, WA, USA
Salary | $148,000 – $287,500
Type | Full-Time
Degrees | Master’s
Experience Level | Mid Level
Requirements
- Master’s degree (or equivalent experience) in Computer Science, Electrical Engineering, or a related field.
- Strong background in computer architecture, roofline modeling, queuing theory, and statistical performance analysis techniques (a minimal roofline sketch follows this list).
- Solid understanding of LLM internals (attention mechanisms, FFN structures), model parallelism and inference serving techniques.
- 3+ years of hands-on experience in system-level evaluation of AI/ML workloads, or in performance analysis, modeling, and optimization for AI.
- Proficiency in Python (and optionally C++) for simulator design and data analysis.
- Growth mindset and pragmatic ‘measure, iterate, deliver’ approach.
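
For context on the roofline modeling mentioned above, here is a minimal Python sketch of the idea; the hardware numbers, GEMM shape, and byte counts are illustrative assumptions, not figures from this posting.

```python
from dataclasses import dataclass

@dataclass
class Hardware:
    peak_flops: float      # peak compute throughput, FLOP/s (assumed value below)
    peak_bandwidth: float  # peak memory bandwidth, bytes/s (assumed value below)

def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """FLOPs per byte of memory traffic for a kernel."""
    return flops / bytes_moved

def roofline_time(flops: float, bytes_moved: float, hw: Hardware) -> float:
    """Lower-bound execution time: whichever of the compute and memory roofs is slower."""
    return max(flops / hw.peak_flops, bytes_moved / hw.peak_bandwidth)

# Illustrative numbers only: a hypothetical accelerator and a 4096^3 GEMM in fp16.
hw = Hardware(peak_flops=1.0e15, peak_bandwidth=3.0e12)
m = n = k = 4096
flops = 2 * m * n * k                      # one multiply and one add per MAC
bytes_moved = 2 * (m * k + k * n + m * n)  # fp16 operands and output, ideal reuse
print(f"arithmetic intensity: {arithmetic_intensity(flops, bytes_moved):.0f} FLOP/B")
print(f"roofline-bound time:  {roofline_time(flops, bytes_moved, hw) * 1e6:.0f} us")
```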
Responsibilities
- Develop high-fidelity performance models to prototype emerging algorithmic techniques in Generative AI, driving model-hardware co-design.
- Design targeted optimizations for inference deployment to advance the Pareto frontier of accuracy, throughput, and interactivity (see the sketch after this list).
- Quantify the performance benefit of targeted optimizations to prioritize features and guide the future software and hardware roadmap.
- Model the end-to-end performance impact of emerging GenAI workflows, such as agentic pipelines and inference-time compute scaling, to guide datacenter design and optimization.
- Keep up with the latest DL research and collaborate with diverse teams, including DL researchers, hardware architects, and software engineers.
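
To illustrate the throughput/interactivity trade-off behind the Pareto-frontier responsibility above, the sketch below filters hypothetical serving configurations down to the non-dominated set; the `ServingPoint` fields and sample numbers are assumptions for illustration only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServingPoint:
    name: str
    tokens_per_sec: float  # system throughput (higher is better)
    tpot_ms: float         # time per output token, a proxy for interactivity (lower is better)

def dominates(a: ServingPoint, b: ServingPoint) -> bool:
    """a dominates b if it is at least as good on both axes and strictly better on one."""
    return (a.tokens_per_sec >= b.tokens_per_sec and a.tpot_ms <= b.tpot_ms
            and (a.tokens_per_sec > b.tokens_per_sec or a.tpot_ms < b.tpot_ms))

def pareto_frontier(points: list[ServingPoint]) -> list[ServingPoint]:
    """Keep only configurations not dominated by any other configuration."""
    frontier = [p for p in points if not any(dominates(q, p) for q in points)]
    return sorted(frontier, key=lambda p: p.tpot_ms)

# Hypothetical measurements: larger batches trade interactivity for throughput.
candidates = [
    ServingPoint("bs=1",   1_200,  8.0),
    ServingPoint("bs=8",   6_500, 14.0),
    ServingPoint("bs=8b",  6_000, 16.0),  # dominated by bs=8 on both axes
    ServingPoint("bs=32", 15_000, 35.0),
]
for p in pareto_frontier(candidates):
    print(f"{p.name}: {p.tokens_per_sec:.0f} tok/s at {p.tpot_ms:.0f} ms/token")
```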
Preferred Qualifications
- Comfortable defining metrics, designing experiments, and visualizing large performance datasets to identify resource bottlenecks (a minimal breakdown example follows this list).
- Proven track record of working in cross-functional teams, spanning algorithms, software and hardware architecture.
- Ability to distill complex analyses into clear recommendations for both technical and non-technical stakeholders.
- Experience with GPU computing (CUDA).
- Experience with deep learning frameworks and inference engines such as PyTorch, TensorRT-LLM, vLLM, and SGLang.
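
As a minimal illustration of the metrics-and-bottlenecks qualification above, the sketch below aggregates per-kernel timings into a ranked breakdown; the record format, kernel names, and values are assumed for the example.

```python
from collections import defaultdict

def kernel_time_breakdown(records: list[tuple[str, float]]) -> list[tuple[str, float, float]]:
    """Aggregate (kernel_name, duration_ms) records into a ranked time breakdown.

    Returns (name, total_ms, cumulative_fraction) rows, largest contributor first.
    """
    totals: dict[str, float] = defaultdict(float)
    for name, duration_ms in records:
        totals[name] += duration_ms
    grand_total = sum(totals.values())
    rows, running = [], 0.0
    for name, total in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
        running += total
        rows.append((name, total, running / grand_total))
    return rows

# Hypothetical per-kernel timings from one decode step of an LLM.
trace = [("ffn_gemm", 0.61), ("attention", 0.42), ("ffn_gemm", 0.58),
         ("attention", 0.40), ("allreduce", 0.23), ("layernorm", 0.05)]
for name, total_ms, cum in kernel_time_breakdown(trace):
    print(f"{name:10s} {total_ms:5.2f} ms   cumulative {cum:6.1%}")
```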