Solutions Architect – Generative AI Inference and Deployment

Company: NVIDIA
Location: Washington, USA; Texas, USA; Santa Clara, CA, USA; Tennessee, USA; Colorado, USA
Salary: $148,000 – $235,750
Type: Full-Time
Degrees: Bachelor’s, Master’s, PhD
Experience Level: Senior

Requirements

  • BS, MS, or PhD in Computer Science, Electrical/Computer Engineering, Physics, Mathematics, other Engineering or related fields (or equivalent experience)
  • 5+ years of hands-on experience with Deep Learning frameworks such as PyTorch and TensorFlow
  • Strong fundamentals in programming, optimizations, and software design, especially in Python
  • Strong problem-solving and debugging skills in GPU orchestration and Multi-Instance GPU (MIG) management within Kubernetes environments
  • Experience with containerization and orchestration technologies, monitoring, and observability solutions for AI deployments
  • Strong knowledge of the theory and practice of LLM and DL inference
  • Excellent presentation, communication and collaboration skills

Responsibilities

  • Partnering with other solution architects and with engineering, product, and business teams to understand their strategies and technical needs and help define high-value solutions
  • Dynamically engaging with developers, scientific researchers, and data scientists, gaining experience across a range of technical areas
  • Strategically partnering with lighthouse customers and industry-specific solution partners targeting our computing platform
  • Working closely with customers to help them adopt and build creative solutions using NVIDIA technology and MLOps solutions
  • Analyzing performance and power efficiency of AI inference workloads on Kubernetes
  • Some travel to conferences and customers may be required

Preferred Qualifications

  • Prior experience with DL training at scale, or with deploying or optimizing DL inference in production
  • Experience with NVIDIA GPUs and software libraries such as NVIDIA NIM, Dynamo, TensorRT, TensorRT-LLM
  • Excellent C/C++ programming skills, including debugging, profiling, code optimization, performance analysis, and test design
  • Familiarity with parallel programming and distributed computing platforms