Senior Full-Stack Software Engineer

Company: NVIDIA
Location: Seattle, WA, USA; Santa Clara, CA, USA
Salary: $184,000 – $356,500
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Senior, Expert or higher

Requirements

  • 8+ years of experience developing software infrastructure for large-scale AI systems
  • Bachelor’s degree or higher in Computer Science or a related technical field (or equivalent experience)
  • Proficiency in full-stack development: JavaScript (Vue or React), Node.js, Python and/or Golang, and scripting languages
  • Experience with distributed systems and cloud-native technologies (Docker, Kubernetes, microservices)
  • Familiarity with observability stacks: ELK, OpenSearch, Prometheus, Grafana, or Loki
  • Strong debugging and root cause analysis skills across application and infrastructure layers
  • Experience with large-scale AI training, inference, or data infrastructure services
  • Excellent communication, collaboration, and problem-solving skills, and a growth mindset

Responsibilities

  • Design, develop, and deploy full-stack web applications to support large-scale AI infrastructure operations and workflows
  • Collaborate with AI and ML research teams to identify pain points and deliver tools that accelerate their work
  • Develop APIs, backend services, and UIs to improve visibility, observability, and control over large-scale GPU clusters
  • Develop backend services to manage job schedulers and cluster operations
  • Define and track metrics that measure efficiency, resiliency, and developer productivity across the platform
  • Drive engineering excellence in testing, CI/CD, code quality, and performance
  • Lead architectural discussions and mentor junior engineers on design and implementation
  • Stay ahead of AI/ML infrastructure trends and drive adoption of best practices within the team

Preferred Qualifications

  • Experience building developer platforms or self-service internal infrastructure tools for efficiency, resiliency, or observability.
  • Hands-on experience as a Machine Learning Engineer (MLE) or deep familiarity with DL frameworks (e.g., PyTorch, TensorFlow, JAX, Ray).
  • Hands-on experience operating at datacenter scale, including GPU cluster debugging and root cause analysis.
  • Experience with MongoDB, Hadoop, or Spark.