Software Engineer ML Infra Systems – Senior

Company: d-Matrix
Location: Santa Clara, CA, USA
Salary: $180,000 – $280,000
Type: Full-Time
Degrees: Bachelor’s, Master’s, PhD
Experience Level: Senior

Requirements

  • BS in Computer Science, Engineering, Math, Physics, or a related field with 4+ years of industry software development experience; MS in one of these fields preferred, with 2+ years of industry experience
  • Strong grasp of system software, data structures, computer architecture, and machine learning fundamentals
  • Proficient in C/C++/Python development in a Linux environment using standard development tools
  • Experience with distributed, high-performance software design and implementation
  • Self-motivated team player with a strong sense of ownership and leadership

Responsibilities

  • Be part of the team that helps productize the software stack for the AI compute engine
  • Develop, enhance, and maintain the next-generation AI deployment software
  • Build and scale software deliverables in a tight development window
  • Work with a team of system software experts to build out the deployment infrastructure
  • Work closely with other software (ML, compilers) and hardware experts in the company

Preferred Qualifications

  • MS or PhD in Computer Science, Electrical Engineering, or related fields
  • Experience with inference servers and model serving frameworks (such as TensorRT-LLM, vLLM, and SGLang)
  • Experience with deep learning frameworks (such as PyTorch and TensorFlow)
  • Experience with deep learning runtimes (such as ONNX Runtime and TensorRT)
  • Experience with collective communication libraries for distributed systems (such as NCCL and OpenMPI)
  • Experience with software testing fundamentals
  • Experience deploying ML workloads (LLMs, VLMs, NLP, etc.) on distributed systems
  • Experience with Kubernetes, Ray, or other MLOps tools and techniques used from definition to deployment
  • Prior startup, small team, or incubation experience
  • Work experience at a cloud provider or AI compute/subsystem company