LLM/ML Engineer – Inference

Company: Reducto
Location: San Francisco, CA, USA
Salary: $200,000 – $300,000
Type: Full-Time
Degrees:
Experience Level: Senior

Requirements

  • Deep expertise in Python and PyTorch
  • Strong foundation in low-level operating systems concepts including multi-threading, memory management, networking, storage, performance, and scale
  • Experience with modern inference systems like TGI, vLLM, TensorRT-LLM, and Optimum
  • Comfortable creating custom tooling for testing and optimization

Responsibilities

  • Architecting and implementing robust, scalable inference systems for serving state-of-the-art AI models
  • Optimizing model serving infrastructure for high throughput and low latency at scale
  • Developing and integrating advanced inference optimization techniques
  • Working closely with our research team to bring cutting-edge capabilities into production
  • Building developer tools and infrastructure to support rapid experimentation and deployment

Preferred Qualifications

  • Experience with low-level systems programming (CUDA, Triton) and compiler optimization
  • Passion for open-source contributions and for staying current with ML infrastructure developments
  • Practical experience with high-performance computing and distributed systems
  • Experience in early-stage environments where you helped shape technical direction
  • Energized by solving complex technical challenges in a collaborative environment