
# LLM/ML Engineer – Inference
| Company | Reducto |
| --- | --- |
| Location | San Francisco, CA, USA |
| Salary | $200,000 – $300,000 |
| Type | Full-Time |
| Degrees | |
| Experience Level | Senior |
## Requirements
- Deep expertise in Python and PyTorch
- Strong foundation in low-level operating systems concepts including multi-threading, memory management, networking, storage, performance, and scale
- Experience with modern inference systems like TGI, vLLM, TensorRT-LLM, and Optimum
- Comfortable creating custom tooling for testing and optimization
## Responsibilities
- Architecting and implementing robust, scalable inference systems for serving state-of-the-art AI models
- Optimizing model serving infrastructure for high throughput and low latency at scale
- Developing and integrating advanced inference optimization techniques
- Working closely with our research team to bring cutting-edge capabilities into production
- Building developer tools and infrastructure to support rapid experimentation and deployment
## Preferred Qualifications
- Experience with low-level systems programming (CUDA, Triton) and compiler optimization
- Passion for open-source contribution and for staying current with ML infrastructure developments
- Practical experience with high-performance computing and distributed systems
- Experience in early-stage environments where you helped shape technical direction
- Energized by solving complex technical challenges in a collaborative environment