
# LLM/ML Engineer – Inference
| Company | Reducto |
| --- | --- |
| Location | San Francisco, CA, USA |
| Salary | $200,000 – $300,000 |
| Type | Full-Time |
| Degrees | |
| Experience Level | Senior |
## Requirements
- Deep expertise in Python and PyTorch
- Strong foundation in low-level operating systems concepts including multi-threading, memory management, networking, storage, performance, and scale
- Experience with modern inference systems like TGI, vLLM, TensorRT-LLM, and Optimum
- Comfortable creating custom tooling for testing and optimization
## Responsibilities
- Architecting and implementing robust, scalable inference systems for serving state-of-the-art AI models
- Optimizing model serving infrastructure for high throughput and low latency at scale
- Developing and integrating advanced inference optimization techniques
- Working closely with our research team to bring cutting-edge capabilities into production
- Building developer tools and infrastructure to support rapid experimentation and deployment
## Preferred Qualifications
- Experience with low-level systems programming (CUDA, Triton) and compiler optimization
- Passion for open-source contribution and for staying current with ML infrastructure developments
- Practical experience with high-performance computing and distributed systems
- Experience in early-stage environments where you helped shape technical direction
- Energized by solving complex technical challenges in a collaborative environment