AI Infrastructure Engineer – Model Serving Platform
| Company | Scale AI |
| --- | --- |
| Location | San Francisco, CA, USA; New York, NY, USA |
| Salary | $175,000 – $220,000 |
| Type | Full-Time |
| Experience Level | Mid Level, Senior |
Requirements
- 4+ years of experience building large-scale, high-performance backend systems.
- Strong programming skills in one or more languages (e.g., Python, Go, Rust, C++).
- Deep understanding of concurrency, memory management, networking, and distributed systems (see the batching sketch after this list for a flavor of the concurrency work involved).
- Experience with containers, virtualization, and orchestration tools (e.g., Docker, Kubernetes).
- Familiarity with cloud infrastructure (AWS, GCP) and infrastructure as code (e.g., Terraform).
- Proven ability to solve complex problems and work independently in fast-moving environments.
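To illustrate the kind of concurrency problem referenced above, here is a minimal sketch of an asyncio-based dynamic batcher of the sort that often sits in front of a model server, grouping concurrent requests to amortize per-call overhead. This is not Scale AI code; all names (`batch_worker`, `infer_batch`, the batch-size and wait limits) are hypothetical, and `infer_batch` is a stand-in for a real model call.

```python
import asyncio

MAX_BATCH_SIZE = 8       # flush once this many requests are queued
MAX_WAIT_SECONDS = 0.01  # or once the oldest request has waited this long

# Module-level Queue is safe on Python 3.10+, where Queue no longer
# binds to an event loop at construction time.
queue: asyncio.Queue = asyncio.Queue()


async def infer_batch(prompts: list[str]) -> list[str]:
    """Stand-in for the real model call; batching amortizes its fixed cost."""
    await asyncio.sleep(0.05)  # pretend GPU work
    return [f"completion for {p!r}" for p in prompts]


async def batch_worker() -> None:
    """Drain the queue into batches and resolve each caller's future."""
    while True:
        first = await queue.get()  # block until at least one request arrives
        batch = [first]
        loop = asyncio.get_running_loop()
        deadline = loop.time() + MAX_WAIT_SECONDS
        while len(batch) < MAX_BATCH_SIZE:
            timeout = deadline - loop.time()
            if timeout <= 0:
                break
            try:
                batch.append(await asyncio.wait_for(queue.get(), timeout))
            except asyncio.TimeoutError:
                break
        results = await infer_batch([prompt for prompt, _ in batch])
        for (_, fut), result in zip(batch, results):
            fut.set_result(result)


async def submit(prompt: str) -> str:
    """Called once per incoming request; awaits the batched result."""
    fut = asyncio.get_running_loop().create_future()
    await queue.put((prompt, fut))
    return await fut


async def main() -> None:
    asyncio.create_task(batch_worker())
    outs = await asyncio.gather(*(submit(f"prompt {i}") for i in range(20)))
    print(outs[0])


if __name__ == "__main__":
    asyncio.run(main())
```

The same request-grouping pattern generalizes to the continuous batching used by production LLM servers, where new requests can join a running batch between decoding steps.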
Responsibilities
- Build and maintain fault-tolerant, high-performance systems for serving LLMs and agent-based workloads at scale.
- Collaborate with researchers and engineers to integrate and optimize models for production and research use cases.
- Conduct architecture and design reviews to uphold best practices in system design and scalability.
- Develop monitoring and observability solutions to ensure system health and performance (see the instrumentation sketch after this list).
- Lead projects end-to-end, from requirements gathering to implementation, in a cross-functional environment.
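As a concrete example of the observability work mentioned above, the sketch below instruments an inference path with the open-source `prometheus_client` library. The metric names, labels, port, and `handle_request` function are illustrative assumptions, not a description of Scale AI's actual setup.

```python
import random
import time

from prometheus_client import Counter, Histogram, start_http_server

# Hypothetical metrics for an inference endpoint: a request counter
# broken down by model and outcome, plus an end-to-end latency histogram.
REQUESTS = Counter(
    "inference_requests_total",
    "Total inference requests, by model and outcome.",
    ["model", "status"],
)
LATENCY = Histogram(
    "inference_latency_seconds",
    "End-to-end inference latency.",
    ["model"],
    buckets=(0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0),
)


def handle_request(model: str) -> None:
    """Serve one request, recording outcome and latency either way."""
    start = time.monotonic()
    try:
        time.sleep(random.uniform(0.01, 0.2))  # stand-in for model work
        REQUESTS.labels(model=model, status="ok").inc()
    except Exception:
        REQUESTS.labels(model=model, status="error").inc()
        raise
    finally:
        LATENCY.labels(model=model).observe(time.monotonic() - start)


if __name__ == "__main__":
    # Expose metrics for a Prometheus scraper on http://localhost:9100/
    start_http_server(9100)
    while True:
        handle_request("demo-model")
```

Histograms like the one above are what latency SLO dashboards and alerts (e.g., p99 over a rolling window) are typically built from.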
Preferred Qualifications
- Experience with modern LLM serving frameworks such as vLLM, SGLang, TensorRT-LLM, or text-generation-inference (see the vLLM sketch after this list).
- Knowledge of ML frameworks (e.g., PyTorch or TensorFlow) and how to optimize them for production serving.
- Experience with model inference optimizations such as quantization, distillation, speculative decoding, etc.
- Familiarity with emerging agent frameworks and protocols such as OpenHands, Agent2Agent (A2A), and MCP (Model Context Protocol).
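For context on the serving frameworks named above, here is a minimal offline-inference sketch using vLLM's Python API. The checkpoint `facebook/opt-125m` is just an example (any small Hugging Face model would do), and the sampling parameters are arbitrary.

```python
from vllm import LLM, SamplingParams

# Load a model into vLLM's engine and define how to sample completions.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, top_p=0.95, max_tokens=64)

prompts = [
    "The key challenge in serving LLMs at scale is",
    "Speculative decoding speeds up inference by",
]

# generate() batches the prompts internally and returns one
# RequestOutput per prompt, each carrying its sampled completions.
outputs = llm.generate(prompts, params)

for out in outputs:
    print(out.prompt, "->", out.outputs[0].text)
```

Frameworks like this also expose inference optimizations from the qualifications above behind small API surfaces; for instance, vLLM supports quantized checkpoints via a `quantization` argument to `LLM`.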