AI Infrastructure Engineer – Model Serving Platform

Company: Scale AI
Location: San Francisco, CA, USA; New York, NY, USA
Salary: $175,000 – $220,000
Type: Full-Time
Experience Level: Mid Level, Senior

Requirements

  • 4+ years of experience building large-scale, high-performance backend systems.
  • Strong programming skills in one or more languages (e.g., Python, Go, Rust, C++).
  • Deep understanding of concurrency, memory management, networking, and distributed systems.
  • Experience with containers, virtualization, and orchestration tools (e.g., Docker, Kubernetes).
  • Familiarity with cloud infrastructure (AWS, GCP) and infrastructure as code (e.g., Terraform).
  • Proven ability to solve complex problems and work independently in fast-moving environments.

Responsibilities

  • Build and maintain fault-tolerant, high-performance systems for serving LLMs and agent-based workloads at scale.
  • Collaborate with researchers and engineers to integrate and optimize models for production and research use cases.
  • Conduct architecture and design reviews to uphold best practices in system design and scalability.
  • Develop monitoring and observability solutions to ensure system health and performance.
  • Lead projects end-to-end, from requirements gathering to implementation, in a cross-functional environment.

Preferred Qualifications

  • Experience with modern LLM serving frameworks such as vLLM, SGLang, TensorRT-LLM, or text-generation-inference.
  • Knowledge of ML frameworks (e.g., PyTorch or TensorFlow) and how to optimize them for production serving.
  • Experience with model inference optimizations such as quantization, distillation, speculative decoding, etc.
  • Familiarity with emerging agent frameworks such as OpenHands, Agent2Agent, MCP.
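To make the speculative-decoding qualification above concrete, here is a minimal, self-contained sketch of the idea: a cheap draft model proposes a few tokens ahead, and the expensive target model verifies them, accepting the matching prefix. The toy `target_next` and `draft_next` functions and the tiny integer vocabulary are hypothetical stand-ins, not any framework's API; real systems verify proposals in a single batched forward pass.

```python
import random

random.seed(0)

# Toy "models" over a tiny integer vocabulary (hypothetical stand-ins
# for a large target model and a small draft model).
VOCAB = list(range(8))

def target_next(context):
    # Deterministic toy target model: next token from a simple rule.
    return (sum(context) + 1) % len(VOCAB)

def draft_next(context):
    # Toy draft model: agrees with the target most of the time.
    if random.random() < 0.8:
        return target_next(context)
    return random.choice(VOCAB)

def speculative_decode(prompt, n_tokens, k=4):
    """Greedy speculative decoding sketch: the draft proposes k tokens,
    the target verifies them, and the longest matching prefix is kept.
    On a mismatch, the target's own token replaces the rejected one."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # Draft proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(tuple(ctx))
            proposal.append(t)
            ctx.append(t)
        # Target verifies the proposal token by token.
        for t in proposal:
            if len(out) - len(prompt) >= n_tokens:
                break
            if t == target_next(tuple(out)):
                out.append(t)  # accepted
            else:
                out.append(target_next(tuple(out)))  # corrected; stop round
                break
        else:
            if len(out) - len(prompt) < n_tokens:
                out.append(target_next(tuple(out)))  # bonus token, full accept
    return out[len(prompt):]

print(speculative_decode((1, 2), n_tokens=6))
```

Because every accepted or corrected token matches the target model's greedy choice, the output is identical to plain greedy decoding with the target; the draft only changes how many target verifications are amortized per round.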