AI Infrastructure Engineer – Model Serving Platform

Company: Scale AI
Location: San Francisco, CA, USA; New York, NY, USA
Salary: $175,000 – $220,000
Type: Full-Time
Experience Level: Mid Level, Senior

Requirements

  • 4+ years of experience building large-scale, high-performance backend systems.
  • Strong programming skills in one or more languages (e.g., Python, Go, Rust, C++).
  • Deep understanding of concurrency, memory management, networking, and distributed systems.
  • Experience with containers, virtualization, and orchestration tools (e.g., Docker, Kubernetes).
  • Familiarity with cloud infrastructure (AWS, GCP) and infrastructure as code (e.g., Terraform).
  • Proven ability to solve complex problems and work independently in fast-moving environments.

Responsibilities

  • Build and maintain fault-tolerant, high-performance systems for serving LLMs and agent-based workloads at scale.
  • Collaborate with researchers and engineers to integrate and optimize models for production and research use cases.
  • Conduct architecture and design reviews to uphold best practices in system design and scalability.
  • Develop monitoring and observability solutions to ensure system health and performance.
  • Lead projects end-to-end, from requirements gathering to implementation, in a cross-functional environment.

Preferred Qualifications

  • Experience with modern LLM serving frameworks such as vLLM, SGLang, TensorRT-LLM, or text-generation-inference.
  • Knowledge of ML frameworks (e.g., PyTorch or TensorFlow) and how to optimize them for production serving.
  • Experience with model inference optimizations such as quantization, distillation, speculative decoding, etc.
  • Familiarity with emerging agent frameworks such as OpenHands, Agent2Agent, MCP.
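To make the speculative-decoding qualification above concrete, here is a minimal, self-contained sketch of the idea: a cheap draft model proposes a few tokens ahead, and the expensive target model verifies them, accepting the matching prefix. The toy `target_next` and `draft_next` functions and the tiny integer vocabulary are hypothetical stand-ins, not any framework's API; real systems verify proposals in a single batched forward pass.

```python
import random

random.seed(0)

# Toy "models" over a tiny integer vocabulary (hypothetical stand-ins
# for a large target model and a small draft model).
VOCAB = list(range(8))

def target_next(context):
    # Deterministic toy target model: next token from a simple rule.
    return (sum(context) + 1) % len(VOCAB)

def draft_next(context):
    # Toy draft model: agrees with the target most of the time.
    if random.random() < 0.8:
        return target_next(context)
    return random.choice(VOCAB)

def speculative_decode(prompt, n_tokens, k=4):
    """Greedy speculative decoding sketch: the draft proposes k tokens,
    the target verifies them, and the longest matching prefix is kept.
    On a mismatch, the target's own token replaces the rejected one."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # Draft proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(tuple(ctx))
            proposal.append(t)
            ctx.append(t)
        # Target verifies the proposal token by token.
        for t in proposal:
            if len(out) - len(prompt) >= n_tokens:
                break
            if t == target_next(tuple(out)):
                out.append(t)  # accepted
            else:
                out.append(target_next(tuple(out)))  # corrected; stop round
                break
        else:
            if len(out) - len(prompt) < n_tokens:
                out.append(target_next(tuple(out)))  # bonus token, full accept
    return out[len(prompt):]

print(speculative_decode((1, 2), n_tokens=6))
```

Because every accepted or corrected token matches the target model's greedy choice, the output is identical to plain greedy decoding with the target; the draft only changes how many target verifications are amortized per round.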