AI Software Engineer – Inference

Company: Nexus
Location: San Francisco, CA, USA
Salary: Not provided
Type: Full-Time
Experience Level: Mid Level

Requirements

  • 3+ years of experience in software engineering, preferably with exposure to ML systems in production
  • Strong skills in Python, Go, or Java, and a solid understanding of system performance fundamentals
  • Experience with containerization (Docker, Kubernetes) and deploying services in the cloud (AWS, GCP, or Azure)
  • Solid understanding of model serving architectures and techniques for optimizing latency and throughput
  • Comfort with performance tuning and profiling of ML model execution
  • A practical mindset and eagerness to own production systems from build to run
  • Willingness to embrace AI as a core part of how you work, think, and build

Responsibilities

  • Design and optimize lightning-fast inference pipelines for both real-time and batch predictions
  • Deploy and scale machine learning models in production across cloud and containerized environments
  • Leverage frameworks like TensorFlow Serving, TorchServe, or Triton to serve models at scale (a brief client sketch follows this list)
  • Monitor performance in the wild — build tools to track model behavior, latency, and reliability
  • Work with researchers to productionize models, implement model compression, and make inference as efficient as possible
  • Solve problems fast — whether it’s a scaling bottleneck, a failed deployment, or a rogue latency spike
  • Build internal tools that streamline how we deploy and monitor inference workloads
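
To give a concrete flavor of the serving work described above, here is a minimal sketch of a client calling a model hosted on NVIDIA Triton over HTTP, using the tritonclient Python package. The model name (resnet50) and the tensor names and shapes (input__0, output__0) are hypothetical placeholders; in practice they come from the deployed model's configuration.

```python
# Minimal sketch: query a model served by NVIDIA Triton over HTTP.
# Assumptions: Triton is running on localhost:8000, and a model named
# "resnet50" is deployed with input "input__0" (FP32, 1x3x224x224)
# and output "output__0". All of these names/shapes are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a dummy batch matching the model's expected input signature.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

# Request only the output tensor we need to keep the response small.
requested = httpclient.InferRequestedOutput("output__0")

result = client.infer(
    model_name="resnet50",
    inputs=[infer_input],
    outputs=[requested],
)
print(result.as_numpy("output__0").shape)
```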

Preferred Qualifications

  • Experience with hardware acceleration for inference (GPUs, TPUs, etc.)
  • Familiarity with real-time data processing and streaming tools
  • Hands-on with edge deployment (mobile, embedded, etc.)
  • Contributions to open-source projects in model serving or ML infrastructure