AI Software Engineer – Inference

Company: Nexus
Location: San Francisco, CA, USA
Salary: Not provided
Type: Full-Time
Experience Level: Mid Level

Requirements

  • 3+ years of experience in software engineering, preferably with exposure to ML systems in production
  • Strong skills in Python, Go, or Java, and a solid understanding of system performance fundamentals
  • Experience with containerization (Docker, Kubernetes) and deploying services in the cloud (AWS, GCP, or Azure)
  • Solid understanding of model serving architectures and techniques for optimizing latency and throughput
  • Comfort with performance tuning and profiling of ML model execution
  • A practical mindset and eagerness to own production systems from build to run
  • Willingness to embrace AI as a core part of how you work, think, and build

Responsibilities

  • Design and optimize lightning-fast inference pipelines for both real-time and batch predictions
  • Deploy and scale machine learning models in production across cloud and containerized environments
  • Leverage frameworks like TensorFlow Serving, TorchServe, or Triton to serve models at scale (a brief client sketch follows this list)
  • Monitor performance in the wild — build tools to track model behavior, latency, and reliability
  • Work with researchers to productionize models, implement model compression, and make inference as efficient as possible
  • Solve problems fast — whether it’s a scaling bottleneck, a failed deployment, or a rogue latency spike
  • Build internal tools that streamline how we deploy and monitor inference workloads
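
To give a concrete flavor of the serving work described above, here is a minimal sketch of a client calling a model hosted on NVIDIA Triton over HTTP, using the tritonclient Python package. The model name (resnet50) and the tensor names and shapes (input__0, output__0) are hypothetical placeholders; in practice they come from the deployed model's configuration.

```python
# Minimal sketch: query a model served by NVIDIA Triton over HTTP.
# Assumptions: Triton is running on localhost:8000, and a model named
# "resnet50" is deployed with input "input__0" (FP32, 1x3x224x224)
# and output "output__0". All of these names/shapes are placeholders.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

# Build a dummy batch matching the model's expected input signature.
batch = np.random.rand(1, 3, 224, 224).astype(np.float32)
infer_input = httpclient.InferInput("input__0", list(batch.shape), "FP32")
infer_input.set_data_from_numpy(batch)

# Request only the output tensor we need to keep the response small.
requested = httpclient.InferRequestedOutput("output__0")

result = client.infer(
    model_name="resnet50",
    inputs=[infer_input],
    outputs=[requested],
)
print(result.as_numpy("output__0").shape)
```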

Preferred Qualifications

  • Experience with hardware acceleration for inference (GPUs, TPUs, etc.)
  • Familiarity with real-time data processing and streaming tools
  • Hands-on with edge deployment (mobile, embedded, etc.)
  • Contributions to open-source projects in model serving or ML infrastructure