AI Software Engineer – Inference
| Company | Nexus |
|---|---|
| Location | San Francisco, CA, USA |
| Salary | Not provided |
| Type | Full-Time |
| Degrees | |
| Experience Level | Mid Level |
Requirements
- 3+ years of experience in software engineering, preferably with exposure to ML systems in production
- Strong skills in Python, Go, or Java, and a solid understanding of system performance fundamentals
- Experience with containerization (Docker, Kubernetes) and deploying services in the cloud (AWS, GCP, or Azure)
- Solid understanding of model serving architectures and techniques for optimizing latency and throughput
- Comfort with performance tuning and profiling of ML model execution
- A practical mindset and eagerness to own production systems from build to run
- Willingness to embrace AI as a core part of how you work, think, and build
Responsibilities
- Design and optimize lightning-fast inference pipelines for both real-time and batch predictions
- Deploy and scale machine learning models in production across cloud and containerized environments
- Leverage frameworks like TensorFlow Serving, TorchServe, or Triton to serve models at scale
- Monitor performance in the wild — build tools to track model behavior, latency, and reliability
- Work with researchers to productionize models, implement model compression, and make inference as efficient as possible
- Solve problems fast — whether it’s a scaling bottleneck, a failed deployment, or a rogue latency spike
- Build internal tools that streamline how we deploy and monitor inference workloads
Preferred Qualifications
- Experience with hardware acceleration for inference (GPUs, TPUs, etc.)
- Familiarity with real-time data processing and streaming tools
- Hands-on with edge deployment (mobile, embedded, etc.)
- Contributions to open-source projects in model serving or ML infrastructure