Lead Machine Learning Engineer – Performance and Scalability – Generative AI
| Company | Adobe |
|---|---|
| Location | Seattle, WA; San Jose, CA; New York, NY (USA) |
| Salary | $162,000 – $301,200 |
| Type | Full-Time |
| Degrees | Master's, PhD |
| Experience Level | Senior, Expert or higher |
Requirements
- 8+ years of experience building high-performance ML infrastructure and scalable AI systems, with a proven track record.
- MS or PhD in Computer Science or a related field.
- Strong programming skills in Python and C++, with expertise in building ML pipelines and model deployment infrastructure.
- Experience deploying large-scale ML models in cloud environments using AWS GPU instances, Kubernetes, Ray, or similar.
- Experience with model conversion and optimization frameworks like ONNX and TensorRT, as well as AOT compilation techniques.
- Experience with cloud-native architectures, autoscaling strategies, and fault-tolerant machine learning systems.
- Proficiency in GPU orchestration, CUDA, and accelerated inference techniques.
- Hands-on experience with profiling tools (e.g., Nsight, PyTorch Profiler, perf) for system performance analysis.
- Ability to work in a fast-paced, startup-like environment with cross-functional teams.
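The profiling requirement above can be illustrated with a minimal, framework-agnostic latency sampler. This is a sketch using only the Python standard library, not Nsight or PyTorch Profiler themselves; `profile_latency` and its parameters are illustrative names, not Adobe tooling:

```python
import statistics
import time
from typing import Callable, Dict, List


def profile_latency(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Time repeated calls to `fn` and report latency statistics in milliseconds."""
    # Warm-up iterations avoid measuring one-time setup costs (caches, lazy init).
    for _ in range(warmup):
        fn()
    samples: List[float] = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    return {
        "p50_ms": statistics.median(samples),
        "p95_ms": samples[int(0.95 * (len(samples) - 1))],
        "mean_ms": statistics.mean(samples),
    }


if __name__ == "__main__":
    # Stand-in workload; in practice `fn` would wrap a model forward pass.
    print(profile_latency(lambda: sum(range(100_000))))
```

Tracking percentiles rather than only the mean is what makes regressions in tail latency visible, which is the metric that usually matters for serving SLAs.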
Responsibilities
- Architect and optimize ML pipelines to support scalable inference and model deployment on cloud-based GPU infrastructure (e.g., AWS P5 instances).
- Develop and maintain high-throughput serving pipelines for generative AI models, ensuring low-latency, high-performance execution.
- Design serving systems that support tensor parallelism, quantization, distillation, and caching, collaborating with ML research teams on model-level optimizations.
- Develop automated monitoring and profiling tools to track system efficiency, detect performance regressions, and optimize resource utilization.
- Optimize GPU resource allocation and orchestration across cloud-based ML workloads.
- Integrate scalable load testing frameworks to validate model inference performance under high-traffic conditions.
- Collaborate with infrastructure and applied ML teams to transition models from experimentation to production-ready, cloud-optimized deployments.
- Establish standard methodologies for scaling and cloud-native ML architectures, ensuring efficient deployment across multi-region cloud environments.
Preferred Qualifications
No preferred qualifications provided.