Lead Machine Learning Engineer – Performance and Scalability – Generative AI

Company: Adobe
Location: Seattle, WA, USA; San Jose, CA, USA; New York, NY, USA
Salary: $162,000 – $301,200
Type: Full-Time
Degrees: Master’s, PhD
Experience Level: Senior, Expert or higher

Requirements

  • 8+ years of proven experience building high-performance ML infrastructure and scalable AI systems.
  • MS or PhD in Computer Science or a related field.
  • Strong programming skills in Python and C++, with expertise in building ML pipelines and model deployment infrastructure.
  • Experience deploying large-scale ML models in cloud environments using AWS GPU instances, Kubernetes, Ray, or similar technologies.
  • Experience with model conversion and optimization frameworks like ONNX and TensorRT, as well as AOT compilation techniques.
  • Experience with cloud-native architectures, autoscaling strategies, and fault-tolerant machine learning systems.
  • Proficiency in GPU orchestration, CUDA, and accelerated inference techniques.
  • Hands-on experience with profiling tools (e.g., Nsight, PyTorch Profiler, perf) for system performance analysis.
  • Ability to work in a fast-paced, startup-like environment with cross-functional teams.

Responsibilities

  • Architect and optimize ML pipelines to support scalable inference and model deployment on cloud-based GPU infrastructure (e.g., AWS P5 instances).
  • Develop and maintain high-throughput serving pipelines for generative AI models, ensuring low-latency, high-performance execution.
  • Enable model serving optimizations by designing systems that support tensor parallelism, quantization, distillation, and caching, in collaboration with ML research teams.
  • Develop automated monitoring and profiling tools to track system efficiency, detect performance regressions, and optimize resource utilization.
  • Optimize GPU resource allocation and orchestration across cloud-based ML workloads.
  • Integrate scalable load testing frameworks to validate model inference performance under high-traffic conditions.
  • Collaborate with infrastructure and applied ML teams to transition models from experimentation to production-ready, cloud-optimized deployments.
  • Establish standard methodologies for scaling and cloud-native ML architectures, ensuring efficient deployment across multi-region cloud environments.

Preferred Qualifications

    No preferred qualifications provided.