Lead Machine Learning Engineer – Performance and Scalability – Generative AI

Company: Adobe
Location: Seattle, WA, USA; San Jose, CA, USA; New York, NY, USA
Salary: $162,000 – $301,200
Type: Full-Time
Degrees: Master’s, PhD
Experience Level: Senior, Expert or higher

Requirements

  • 8+ years of proven experience building high-performance ML infrastructure and scalable AI systems.
  • MS or PhD in Computer Science or a related field.
  • Strong programming skills in Python and C++, with expertise in building ML pipelines and model deployment infrastructure.
  • Experience deploying large-scale ML models in cloud environments using AWS GPU instances, Kubernetes, Ray, or similar technologies.
  • Experience with model conversion and optimization frameworks like ONNX and TensorRT, as well as AOT compilation techniques.
  • Experience with cloud-native architectures, autoscaling strategies, and fault-tolerant machine learning systems.
  • Proficiency in GPU orchestration, CUDA, and accelerated inference techniques.
  • Hands-on experience with profiling tools (e.g., Nsight, PyTorch Profiler, perf) for system performance analysis.
  • Ability to work in a fast-paced, startup-like environment with cross-functional teams.

Responsibilities

  • Architect and optimize ML pipelines to support scalable inference and model deployment on cloud-based GPU infrastructure (e.g., AWS P5 instances).
  • Develop and maintain high-throughput serving pipelines for generative AI models, ensuring low-latency, high-performance execution.
  • Enable model serving optimizations by designing systems that support tensor parallelism, quantization, distillation, and caching, in collaboration with ML research teams.
  • Develop automated monitoring and profiling tools to track system efficiency, detect performance regressions, and optimize resource utilization.
  • Optimize GPU resource allocation and orchestration across cloud-based ML workloads.
  • Integrate scalable load testing frameworks to validate model inference performance under high-traffic conditions.
  • Collaborate with infrastructure and applied ML teams to transition models from experimentation to production-ready, cloud-optimized deployments.
  • Establish standard methodologies for scaling and cloud-native ML architectures, ensuring efficient deployment across multi-region cloud environments.

Preferred Qualifications

    No preferred qualifications provided.