Machine Learning Engineer

Company: Captions
Location: New York, NY, USA
Salary: $170,000 – $230,000
Type: Full-Time
Degrees
Experience Level: Mid Level, Senior

Requirements

  • Proven experience deploying deep learning models on GPU-based infrastructure (NVIDIA GPUs, CUDA, TensorRT, etc.).
  • Strong knowledge of containerization (Docker, Kubernetes) and microservice architectures for ML model serving.
  • Proficiency with Python and at least one deep learning framework (PyTorch, TensorFlow).
  • Familiarity with compression techniques (quantization, pruning, distillation) for large-scale models.
  • Experience profiling and optimizing model inference (batching, concurrency, hardware utilization).
  • Hands-on experience with ML pipeline orchestration (Airflow, Kubeflow, Argo) and automated CI/CD for ML.
  • Strong grasp of logging, monitoring, and alerting tools (Prometheus, Grafana, etc.) in distributed systems.
  • Exposure to diffusion models, multimodal video generation, or large-scale generative architectures.
  • Experience with distributed training frameworks (FSDP, DeepSpeed, Megatron-LM) or HPC environments.

Responsibilities

  • Develop high-performance GPU-based inference pipelines for large multimodal diffusion models.
  • Build, optimize, and maintain serving infrastructure to deliver low-latency predictions at large scale.
  • Collaborate with DevOps teams to containerize models, manage autoscaling, and ensure uptime SLAs.
  • Leverage techniques like quantization, pruning, and distillation to reduce latency and memory footprint without compromising quality.
  • Implement continuous fine-tuning workflows to adapt models based on real-world data and feedback.
  • Design and maintain automated CI/CD pipelines for model deployment, versioning, and rollback.
  • Implement robust monitoring (latency, throughput, concept drift) and alerting for critical production systems.
  • Explore cutting-edge GPU acceleration and model-serving frameworks (e.g., TensorRT, Triton, TorchServe) to continuously improve throughput and reduce costs.