Staff Machine Learning Engineer

Company: Tempus
Location: San Francisco, CA, USA; Remote in USA; Chicago, IL, USA; New York, NY, USA
Salary: $170,000 – $230,000
Type: Full-Time
Degrees: Master’s, PhD
Experience Level: Senior, Expert or higher

Requirements

  • Master’s degree in Computer Science, Artificial Intelligence, Software Engineering, or a related field, with a strong academic background focused on AI data engineering.
  • Proven track record (8+ years of industry experience) in designing, building, and operating large-scale data pipelines and infrastructure in a production environment.
  • Strong experience working with massive, heterogeneous datasets (TBs+) and modern distributed data processing tools and frameworks such as Apache Spark, Ray, or Dask.
  • Strong, hands-on experience with tools and libraries specifically designed for large-scale ML data handling, such as Hugging Face Datasets, MosaicML Streaming, or similar frameworks (e.g., WebDataset, Petastorm). Experience with MLOps tools and platforms (e.g., MLflow, Kubeflow, SageMaker Pipelines).
  • Understanding of the data challenges specific to training large models (Foundation Models, LLMs, Multimodal Models).
  • Proficiency in programming languages like Python and experience with modern distributed data processing tools and frameworks.
  • Proven ability to bring thought leadership to the product and engineering teams, influencing technical direction and data strategy.
  • Experience mentoring junior engineers and collaborating effectively with cross-functional teams (Research Scientists, ML Engineers, Platform Engineers, Product Managers, Clinicians).
  • Excellent communication skills, capable of explaining complex technical concepts to diverse audiences.
  • Strong bias-to-action and ability to thrive in a fast-paced, dynamic research and development environment.
  • A pragmatic approach focused on delivering rapid, iterative, and measurable progress towards impactful goals.

Responsibilities

  • Architect and build sophisticated data processing workflows responsible for ingesting, processing, and preparing multimodal training data that seamlessly integrate with large-scale distributed ML training frameworks and infrastructure (GPU clusters).
  • Develop strategies for efficient, compliant data ingestion from diverse sources, including internal databases, third-party APIs, public biomedical datasets, and Tempus’s proprietary data ecosystem.
  • Utilize, optimize, and contribute to frameworks specialized for large-scale ML data loading and streaming (e.g., MosaicML Streaming, Ray Data, HF Datasets).
  • Collaborate closely with infrastructure and platform teams to leverage and optimize cloud-native services (primarily GCP) for performance, cost-efficiency, and security.
  • Engineer efficient connectors and data loaders for accessing and processing information from diverse knowledge sources, such as knowledge graphs, internal structured databases, biomedical literature repositories (e.g., PubMed), and curated ontologies.
  • Optimize data storage for efficient large-scale training and knowledge access.
  • Orchestrate, monitor, and troubleshoot complex data workflows using tools such as Airflow and Kubeflow Pipelines.
  • Establish robust monitoring, logging, and alerting systems for data pipeline health, data drift detection, and data quality assurance, providing feedback loops for continuous improvement.
  • Analyze and resolve data I/O performance bottlenecks across storage systems, network bandwidth, and compute resources.
  • Actively manage and seek optimizations for the costs associated with storing and processing massive datasets in the cloud.

Preferred Qualifications

  • Advanced degree (PhD) in Computer Science, Engineering, Bioinformatics, or a related field.
  • Contributions to relevant open-source projects.
  • Direct experience working with clinical or biological data (EHR, genomics, medical imaging).