Staff Machine Learning Engineer
| Company | Tempus |
| --- | --- |
| Location | San Francisco, CA, USA; Remote in USA; Chicago, IL, USA; New York, NY, USA |
| Salary | $170,000 – $230,000 |
| Type | Full-Time |
| Degrees | Master’s, PhD |
| Experience Level | Senior, Expert or higher |
Requirements
- Master’s degree in Computer Science, Artificial Intelligence, Software Engineering, or a related field, with a strong academic background focused on AI data engineering.
- Proven track record (8+ years of industry experience) in designing, building, and operating large-scale data pipelines and infrastructure in a production environment.
- Strong experience working with massive, heterogeneous datasets (TBs+) and modern distributed data processing tools and frameworks such as Apache Spark, Ray, or Dask.
- Strong, hands-on experience with tools and libraries specifically designed for large-scale ML data handling, such as Hugging Face Datasets, MosaicML Streaming, or similar frameworks (e.g., WebDataset, Petastorm). Experience with MLOps tools and platforms (e.g., MLflow, Kubeflow, SageMaker Pipelines).
- Understanding of the data challenges specific to training large models (Foundation Models, LLMs, Multimodal Models).
- Proficiency in programming languages like Python and experience with modern distributed data processing tools and frameworks.
- Proven ability to bring thought leadership to the product and engineering teams, influencing technical direction and data strategy.
- Experience mentoring junior engineers and collaborating effectively with cross-functional teams (Research Scientists, ML Engineers, Platform Engineers, Product Managers, Clinicians).
- Excellent communication skills, capable of explaining complex technical concepts to diverse audiences.
- Strong bias-to-action and ability to thrive in a fast-paced, dynamic research and development environment.
- A pragmatic approach focused on delivering rapid, iterative, and measurable progress towards impactful goals.
Responsibilities
- Architect and build sophisticated data processing workflows that ingest, process, and prepare multimodal training data, integrating seamlessly with large-scale distributed ML training frameworks and infrastructure (GPU clusters).
- Develop strategies for efficient, compliant data ingestion from diverse sources, including internal databases, third-party APIs, public biomedical datasets, and Tempus’s proprietary data ecosystem.
- Utilize, optimize, and contribute to frameworks specialized for large-scale ML data loading and streaming (e.g., MosaicML Streaming, Ray Data, HF Datasets).
- Collaborate closely with infrastructure and platform teams to leverage and optimize cloud-native services (primarily GCP) for performance, cost-efficiency, and security.
- Engineer efficient connectors and data loaders for accessing and processing information from diverse knowledge sources, such as knowledge graphs, internal structured databases, biomedical literature repositories (e.g., PubMed), and curated ontologies.
- Optimize data storage for efficient large-scale training and knowledge access.
- Orchestrate, monitor, and troubleshoot complex data workflows using tools such as Airflow and Kubeflow Pipelines.
- Establish robust monitoring, logging, and alerting systems for data pipeline health, data drift detection, and data quality assurance, providing feedback loops for continuous improvement.
- Analyze and optimize data I/O performance bottlenecks across storage systems, network bandwidth, and compute resources.
- Actively manage and seek optimizations for the costs associated with storing and processing massive datasets in the cloud.
Preferred Qualifications
- Advanced degree (PhD) in Computer Science, Engineering, Bioinformatics, or a related field.
- Contributions to relevant open-source projects.
- Direct experience working with clinical or biological data (EHR, genomics, medical imaging).