Senior Technical Program Manager - AI/ML & Data Infrastructure - Central Technology

7+ years of experience in technical program management or infrastructure-focused operations in complex engineering environments.
Proven ability to manage large-scale technical programs across multiple stakeholders and teams.
High-level understanding of machine learning workflows and model training pipelines, with the ability to translate infrastructure needs between research and engineering teams.
Strong organizational skills and experience leading cross-functional programs under tight timelines.
Excellent written and verbal communication skills, including the ability to align stakeholders at multiple levels.
A passion for building efficient, secure, and inclusive systems to support cutting-edge science and research.
Familiarity with on-prem/HPC and/or multi-cloud GPU infrastructure, orchestration tools, and platforms such as Slurm, Run:AI, MLflow, W&B, or similar systems is a huge plus.

Lead AI/ML infrastructure programs: Drive execution of technical initiatives across GPU scheduling, platform enablement, observability, and workload orchestration.
Lead access and lifecycle workflows: Own the end-to-end experience for users accessing shared infrastructure resources—including onboarding, offboarding, documentation, and support processes.
Coordinate infrastructure access requests: Manage intake and operational workflows for machine learning infrastructure access, including triage, tracking, and communication.
Drive documentation systems: Own the structure, accuracy, and governance of internal documentation, onboarding guides, runbooks, and infrastructure wikis.
Enhance visibility: Maintain and improve AI system dashboards and reporting systems for onboarding timelines, RFA volume, and infrastructure program milestones.