Director of Production Engineering

Company: CoreWeave
Location: Livingston, NJ, USA; New York, NY, USA; Bellevue, WA, USA; Sunnyvale, CA, USA
Salary: $230,000 – $275,000
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Expert or higher

Requirements

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • 10+ years in engineering leadership roles within SRE, DevOps, or cloud infrastructure.
  • 5+ years managing large-scale infrastructure-as-a-service in a geographically distributed, always-on environment.
  • Proven success leading 24×7 operations teams and delivering high-availability services at scale.
  • Deep expertise in automation, monitoring and observability, and incident response frameworks.
  • Familiarity with cloud-native architectures purpose-built for AI, CI/CD systems, and performance tuning.

Responsibilities

  • Define and execute the SRE vision, strategy, and roadmap for a large-scale, distributed cloud infrastructure.
  • Lead and mentor a high-performing team of SREs, promoting a culture of ownership, collaboration, and continuous learning.
  • Champion automation-first practices, leveraging Infrastructure-as-Code and tools like Terraform and Kubernetes to minimize toil and manual intervention.
  • Establish and evolve best practices in observability, monitoring, and alerting, so that issues are caught proactively rather than reactively.
  • Drive initiatives for incident management, postmortem culture, root cause analysis, and system hardening.
  • Collaborate with engineering, product, and customer support teams to build scalable, resilient, and self-healing systems.
  • Evolve our on-call strategy and processes to support a 24×7, globally distributed platform with minimal disruptions.

Preferred Qualifications

  • Hands-on experience with Python, Go, Java, or Ruby for operational tooling and automation.
  • Strong track record of hiring, mentoring, and developing top-tier SRE talent in high-growth companies.
  • Comfortable navigating cross-functional dynamics and influencing leadership across engineering, product, and support.
  • Experience leading DevOps and reliability transformation projects, improving developer velocity and platform resilience.