Director of Engineering - Compute

10+ years in large-scale infrastructure engineering, including 3+ years leading teams that run business-critical, globally distributed fleets.
Proven leadership experience in highly technical engineering environments, with a track record of delivering innovative platform solutions and effectively leading, motivating, and developing high-performing engineering teams.
Demonstrated excellence in communication, planning, negotiation, and interpersonal interactions across executives, cross-functional stakeholders, and team members, with a strong ability to influence and drive organizational change.
Cloud & hybrid experience: a history of building, deploying, and operating compute in data centers, augmented with cloud-based workloads (ideally on GCP).
Hands-on lineage: you once built or operated clusters yourself (writing Golang operators, CRDs, and CLI tools), graduated into org leadership, and still dive deep when needed.
Experience with on-prem storage technologies and their Kubernetes integrations.
CI/CD leadership: design and run pipelines (GitHub Actions, Buildkite, or similar) that build, test, sign, and promote container images at hyperscale velocity.

Architect: lead the design and development of Groq’s hyperscale compute platform, ensuring scalability, reliability, and security.
Plan: define and execute technical roadmaps that advance Groq’s ability to manage large-scale general-purpose and specialized compute infrastructure efficiently.
Lead highly technical engineering teams focused on container orchestration, hardware provisioning, and platform automation.
Build and grow the organization: attract, hire, mentor, and retain top-tier engineers; shape a culture of automation, simplicity, rapid learning, and operational excellence.
Operate the fleet: own production Kubernetes clusters and storage solutions distributed across several geographic regions, driving SLOs, incident response, and continual improvement.
Ship continuously: enforce robust CI/CD practices (container image scanning, automated integration tests, and progressive rollouts) to keep the platform secure and rapidly evolving.
Collaborate globally with data-center and hardware teams to ensure seamless capacity expansions, hardware refreshes, and energy-efficiency initiatives.
Advanced low-latency networking: partner closely with our networking team to champion modern data-plane technologies (Cilium/eBPF, BGP-based service routing, advanced load balancing) that deliver low latency, high throughput, and strong security.

Humility – Egos are checked at the door
Collaborative & Team Savvy – We make up the smartest person in the room, together
Growth & Giver Mindset – Learn-it-all versus know-it-all; we share knowledge generously
Curious & Innovative – Take a creative approach to projects, problems, and design
Passion, Grit, & Boldness – No-limit thinking, fueling informed risk-taking