Infra Lead

Company: Speak
Location: San Francisco, CA, USA
Salary: $180,000 – $260,000
Type: Full-Time
Experience Level: Senior, Expert or higher

Requirements

  • 7+ years of experience in SRE, DevOps, or infrastructure-focused engineering roles, ideally with experience leading or mentoring others
  • Strong experience with GCP, Kubernetes, Terraform, Node.js, Python, PostgreSQL, Redis, and observability tooling like Prometheus and Sentry
  • Proven track record of improving reliability, scaling systems, and reducing incident frequency and severity in high-traffic systems
  • Strong incident management and root cause analysis skills—you know how to lead under pressure
  • Experience building and maintaining CI/CD pipelines and deployment safety tooling
  • Strong systems thinking, with the ability to identify failure points and proactively harden services
  • Deep sense of ownership and a desire to make infrastructure a force multiplier for the rest of the org

Responsibilities

  • Own the reliability of Speak’s infrastructure across GCP, Kubernetes, and our Node.js/Postgres stack
  • Lead response for P0/P1 incidents, drive postmortems, and ensure we’re learning from every outage
  • Improve observability, alerting, and on-call processes so we catch issues before users do
  • Define and drive adoption of SLOs/SLAs for core systems and services
  • Build tools and frameworks to make reliability easier for product engineers—think safer deploys and infrastructure automation
  • Collaborate cross-functionally with Product, Engineering, and ML teams to ensure reliability is baked into everything we build
  • Set short-term and long-term roadmaps to ensure stability for our growing user base
  • Be a thought leader and coach around SRE principles—blameless culture, operational maturity, and continuous improvement

Preferred Qualifications

  • Familiarity with cost optimization strategies in cloud-native environments
  • Background in security, chaos engineering, or disaster recovery planning
  • Contributions to internal tooling, automation, or developer productivity initiatives