
Infra Lead

| Field | Details |
|---|---|
| Company | Speak |
| Location | San Francisco, CA, USA |
| Salary | $180,000 – $260,000 |
| Type | Full-Time |
| Degrees | |
| Experience Level | Senior, Expert or higher |

Requirements
- 7+ years of experience in SRE, DevOps, or infrastructure-focused engineering roles, ideally with experience leading or mentoring others
- Strong experience with GCP, Kubernetes, Terraform, Node.js, Python, PostgreSQL, Redis, and observability tooling like Prometheus and Sentry
- Proven track record of improving reliability, scaling systems, and reducing incident frequency and severity in high-traffic systems
- Strong incident management and root-cause analysis skills, with the ability to lead under pressure
- Experience building and maintaining CI/CD pipelines and deployment safety tooling
- Strong systems thinking, with the ability to identify failure points and proactively harden services
- Deep sense of ownership and a desire to make infrastructure a force multiplier for the rest of the org
Responsibilities
- Own the reliability of Speak’s infrastructure across GCP, Kubernetes, and our Node.js/Postgres stack
- Lead response for P0/P1 incidents, drive postmortems, and ensure we’re learning from every outage
- Improve observability, alerting, and on-call processes so we catch issues before users do
- Define and drive adoption of SLOs/SLAs for core systems and services
- Build tools and frameworks that make reliability easier for product engineers, such as safer deploys and infrastructure automation
- Collaborate cross-functionally with Product, Engineering, and ML teams to ensure reliability is baked into everything we build
- Set short- and long-term roadmaps to ensure stability for our growing user base
- Be a thought leader and coach on SRE principles: blameless culture, operational maturity, and continuous improvement
Preferred Qualifications
- Familiarity with cost optimization strategies in cloud-native environments
- Background in security, chaos engineering, or disaster recovery planning
- Contributions to internal tooling, automation, or developer productivity initiatives