Staff Platform Engineer – Kubernetes
| Company | Amplitude |
|---|---|
| Location | San Francisco, CA, USA |
| Salary | $185,000 – $319,000 |
| Type | Full-Time |
| Degrees | |
| Experience Level | Senior, Expert or higher |
Requirements
- 8+ years of experience in some combination of cloud-native software development, platform engineering, site reliability engineering, and/or cloud infrastructure, with a more recent focus on Kubernetes and the cloud-native ecosystem.
- Strong expertise in Kubernetes and related CNCF projects (e.g., Argo CD/Workflows, Backstage, Envoy, CoreDNS) and in simplifying complex cloud infrastructure for broader teams.
- Operational experience at scale with technologies like Kafka and Airflow.
- Proficient in common infrastructure languages and tools like Go, Python, and Terraform, with experience developing and operating production systems.
- Extensive experience with AWS cloud infrastructure, networking, and security.
- Proven experience with monitoring and observability tools (Datadog, Splunk, Prometheus, Grafana Cloud, etc.) and a strong understanding of system performance tuning.
- Expertise in building abstractions over Kubernetes to simplify developer interaction with the platform.
- Excellent communication skills, with the ability to collaborate across teams, build consensus, and drive initiatives in a high-pressure environment.
- High level of empathy and patience, with a commitment to mentoring and helping others succeed, and the ability to incorporate feedback and turn it into actionable improvements.
- Experience with infrastructure-as-code and automation (Terraform, Helm, Kustomize, etc.), with a focus on reducing toil and operational overhead.
- A mindset focused on improving the developer experience and business alignment, with the flexibility to make decisions that may go against ideal technical preferences when necessary.
Responsibilities
- Lead the design, implementation, and management of our Kubernetes-based platform, focusing on scalability, developer experience, and system reliability.
- Architect and maintain automation around Kubernetes, ensuring that the platform is easy for developers to use and requires minimal toil to deploy or modify workloads in a self-service model.
- Collaborate with cross-functional teams (developers, leaders, and other infrastructure teams) to gather requirements, build consensus, and deliver impactful solutions.
- Integrate observability into the platform, using tools like Datadog, Prometheus, Grafana, New Relic, and Splunk to monitor system health and performance.
- Drive infrastructure-as-code initiatives using tools like Kubernetes Operators, Helm, Kustomize, and Terraform, promoting automation, repeatability, and reliability.
- Ensure that the platform integrates seamlessly with CI/CD pipelines (using Argo CD / Workflows / Rollouts, GitHub Actions, Jenkins, or similar) and continuously improve developer workflows.
- Contribute to the operational excellence of the platform, including on-call responsibilities and incident management, while building self-healing capabilities where possible.
- Act as a mentor to other engineers on the team, promoting growth and knowledge sharing, ensuring that the team thrives even in the absence of specific individuals.
- Foster a culture of collaboration, empathy, and trust within the team and across departments, helping to bridge gaps between engineering and other business functions.
- Take a hands-on approach to problem-solving, sometimes submitting PRs to resolve issues in codebases or providing detailed solutions when teams need assistance.