Staff Cloud Availability Platform Engineer

Company: Crusoe
Location: San Francisco, CA, USA
Salary: $180,000 – $210,000
Type: Full-Time
Experience Level: Senior, Expert or higher

Requirements

  • 7+ years of experience in platform, backend, or infrastructure engineering roles.
  • Deep hands-on experience with Kubernetes internals, deployment patterns, and operational tooling.
  • Strong understanding of networking in containerized environments, including DNS, load balancing, and traffic routing.
  • Practical experience implementing and supporting event-driven systems at scale.
  • Proven ability to build and evolve API-driven infrastructure used by both developers and automated systems.
  • Familiarity with observability tooling such as Prometheus, Grafana, and OpenTelemetry, and with structured logging practices.
  • Working knowledge of infrastructure-as-code tools (Terraform, Helm) and CI/CD pipelines.

Responsibilities

  • Designing, deploying, and operating Kubernetes infrastructure for multi-tenant, distributed applications.
  • Implementing and optimizing the container and host networking stack, including CNI plugins, network policies, and service mesh integrations.
  • Building and evolving event-driven platforms using Kafka, NATS, or cloud-native pub/sub systems.
  • Developing and maintaining API interfaces (REST/gRPC) to power internal infrastructure services and developer tooling.
  • Driving improvements in system reliability, scalability, and observability through automation, instrumentation, and best practices.
  • Collaborating closely with SREs, security engineers, and backend developers to deliver infrastructure with strong operational maturity.
  • Contributing to and maintaining infrastructure-as-code and CI/CD pipelines for consistent, automated deployments.

Preferred Qualifications

  • Experience contributing to open-source infrastructure projects.
  • Familiarity with multi-cloud environments and hybrid cloud patterns.
  • Exposure to zero-downtime deployment strategies and advanced rollback mechanisms.