Senior Engineer – ML Infrastructure

Company: CoreWeave
Location: Livingston, NJ, USA; New York, NY, USA; Bellevue, WA, USA; Sunnyvale, CA, USA
Salary: $175,000 – $210,000
Type: Full-Time
Experience Level: Senior

Requirements

  • Six or more years of software engineering industry experience, specializing in developing and troubleshooting distributed systems in production and at scale.
  • Drive to learn and grow in a rapidly evolving technology space, and interest or experience in core technologies supported by the team, such as Slurm, Knative, and/or Istio.
  • Comfortable using Go as the primary programming language and capable of navigating a Linux operating environment.
  • Experience using Kubernetes, with a working understanding of its major components and of ingress and service meshes.
  • Thorough knowledge of cloud-native development, including experience with container technologies, microservice design, and architectural patterns.
  • Familiarity with complex event processing and event-driven architecture.
  • Ability to transform problems into elastic architectures, decompose them into achievable tasks, and socialize both with teammates.
  • Interest in reliability engineering concepts such as different types of testing, progressive deployments, error budgets, the role of observability, and fault-tolerant design.

Responsibilities

  • Identify and implement scalable and fault-tolerant interfaces for consuming GPU resources that are responsive to the needs and practices of the ML community.
  • Create test plans, deployment automation, dashboards, alerts, and insights into our product’s operations, and participate in the ML Infrastructure on-call rotation.

Preferred Qualifications

    No preferred qualifications provided.