Senior Engineer – ML Infrastructure
Company | CoreWeave |
---|---|
Location | Livingston, NJ, USA; New York, NY, USA; Bellevue, WA, USA; Sunnyvale, CA, USA |
Salary | $175,000 – $210,000 |
Type | Full-Time |
Degrees | |
Experience Level | Senior |
Requirements
- Six or more years of experience in the software engineering industry, with a specialization in developing and troubleshooting distributed systems in production and at scale.
- Drive to learn and grow in a rapidly evolving technology space, and interest or experience in core technologies supported by the team, such as Slurm, Knative, and/or Istio.
- Comfortable with using Go as the primary programming language and capable of navigating a Linux operating environment.
- Experience using Kubernetes with an applicable understanding of its major components and ingress/service meshes.
- Thorough knowledge of cloud-native development, including experience with container technologies, microservice design, and architectural patterns.
- Familiarity with complex event processing and event-driven architecture.
- Ability to transform problems in elastic architectures, decompose them into achievable tasks, and socialize both to teammates.
- Interest in reliability engineering concepts such as different types of testing, progressive deployments, error budgets, the role of observability, and fault-tolerant design.
Responsibilities
- Identify and implement scalable and fault-tolerant interfaces for consuming GPU resources that are responsive to the needs and practices of the ML community.
- Create test plans, deployment automation, dashboards, alerts, and insights into our product’s operations, and participate in the ML Infrastructure on-call rotation.
Preferred Qualifications
No preferred qualifications provided.