
Incident Manager
| Company | Crusoe |
|---|---|
| Location | San Francisco, CA, USA |
| Salary | $140,000 – $165,000 |
| Type | Full-Time |
| Degrees | |
| Experience Level | Mid Level, Senior |
Requirements
- Strong technical experience with Linux, virtualization, and Kubernetes, and with handling customer incidents.
- Solid understanding of the TCP/IP stack.
- Understanding of Infrastructure-as-Code (IaC) practices.
- Excellent communication skills, both written and verbal.
- Proven problem-solving mindset with the ability to diagnose and resolve complex technical issues.
- 3–5+ years of experience in a team leadership role, acting as a liaison with external and internal customers.
- 4–5 years of customer-facing experience.
Responsibilities
- Diagnose and resolve complex technical issues related to InfiniBand, containerization, and distributed training, ensuring minimal disruption to customer operations.
- Guide and assist customers in implementing and optimizing their HPC infrastructure to achieve maximum performance and efficiency.
- Develop and deliver training materials, including internal training sessions, documentation, and knowledge base articles, to empower customers to effectively utilize our solutions.
- Work closely with internal engineering and product teams to relay customer feedback and help improve product quality and the overall customer experience.
Preferred Qualifications
- Proficiency in one or more programming languages.