Senior GPU Supercomputer Scheduler Engineer
Company | NVIDIA |
---|---|
Location | Austin, TX, USA; Santa Clara, CA, USA; Durham, NC, USA; Westford, MA, USA |
Salary | $148,000 – $287,500 |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Senior |
Requirements
- Bachelor’s degree in Computer Science, Electrical Engineering or related field or equivalent experience
- 5+ years of work experience
- Strong understanding of HPC batch schedulers, such as Slurm or LSF, and of HPC workflows that use MPI
- Significant experience programming in C/C++ and advanced scripting in languages such as Python, Go, and Bash
- Established experience with the Linux operating system, environment, and tools
- Strong background in computer architecture and operating systems
- Experience analyzing and tuning performance for a variety of HPC workloads
- In-depth understanding of container technologies such as Docker, Singularity, and Podman
- Flexibility and adaptability to work in a dynamic environment with different frameworks and requirements
- Excellent communication, interpersonal, and customer collaboration skills
Responsibilities
- Design and develop enhancements to the HPC batch scheduler(s)
- Work extensively with the HPC scheduler vendor on bug fixes and feature releases
- Provide support to staff and end users to resolve batch scheduler issues
- Build and improve our ecosystem around GPU-accelerated computing
- Analyze and optimize the performance of deep learning workflows
- Develop large-scale automation solutions
- Perform root cause analysis and suggest corrective actions for problems at both large and small scales
- Find and fix problems before they occur
Preferred Qualifications
- Knowledge of MPI and high-performance computing
- Background in RDMA technology
- Open-source software contributions
- Experience with deep learning frameworks like PyTorch and TensorFlow
- Passion for software development processes