Systems Engineer – Research & Development
Field | Details |
---|---|
Company | Hudson River Trading |
Location | Seattle, WA, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Not specified |
Experience Level | Senior |
Requirements
- 5+ years of large-scale Linux systems engineering experience in HPC, AI, or distributed infrastructure roles
- Extensive experience in Linux system installation, performance tuning, and troubleshooting
- Expertise in troubleshooting distributed GPU workloads
- Deep knowledge of GPU optimization and performance
- Proficiency in Python scripting and automation frameworks
- Familiarity with configuration management tools (e.g., Salt, Ansible, Puppet, Chef)
- Comfortable diagnosing complex system issues at the hardware, OS, and network levels
- Strong communication and organizational skills; able to collaborate across diverse technical teams
- Thrive in fast-paced environments and are excited by high-impact work
Responsibilities
- Design, build, and optimize large-scale Linux-based distributed compute clusters, including cutting-edge many-GPU systems
- Identify and resolve performance bottlenecks across compute, storage, and networking layers
- Collaborate with research and development teams to profile, benchmark, and fine-tune GPU-based workloads
- Automate system deployment, monitoring, and troubleshooting across thousands of nodes
- Collaborate with research and engineering teams to support evolving workloads
- Own critical infrastructure projects from concept through implementation and support
- Test and deploy new hardware and software, and partner with vendors to resolve complex issues
Preferred Qualifications
- CUDA or C/C++ experience a plus