Software Engineer – Inference – GPU Enablement
| Company | OpenAI |
| --- | --- |
| Location | San Francisco, CA, USA |
| Salary | $310,000 – $460,000 |
| Type | Full-Time |
| Degrees | |
| Experience Level | Mid Level, Senior |
Requirements
- Experience writing or porting GPU kernels in HIP, CUDA, or Triton, and a deep interest in low-level performance (a minimal Triton kernel sketch follows this list).
- Familiarity with collective communication libraries such as NCCL/RCCL and their role in high-throughput model serving.
- Experience building distributed inference systems and comfort scaling models across fleets of accelerators.
- An appetite for solving end-to-end performance challenges spanning hardware, system libraries, and orchestration layers.
- Excitement about joining a small, fast-moving team building new infrastructure from first principles.
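
For concreteness, here is a minimal sketch of the kind of kernel work described above: a Triton vector-add kernel, which runs on both CUDA and ROCm/HIP backends. The function names, block size, and tensor shapes are illustrative assumptions, not part of this posting.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance processes one contiguous tile of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final tile
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element tile
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out

# PyTorch exposes ROCm devices under the "cuda" device string as well.
x = torch.randn(4096, device="cuda")
y = torch.randn(4096, device="cuda")
torch.testing.assert_close(add(x, y), x + y)
```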
Responsibilities
- Design and optimize high-performance GPU kernels for AMD accelerators using HIP, Triton, or other performance-focused frameworks.
- Build and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs (see the all-reduce sketch after this list).
- Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into AMD-backed systems.
- Debug and optimize distributed inference workloads across memory, network, and compute layers.
- Validate correctness, performance, and scalability of model execution on large AMD GPU clusters.
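
As a rough illustration of the collective-communication layer these responsibilities touch, the sketch below runs an all-reduce through torch.distributed; on ROCm builds of PyTorch the "nccl" backend name is backed by RCCL. The launch command, tensor size, and script name are assumptions for illustration.

```python
import torch
import torch.distributed as dist

def main():
    # One process per GPU, launched e.g. via `torchrun --nproc-per-node=8 demo.py`.
    # On ROCm builds of PyTorch, the "nccl" backend is implemented by RCCL.
    dist.init_process_group(backend="nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank % torch.cuda.device_count())

    # Every rank contributes a tensor; all_reduce sums them across all GPUs.
    t = torch.full((1 << 20,), float(rank), device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)

    world = dist.get_world_size()
    assert torch.all(t == sum(range(world)))
    if rank == 0:
        print(f"all-reduce OK across {world} rank(s)")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```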
Preferred Qualifications
- Contributions to open-source libraries like RCCL, Triton, or vLLM.
- Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling.
- Prior experience deploying inference on AMD or other non-NVIDIA GPU environments.
- Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models (a tensor-parallel sketch follows this list).
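
To make the tensor-parallelism item concrete, here is a hedged sketch of a column-parallel linear layer: each rank holds a slice of the weight's output columns, computes a partial matmul, and an all-gather reassembles the full activation. It assumes a process group is already initialized (as in the earlier all-reduce sketch); the function and variable names are illustrative, not from this posting.

```python
import torch
import torch.distributed as dist

def column_parallel_linear(x: torch.Tensor, w_shard: torch.Tensor) -> torch.Tensor:
    # w_shard holds this rank's slice of the weight matrix:
    # shape (in_features, out_features // world_size).
    local_out = x @ w_shard                    # partial activation on this rank
    world = dist.get_world_size()
    parts = [torch.empty_like(local_out) for _ in range(world)]
    dist.all_gather(parts, local_out)          # collect every rank's slice
    return torch.cat(parts, dim=-1)            # full (batch, out_features)
```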