Software Engineer – Inference – GPU Enablement

Company: OpenAI
Location: San Francisco, CA, USA
Salary: $310,000 – $460,000
Type: Full-Time
Experience Level: Mid Level, Senior

Requirements

  • Have experience writing or porting GPU kernels using HIP, CUDA, or Triton, and care deeply about low-level performance.
  • Are familiar with communication libraries like NCCL/RCCL and understand their role in high-throughput model serving.
  • Have worked on distributed inference systems and are comfortable scaling models across fleets of accelerators.
  • Enjoy solving end-to-end performance challenges across hardware, system libraries, and orchestration layers.
  • Are excited to be part of a small, fast-moving team building new infrastructure from first principles.

Responsibilities

  • Design and optimize high-performance GPU kernels for AMD accelerators using HIP, Triton, or other performance-focused frameworks.
  • Build and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs.
  • Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into AMD-backed systems.
  • Debug and optimize distributed inference workloads across memory, network, and compute layers.
  • Validate correctness, performance, and scalability of model execution on large AMD GPU clusters.

Preferred Qualifications

  • Contributions to open-source libraries like RCCL, Triton, or vLLM.
  • Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling.
  • Prior experience deploying inference on AMD or other non-NVIDIA GPU environments.
  • Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models.