Software Engineer – Inference – GPU Enablement

Company: OpenAI
Location: San Francisco, CA, USA
Salary: $310,000 – $460,000
Type: Full-Time
Experience Level: Mid Level, Senior

Requirements

  • Have experience writing or porting GPU kernels using HIP, CUDA, or Triton, and care deeply about low-level performance.
  • Are familiar with communication libraries like NCCL/RCCL and understand their role in high-throughput model serving.
  • Have worked on distributed inference systems and are comfortable scaling models across fleets of accelerators.
  • Enjoy solving end-to-end performance challenges across hardware, system libraries, and orchestration layers.
  • Are excited to be part of a small, fast-moving team building new infrastructure from first principles.

Responsibilities

  • Design and optimize high-performance GPU kernels for AMD accelerators using HIP, Triton, or other performance-focused frameworks.
  • Build and tune collective communication libraries (e.g., RCCL) used to parallelize model execution across many GPUs.
  • Integrate internal model-serving infrastructure (e.g., vLLM, Triton) into AMD-backed systems.
  • Debug and optimize distributed inference workloads across memory, network, and compute layers.
  • Validate correctness, performance, and scalability of model execution on large AMD GPU clusters.

Preferred Qualifications

  • Contributions to open-source libraries like RCCL, Triton, or vLLM.
  • Experience with GPU performance tools (Nsight, rocprof, perf) and memory/comms profiling.
  • Prior experience deploying inference on AMD or other non-NVIDIA GPU environments.
  • Knowledge of model/tensor parallelism, mixed precision, and serving 10B+ parameter models.