Machine Learning Performance Engineer

Company: Jane Street
Location: New York, NY, USA
Salary: Not provided
Type: Full-Time
Degrees: Not specified
Experience Level: Mid Level, Senior

Requirements

  • An understanding of modern ML techniques and toolsets
  • The experience and systems knowledge required to debug a training run’s performance end to end
  • Low-level GPU knowledge of PTX, SASS, warps, cooperative groups, Tensor Cores and the memory hierarchy
  • Debugging and optimisation experience using tools like CUDA GDB, Nsight Systems and Nsight Compute
  • Library knowledge of Triton, CUTLASS, CUB, Thrust, cuDNN and cuBLAS
  • Intuition about the latency and throughput characteristics of CUDA graph launch, Tensor Core arithmetic, warp-level synchronisation and asynchronous memory loads
  • Background in InfiniBand, RoCE, GPUDirect, PXN, rail optimisation and NVLink, and how to use these networking technologies to link up GPU clusters
  • An understanding of the collective algorithms supporting distributed GPU training in NCCL or MPI
  • Fluency in English

Responsibilities

  • Optimising the performance of our models – both training and inference
