Machine Learning Performance Engineer

Company: Jane Street
Location: New York, NY, USA
Salary: Not provided
Type: Full-Time
Degrees: Not provided
Experience Level: Mid Level, Senior

Requirements

  • An understanding of modern ML techniques and toolsets
  • The experience and systems knowledge required to debug a training run’s performance end-to-end
  • Low-level GPU knowledge of PTX, SASS, warps, cooperative groups, Tensor Cores, and the memory hierarchy
  • Debugging and optimization experience using tools like CUDA GDB, Nsight Systems, and Nsight Compute
  • Library knowledge of Triton, CUTLASS, CUB, Thrust, cuDNN, and cuBLAS
  • Intuition about the latency and throughput characteristics of CUDA graph launch, tensor core arithmetic, warp-level synchronization, and asynchronous memory loads
  • Background in InfiniBand, RoCE, GPUDirect, PXN, rail optimization, and NVLink, and how to use these networking technologies to link up GPU clusters
  • An understanding of the collective algorithms supporting distributed GPU training in NCCL or MPI
  • An inventive approach and the willingness to ask hard questions about whether we’re taking the right approaches and using the right tools

Responsibilities

  • Optimizing the performance of models—both training and inference
  • Improving straightforward CUDA
  • Ensuring the platform makes sense at the lowest level
  • Diving deep into market data
  • Tuning hyperparameters
  • Debugging distributed training performance
  • Studying how the model likes to trade in production
