Senior / Staff ML Optimization Engineer

Company	Waabi
Location	Toronto, ON, Canada; San Francisco, CA, USA; Dallas, TX, USA
Salary	Not Provided
Type	Full-Time
Degrees	Bachelor’s, Master’s, PhD
Experience Level	Senior

MS/PhD or Bachelor’s degree with a minimum of 6 years of industry experience in Computer Science, Robotics, and/or a similar technical field of study.
Solid proficiency in a variety of programming languages, such as Python, C++, or Rust.
Experience with deep learning frameworks such as PyTorch.
Experience across different stages of the development pipeline: data processing, distributed training and model deployment.
Skilled in profiling CPU and GPU code using tools such as PyTorch Profiler and NVIDIA Nsight.
Open-minded and collaborative team player with willingness to help others.
Passionate about self-driving technologies, solving hard problems, and creating innovative solutions.

Collaborate closely with autonomy and algorithm engineers to scale safe self-driving systems using an AI-first approach.
Build standardized distributed training frameworks for research and production, and drive our training toward new levels of stability and efficiency.
Comprehensively profile model runtime and memory to pinpoint performance bottlenecks.
Identify and evaluate emerging technologies that can be adopted into Waabi’s training and inference frameworks, including but not limited to efficient CUDA kernels for training, quantization, and model export and compilation for inference.

Experience with model compilation and export, including work with lower-level tools such as TensorRT.
Experience identifying when custom CUDA kernels are needed and implementing them.
Experience with Bazel build systems and with integrating third-party packages into development environments.