Software Engineer – Systems ML – Frameworks / Compilers / Kernels
Company | Meta |
---|---|
Location | Toronto, ON, Canada |
Salary | $104,000 – $148,000 |
Type | Full-Time |
Degrees | Bachelor’s, Master’s, PhD |
Experience Level | Junior, Mid Level |
Requirements
- Proven C/C++ programming skills
- Currently has, or is in the process of obtaining, a Bachelor’s degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience. Degree must be completed prior to joining Meta.
- Experience in AI framework development or accelerating deep learning models on hardware architectures.
Responsibilities
- Develop the software stack with one of the following core focus areas: AI frameworks, the compiler stack, or high-performance kernel development and acceleration on next-generation hardware architectures.
- Contribute to the development of the industry-leading PyTorch AI framework’s core compilers to support new state-of-the-art inference and training AI hardware accelerators and optimize their performance (see the sketch after this list).
- Analyze deep learning networks, and design and implement compiler optimization algorithms.
- Collaborate with AI research scientists to accelerate the next generation of deep learning models, such as recommendation systems, generative AI, computer vision, and NLP.
- Tune and optimize the performance of deep learning framework and software components.
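
To give a concrete sense of the framework and compiler work described above, here is a minimal PyTorch 2.x sketch, assuming an arbitrary two-layer model, input shape, and iteration count, that lowers a module through torch.compile with the Inductor backend and compares eager vs. compiled latency:

```python
# Minimal sketch (assumed model, shapes, and iteration count): compile a
# module with the Inductor backend and compare eager vs. compiled latency.
import time
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 1024))
x = torch.randn(64, 1024)

# torch.compile lowers the module through the PyTorch compiler stack
# (TorchDynamo + Inductor), the kind of component this role contributes to.
compiled = torch.compile(model, backend="inductor")

def bench(fn, iters=50):
    fn(x)  # warm-up; for the compiled module this also triggers compilation
    start = time.perf_counter()
    for _ in range(iters):
        fn(x)
    return (time.perf_counter() - start) / iters

print(f"eager:    {bench(model) * 1e3:.2f} ms/iter")
print(f"compiled: {bench(compiled) * 1e3:.2f} ms/iter")
```

The measured gap depends on the model, backend, and hardware; the sketch is only meant to show where the compiler stack sits in the workflow.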
Preferred Qualifications
- A Bachelor’s degree in Computer Science, Computer Engineering, or a relevant technical field and 4+ years of experience in AI framework development or accelerating deep learning models on hardware architectures; OR a Master’s degree in Computer Science, Computer Engineering, or a relevant technical field and 2+ years of such experience; OR a PhD in Computer Science, Computer Engineering, or a relevant technical field.
- Knowledge of GPU, CPU, or AI hardware accelerator architectures.
- Experience working with frameworks such as PyTorch, Caffe2, TensorFlow, ONNX, or TensorRT.
- Experience with CUDA, OpenMP/OpenCL, or AI hardware accelerator kernel programming. Experience accelerating libraries on AI hardware, similar to cuBLAS, cuDNN, CUTLASS, HIP, ROCm, etc.
- Experience with compiler optimizations such as loop optimizations, vectorization, parallelization, and hardware-specific optimizations such as SIMD. Experience with MLIR, LLVM, IREE, XLA, TVM, or Halide is a plus.
- Experience in developing training and inference framework components. Experience in system performance optimization, such as runtime analysis of latency, memory bandwidth, I/O access, and compute utilization, and development of the associated tooling (see the sketch after this list).
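
As an illustration of the kind of latency and compute-utilization analysis mentioned above, here is a minimal sketch, assuming a CUDA-capable GPU and arbitrary FP16 matrix sizes, that times a matmul with CUDA events and derives a rough throughput figure:

```python
# Minimal sketch (assumes a CUDA GPU and arbitrary FP16 matrix sizes):
# time a matmul with CUDA events and estimate achieved throughput.
import torch

assert torch.cuda.is_available()
n = 4096
a = torch.randn(n, n, device="cuda", dtype=torch.float16)
b = torch.randn(n, n, device="cuda", dtype=torch.float16)

torch.matmul(a, b)            # warm-up
torch.cuda.synchronize()

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

iters = 20
start.record()
for _ in range(iters):
    torch.matmul(a, b)
end.record()
torch.cuda.synchronize()

ms = start.elapsed_time(end) / iters          # average latency in milliseconds
tflops = (2 * n ** 3) / (ms * 1e-3) / 1e12    # 2*n^3 FLOPs for a square matmul
print(f"{ms:.3f} ms/iter, ~{tflops:.1f} TFLOP/s")
```

Comparing the achieved figure against the accelerator’s peak throughput gives a first-order estimate of compute utilization.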