AI System Research and Development Engineer – Frameworks

Company: Snowflake
Location: Menlo Park, CA, USA; Bellevue, WA, USA
Salary: $195,000 – $287,500
Type: Full-Time
Degrees: Bachelor’s, Master’s, PhD
Experience Level: Senior

Requirements

  • 5 or more years of experience in deep learning frameworks, distributed systems, or high-performance computing (HPC).
  • Bachelor’s degree in Computer Science, Electrical Engineering, or a related field. A Master’s degree or PhD is preferred.
  • Expertise in distributed training frameworks (e.g., DeepSpeed, PyTorch DDP, FSDP, Megatron-LM).
  • Strong understanding of modern parallelism techniques such as data, tensor, sequence, and ZeRO-based parallelism.
  • Proficiency in Python and in C++ or CUDA.
  • Solid problem-solving skills and ability to debug complex performance issues.
  • Excellent communication skills and ability to work effectively in a cross-functional team environment.

Responsibilities

  • Solve large-scale challenges in data preprocessing, model training, and model evaluation.
  • Develop and deploy state-of-the-art tooling and open-source technologies to enhance the efficiency and effectiveness of AI solutions.
  • Apply advanced optimization techniques to reduce resource requirements while maintaining model performance and ensuring usability for researchers, developers, and customers.
  • Stay current with the latest advancements in LLM training and inference optimizations.
  • Open-source and publish innovations, optimizations, and engineering practices in technical blogs, top-tier conferences, and journals.
