AI System Research and Development Engineer – Frameworks
Company | Snowflake |
---|---|
Location | Menlo Park, CA, USA; Bellevue, WA, USA |
Salary | $195,000 – $287,500 |
Type | Full-Time |
Degrees | Bachelor’s, Master’s, PhD |
Experience Level | Senior |
Requirements
- 5 or more years of experience in deep learning frameworks, distributed systems, or high-performance computing (HPC).
- Bachelor’s degree in Computer Science, Electrical Engineering, or a related field. A Master’s degree or PhD is preferred.
- Expertise in distributed training frameworks (e.g., DeepSpeed, PyTorch DDP, FSDP, Megatron-LM).
- Strong understanding of modern parallelism techniques such as data, tensor, sequence, and ZeRO-based parallelism.
- Proficiency in Python and in C++ or CUDA.
- Solid problem-solving skills and ability to debug complex performance issues.
- Excellent communication skills and ability to work effectively in a cross-functional team environment.
Responsibilities
- Solve large-scale challenges in data preprocessing, model training, and model evaluation.
- Develop and deploy state-of-the-art tooling and open-source technologies to enhance the efficiency and effectiveness of AI solutions.
- Apply advanced optimization techniques to reduce resource requirements while maintaining model performance and ensuring usability for researchers, developers, and customers.
- Stay updated with the latest advancements in LLM training and inference optimizations.
- Open-source and publish innovations, optimizations, and engineering practices in technical blogs, top-tier conferences and journals.
Preferred Qualifications
No preferred qualifications provided.