Principal Machine Learning Engineer - ML Inference Platform

Company	Snap
Location	Palo Alto, CA, USA; Seattle, WA, USA; Los Angeles, CA, USA; Bellevue, WA, USA
Salary	$235,000 – $414,000
Type	Full-Time
Degrees	Bachelor’s, Master’s, PhD
Experience Level	Expert or higher

Strong understanding of machine learning approaches and algorithms
Excellent programming and software design skills, including debugging, performance analysis, and test design
Proven track record of operating highly available systems at scale
Ability to proactively learn new concepts and technology and apply them at work
Skilled at solving ambiguous problems
Strong collaboration and mentorship skills
BS in a technical field such as computer science, mathematics, or statistics, or equivalent years of experience
9+ years of post-Bachelor’s machine learning experience; or a Master’s degree in a technical field + 8+ years of post-grad ML experience; or a PhD in a related technical field + 5+ years of post-grad ML experience
2+ years of experience as a technical lead
Experience with GPU/TPU inference and optimizations

Design, implement, and scale critical machine learning components and services to support Snap’s most strategic initiatives
Design and build a next-generation inference framework and services that support large-scale models and high-throughput serving, enabling us to push the limits of what’s possible with machine learning
Perform model and inference optimization with various GPUs to improve model inference speed and efficiency
Work across teams to understand product requirements, evaluate trade-offs, and deliver the solutions needed to build innovative products or services
Advocate for and apply best practices when it comes to availability, scalability, operational excellence, and cost management
Provide technical direction that influences the entire company

Master’s/PhD in a technical field such as computer science
Experience leading teams and driving technical roadmaps
Experience working with machine learning, recommendation and ranking systems, or vector similarity search
Experience with TensorFlow, PyTorch, or related deep learning frameworks
Experience with Docker, Kubernetes, Ray, NoSQL solutions, Memcache/Redis, Google/AWS services
Experience with MLOps and managing the production machine learning lifecycle