Engineering Manager, Machine Learning Infrastructure
Company | ByteDance |
---|---|
Location | Seattle, WA, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | |
Experience Level | Senior, Expert or higher |
Requirements
- Experience leading an engineering team.
- Experience developing and deploying large-scale machine learning systems.
- Strong sense of responsibility, with good communication and teamwork skills.
- Passion for solving complex and challenging problems.
Responsibilities
- Lead the team in designing and implementing distributed inference, training, scheduling, orchestration, storage, and parameter server infrastructure for feeds, ads, and search ranking models.
- Oversee the development of monitoring and management tools to ensure the reliability and scalability of the machine learning infrastructure.
- Manage the identification and prioritization of system inefficiencies and bottlenecks, leading efforts to enhance system performance.
- Lead the team in creating tools to analyze bottlenecks and sources of instability, formulating and implementing effective solutions.
- Collaborate with product teams, offering comprehensive solutions tailored to their specific requirements.
Preferred Qualifications
- Experience contributing to an open-source machine learning framework (TensorFlow, JAX, PyTorch, TorchScript, MXNet, TensorRT).
- Experience with big data frameworks (e.g., Spark, Hadoop, Flink) and with resource management and task scheduling for large-scale distributed systems.
- Experience with parameter server system optimization or index structure optimization for search systems.
- Strong background in one of the following fields: hardware-software co-design, high-performance computing, ML hardware acceleration (e.g., GPU/RDMA), or ML for systems.