Engineering Manager Machine Learning Infrastructure

Company: ByteDance
Location: Seattle, WA, USA
Salary: Not provided
Type: Full-Time
Experience Level: Senior, Expert or higher

Requirements

  • Experience leading an engineering team.
  • Experience developing and deploying large-scale machine learning systems.
  • Strong sense of responsibility, with good communication and teamwork skills.
  • Passionate about solving complex and challenging problems.

Responsibilities

  • Lead the team in designing and implementing distributed infrastructure for inference, training, scheduling, orchestration, storage, and parameter servers, supporting feeds, ads, and search ranking models.
  • Oversee the development of monitoring and management tools to ensure the reliability and scalability of the machine learning infrastructure.
  • Manage the identification and prioritization of system inefficiencies and bottlenecks, leading efforts to enhance system performance.
  • Lead the team in creating tools to analyze bottlenecks and sources of instability, formulating and implementing effective solutions.
  • Collaborate with product teams, offering comprehensive solutions tailored to their specific requirements.

Preferred Qualifications

  • Experience contributing to an open-source machine learning framework (TensorFlow / JAX / PyTorch / TorchScript / MXNet / TensorRT).
  • Experience with big data frameworks (e.g., Spark, Hadoop, Flink) and with resource management and task scheduling for large-scale distributed systems.
  • Experience optimizing parameter server systems or index structures for search systems.
  • Strong background in one of the following fields: Hardware-Software Co-Design, High Performance Computing, ML Hardware Acceleration (e.g., GPU/RDMA) or ML for Systems.