Software Engineer – ML System Architecture

Company: ByteDance
Location: Seattle, WA, USA
Salary: Not Provided
Type: Full-Time
Experience Level: Mid Level, Senior

Requirements

  • Be proficient in one or two programming languages, such as C++, Go, Python, or Shell, in a Linux environment
  • Understand the principles of distributed systems and have experience designing, developing, and maintaining large-scale machine learning systems
  • Be familiar with Kubernetes architecture and have extensive experience in system-level development and tuning
  • Have excellent analytical skills, with the ability to abstract and decompose business logic sensibly
  • Have a strong sense of responsibility, the ability to learn quickly, good communication skills, and self-motivation

Responsibilities

  • Design and develop machine learning infrastructure for LLM, AIGC, and similar workloads
  • Build a very large-scale machine learning system integrating GPUs, RDMA networking, and high-performance storage
  • Solve technical problems related to system stability and high availability
  • Organize and coordinate multiple teams to build the system, including the data center, network, computing, storage, and resource teams

Preferred Qualifications

  • Familiarity with ML infrastructure for large-model training and inference
  • Experience in one of the following fields: AI Infrastructure, HW/SW Co-Design, High Performance Computing, ML Hardware Architecture (GPU, Accelerators, Networking)