Research Scientist in ML Systems

Company: ByteDance
Location: San Jose, CA, USA
Salary: Not provided
Type: Full-Time
Degrees: Master’s
Experience Level: Senior, Expert or higher

Requirements

  • Master’s degree or above, with an understanding of distributed and parallel computing principles and knowledge of recent advances in computing, storage, networking, and hardware technologies
  • Familiar with machine learning algorithms and platforms
  • Basic understanding of how GPUs, FPGAs, and ASICs work
  • Expert in at least one or two of the following programming languages in a Linux environment: C/C++, CUDA, Python

Responsibilities

  • Research and develop our machine learning systems, including heterogeneous computing architecture, management, and monitoring
  • Deploy the machine learning systems, distributed task scheduling, machine learning training, and machine learning inference
  • Manage cross-layer optimization of systems, AI algorithms, and machine learning hardware (GPU, FPGA, ASIC)

Preferred Qualifications

  • GPU-based high-performance computing and RDMA high-performance networking (NCCL)
  • TensorFlow, JAX, PyTorch, or other deep learning frameworks
  • Large-scale data processing and parallel computing
  • Experience designing and operating large-scale systems in cloud computing or machine learning