Research Scientist in ML Systems
Company | ByteDance |
---|---|
Location | San Jose, CA, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Master’s |
Experience Level | Senior, Expert or higher |
Requirements
- Master's degree or higher, with an understanding of distributed and parallel computing principles and of recent advances in computing, storage, networking, and hardware technologies
- Familiarity with machine learning algorithms and platforms
- Basic understanding of how GPUs, FPGAs, and ASICs work
- Expertise in at least one of the following programming languages in a Linux environment: C/C++, CUDA, Python
Responsibilities
- Research and develop our machine learning systems, including heterogeneous computing architecture, management, and monitoring
- Deploy machine learning systems, including distributed task scheduling, training, and inference
- Manage cross-layer optimization across systems, AI algorithms, and machine learning hardware (GPU, FPGA, ASIC)
Preferred Qualifications
- GPU-based high-performance computing and RDMA high-performance networking (NCCL)
- TensorFlow, JAX, PyTorch, or other deep learning frameworks
- Large-scale data processing and parallel computing
- Experience designing and operating large-scale systems in cloud computing or machine learning