Software Engineer Graduate – Applied Machine Learning – Orchestration – PhD

Company: ByteDance
Location: San Jose, CA, USA
Salary: Not provided
Type: Full-Time
Degrees: PhD
Experience Level: Entry Level/New Grad

Requirements

  • Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline
  • Proficient in C/C++/Python/Golang, with solid programming skills (e.g., algorithms and data structures)
  • Familiar with deep learning frameworks (TensorFlow/PyTorch)
  • Ability to work independently and complete projects from beginning to end in a timely manner
  • Good communication and teamwork skills, including the ability to explain technical concepts clearly to teammates

Responsibilities

  • Responsible for the design and implementation of a global-scale machine learning system for feeds, ads and search ranking models
  • Responsible for improving the machine learning infrastructure’s usability, flexibility, and efficiency
  • Responsible for improving the workflow of model training and serving, data pipelines and resource management for multi-tenancy machine learning systems
  • Responsible for designing and developing key components of ML infrastructure

Preferred Qualifications

  • Experience contributing to an open-source machine learning framework (TensorFlow/PyTorch); experience improving core machine learning infrastructure
  • Experience with big data orchestration frameworks (e.g., K8s/Spark/Hadoop/Flink); experience in resource management and task scheduling for large-scale distributed systems; experience building solutions with AWS, GCP, Azure, OCI, AliCloud, or other cloud services
  • Strong background in one of the following fields: Hardware-Software Co-Design, High Performance Computing, ML Hardware Acceleration (e.g., GPU/TPU/RDMA) or ML for Systems
  • Experience developing and deploying large-scale systems (e.g., monitoring, analysis, troubleshooting, and notification systems); strong understanding of code optimization, routine task automation, and failure self-healing; familiarity with IaC technologies such as Terraform/Ansible