Software Engineer Graduate – Applied Machine Learning – Orchestration – PhD
Company | ByteDance |
---|---|
Location | San Jose, CA, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | PhD |
Experience Level | Entry Level/New Grad |
Requirements
- Currently pursuing a PhD in Software Development, Computer Science, Computer Engineering, or a related technical discipline
- Proficient in C/C++/Python/Golang, with solid programming fundamentals (e.g., algorithms and data structures)
- Familiar with deep learning frameworks (TensorFlow/PyTorch)
- Ability to work independently and complete projects end to end in a timely manner
- Good communication and teamwork skills, with the ability to explain technical concepts clearly to teammates
Responsibilities
- Responsible for the design and implementation of a global-scale machine learning system for feeds, ads, and search ranking models
- Responsible for improving the machine learning infrastructure’s usability, flexibility, and efficiency
- Responsible for improving model training and serving workflows, data pipelines, and resource management for multi-tenant machine learning systems
- Responsible for designing and developing key components of ML infrastructure
Preferred Qualifications
- Experience contributing to an open-source machine learning framework (TensorFlow/PyTorch); experience improving core machine learning infrastructure
- Experience with big data and orchestration frameworks (e.g., K8s/Spark/Hadoop/Flink); experience in resource management and task scheduling for large-scale distributed systems; experience building solutions on AWS, GCP, Azure, OCI, AliCloud, or other cloud services
- Strong background in at least one of the following fields: Hardware-Software Co-Design, High-Performance Computing, ML Hardware Acceleration (e.g., GPU/TPU/RDMA), or ML for Systems
- Experience developing and deploying large-scale systems (e.g., monitoring, analysis, troubleshooting, and notification systems); strong understanding of code optimization, routine task automation, and self-healing on failure; familiarity with infrastructure-as-code (IaC) tools such as Terraform or Ansible