Senior Site Reliability Engineer – Applied Machine Learning
Company | ByteDance |
---|---|
Location | San Jose, CA, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s, Master’s |
Experience Level | Senior |
Requirements
- Expertise in analyzing and troubleshooting distributed systems.
- Bachelor’s or Master’s degree in Computer Science or a related technical field involving software development or systems engineering.
- Experience programming in at least one of the following languages: Python, C/C++ or Go.
- Solid background in algorithms and data structures.
Responsibilities
- The Site Reliability Engineering (SRE) group on the Applied Machine Learning (AML) team combines systems engineering and the art of machine learning to develop and run massively distributed AI/recommendation systems around the world.
- On the SRE team, you’ll have the opportunity to sharpen your expertise in coding, performance analysis, and large-scale system operations, and to get heavily involved in hardware and capacity decision-making.
- SRE ensures that the core machine learning services at ByteDance maintain the highest level of availability, and builds highly automated systems and pipelines.
Preferred Qualifications
- Ability to design and maintain large-scale systems.
- Strong understanding of code optimization and automation of routine tasks.
- SRE experience with large-scale distributed systems.