
Senior Site Reliability Engineer – Applied Machine Learning

Company: ByteDance
Location: San Jose, CA, USA
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Senior

Requirements

  • Expertise in analyzing and troubleshooting distributed systems.
  • Bachelor’s or Master’s degree in Computer Science or a related technical field involving software development or systems engineering.
  • Experience programming in at least one of the following languages: Python, C/C++, or Go.
  • Solid background in algorithms and data structures.

Responsibilities

  • The Site Reliability Engineering (SRE) team within Applied Machine Learning (AML) combines systems engineering with the art of machine learning to develop and run massively distributed AI and recommendation systems around the world.
  • On the SRE team, you’ll have the opportunity to sharpen your expertise in coding, performance analysis, and large-scale system operations, and to be heavily involved in hardware and capacity decision-making.
  • SRE ensures that ByteDance’s core machine learning services maintain the highest level of availability, and builds highly automated systems and pipelines.

Preferred Qualifications

  • Ability to design and maintain large-scale systems.
  • Strong understanding of code optimization and automation of routine tasks.
  • SRE experience with large-scale distributed systems.