Senior Site Reliability Engineer – Data Infrastructure

Company: ByteDance
Location: Seattle, WA, USA
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Senior

Requirements

  • Bachelor’s degree in Computer Science or a related technical field with 5+ years of experience
  • Experience programming in at least one of the following languages: C, C++, Java, Python, Go, or Rust
  • Familiarity with Unix/Linux system internals, networking, and distributed systems

Responsibilities

  • Participate in and enhance the complete service lifecycle, from inception and design, through development, capacity planning, launch reviews, deployment, operation, and refinement.
  • Design and implement software platforms and monitoring frameworks to govern service-oriented architecture (SOA) efficiently, automatically, and intelligently.
  • Develop and manage components of cloud-managed data infrastructure, encompassing technologies such as Kubernetes, Redis, MySQL, Flink, and more.
  • Establish sustainable mechanisms for scaling systems, such as automation, to drive enhancements in reliability, efficiency, and velocity.
  • Provide sustainable user support, manage incident responses, and conduct blameless postmortems as part of our ongoing efforts to improve our systems.
  • Design and implement strategic solutions for optimal resource utilization and budget alignment, and integrate these solutions into product tools to drive cost reduction and enhance automated platform capabilities.
  • Design stability solutions, manage technical issues, optimize system operations, use product tools to build scalable solutions, and follow industry trends to provide tailored stability solutions.

Preferred Qualifications

  • Experience with MySQL, Redis, Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink, etc.
  • Experience in designing and analyzing large-scale distributed systems
  • Expertise in cost optimization, resource management, and infrastructure planning; a systematic, product-oriented mindset; and proven abilities in project management and in data structure and analysis
  • Hands-on experience ensuring system stability, optimizing complex processes, implementing high-availability architectures, and promoting stability awareness, including having built and operated a stability assurance system from the ground up that emphasizes process formalization, standardization, tooling, and continuous improvement
  • Strong skills in problem-solving and communication