Senior Site Reliability Engineer – Data Infrastructure
Company | ByteDance |
---|---|
Location | Seattle, WA, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Senior |
Requirements
- Bachelor’s degree in Computer Science or a related technical field with 5+ years of experience
- Experience programming in at least one of the following languages: C, C++, Java, Python, Go, or Rust
- Familiarity with Unix/Linux system internals, networking, and distributed systems
Responsibilities
- Participate in and enhance the complete service lifecycle, from inception and design, through development, capacity planning, launch reviews, deployment, operation, and refinement.
- Design and implement software platforms and monitoring frameworks to govern service-oriented architecture (SOA) efficiently, automatically, and intelligently.
- Develop and manage components of cloud-managed data infrastructure, encompassing technologies such as Kubernetes, Redis, MySQL, Flink, and more.
- Establish sustainable mechanisms for scaling systems, such as automation, to drive enhancements in reliability, efficiency, and velocity.
- Provide sustainable user support, manage incident responses, and conduct blameless postmortems as part of our ongoing efforts to improve our systems.
- Design and implement strategic solutions for optimal resource utilization and budget alignment, and integrate them into product tools to reduce costs and enhance automated platform capabilities.
- Design stability solutions, manage technical issues, optimize system operations, use product tools to build scalable solutions, and follow industry trends to provide tailored stability solutions.
Preferred Qualifications
- Experience with MySQL, Redis, Nginx, Kubernetes, Docker, OpenStack, Hadoop, Spark, Flink, etc.
- Experience in designing and analyzing large-scale distributed systems
- Expertise in cost optimization, resource management, and infrastructure planning; a systematic, product-oriented mindset; and proven ability in project management as well as data structure and analysis
- Hands-on experience ensuring system stability, optimizing complex processes, implementing high-availability architectures, and promoting stability awareness, including having built and operated a stability assurance system from the ground up that emphasizes process formalization, standardization, tooling, and continuous improvement
- Strong skills in problem-solving and communication