Site Reliability Engineer – SRE
Company | QuEra Computing |
---|---|
Location | Boston, MA, USA |
Salary | Not Provided |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Expert or higher |
Requirements
- Bachelor’s degree in Software Engineering or Software Development
- 8+ years of experience as an SRE, DevOps Engineer, or Systems Engineer
- Strong expertise in Kubernetes (Talos Linux preferred), cloud platforms (AWS, GCP, Azure), and Linux
- Hands-on experience with monitoring, logging, and incident management tools
- Proficiency in Python, Bash, or Go for scripting and automation
- Experience building and maintaining lab environments, including physical and virtual infrastructure
- Solid knowledge of networking, distributed systems, and performance optimization
- Familiarity with CI/CD workflows and Infrastructure as Code practices
- Strong communication skills and ability to work cross-functionally
Responsibilities
- Design, build, and maintain resilient infrastructure across cloud and Kubernetes (Talos Linux-based) environments
- Build and maintain lab infrastructure for development, testing, and validation, including networking, hardware integration, and automation
- Define and monitor SLIs, SLOs, and error budgets to guide reliability efforts
- Develop automation tools and scripts in Python, Bash, or Go to reduce manual toil and improve system operations
- Improve observability using Prometheus, Grafana, OpenTelemetry, and other monitoring/logging solutions
- Manage incident response, perform root cause analysis, and lead postmortem processes
- Optimize systems for performance, scalability, and fault tolerance
- Contribute to infrastructure as code (IaC) using Terraform, Ansible, or Helm
- Collaborate with engineering teams to ensure systems are designed for operational excellence
Preferred Qualifications
- Experience in optical systems (e.g., optical networking, photonic devices)
- Exposure to or interest in quantum computing platforms and environments