Lead Site Reliability Engineer
| | |
|---|---|
| Company | General Dynamics |
| Location | Nashville, TN, USA |
| Salary | $144,500 – $195,500 |
| Type | Full-Time |
| Degrees | |
| Experience Level | Expert or higher |
Requirements
- 10+ years of related experience
- 10+ years of AWS infrastructure design and deployment experience
- 3+ years in an SRE role working in complex systems
- Infrastructure as Code (IaC) background, including CDK or CloudFormation
- Lead-level experience configuring and using logging and monitoring systems, including CloudWatch, Splunk, or Instana
- Ability to analyze infrastructure dependencies
- Experience overseeing infrastructure deployments, including developing testing procedures
- Strong communication skills
- Ability to work with government stakeholders
- Prior experience in a cross-cutting SRE role
Responsibilities
- Develop a deep understanding of how systems inter-operate within the infrastructure, including upstream and downstream dependencies
- Review all AWS infrastructure deployments to identify upstream and downstream impacts and ensure test processes fully validate features and integrations
- Ensure that monitoring, logging, and alerting for services running in core infrastructure accounts are properly configured and provide actionable information
- Based on findings, develop new monitoring solutions that help prevent future issues
- Develop metrics, driven by the needs of the SRE role, that show how the overall infrastructure is performing
- Participate in emergency responses and provide incident response metrics
- In collaboration with government stakeholders, develop and maintain a logging and monitoring strategy for the infrastructure platform
- Conduct and coordinate 5 Whys and other blameless post-mortem activities in the event of an incident
- Participate in continuous improvement activities such as technical debt analysis and contributing to the reliability standards and practices of the team
- Work with the team's DevOps engineers to improve the deployment process and introduce automated testing
- Audit resources in accounts under your responsibility; identify areas for improvement or technical debt and collaborate with program and government partners to prioritize
- Assist the cloud infrastructure team and other teams in troubleshooting wide-area integration issues
- Commit changes to our infrastructure codebase as necessary
Preferred Qualifications
- AWS Solutions Architect Professional or DevOps Engineer Professional Certification