Lead Site Reliability Engineer

Company: General Dynamics
Location: Nashville, TN, USA
Salary: $144,500 – $195,500
Type: Full-Time
Experience Level: Expert or higher

Requirements

  • 10+ years of related experience
  • 10+ years of AWS infrastructure design and deployment experience
  • 3+ years in an SRE role working on complex systems
  • Infrastructure as code (IaC) background, including AWS CDK or CloudFormation
  • Experience leading the configuration and use of logging and monitoring systems, such as CloudWatch, Splunk, or Instana
  • Ability to analyze infrastructure dependencies
  • Experience overseeing infrastructure deployments, including developing testing procedures
  • Strong communication skills
  • Ability to work with government stakeholders
  • Prior experience in a cross-cutting SRE role

Responsibilities

  • Develop a deep understanding of how systems interoperate within the infrastructure, including upstream and downstream dependencies
  • Review all AWS infrastructure deployments to identify upstream and downstream impacts, and ensure that test processes fully validate features and integrations
  • Ensure that monitoring, logging, and alerting for services running in core infrastructure accounts are properly configured and provide actionable information
  • Develop new monitoring solutions, based on findings, that help prevent future issues
  • Develop metrics, based on SRE needs, to determine how the overall infrastructure is performing
  • Participate in emergency responses and provide incident response metrics
  • In collaboration with government stakeholders, develop and maintain a logging and monitoring strategy for the infrastructure platform
  • Conduct and coordinate "5 Whys" analyses and other blameless post-mortem activities in the event of an incident
  • Participate in continuous improvement activities such as technical debt analysis and contributing to the reliability standards and practices of the team
  • Work with the team's DevOps engineers to improve the deployment process and introduce automated testing
  • Audit resources in the accounts under your responsibility, identify areas for improvement or technical debt, and collaborate with program and government partners to prioritize them
  • Assist the cloud infrastructure team and other teams in troubleshooting wide area integration issues
  • Commit changes to our infrastructure codebase as necessary

Preferred Qualifications

  • AWS Solutions Architect Professional or DevOps Engineer Professional Certification