Senior – Software Engineer – SRE/Dev Ops

Company: Walmart
Location: Bentonville, AR, USA
Salary: $90,000 – $180,000
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Senior

Requirements

  • Bachelor’s degree in Computer Science, Engineering or related discipline
  • 3 years of hands-on experience in SRE, operations, and development with JavaScript, Java, RESTful services, Git, Maven, Jenkins, DevOps, containerization, Docker, Kubernetes, Azure, Google Cloud, Kafka, Azure Cosmos DB, Azure SQL, Mega cache, CI/CD, Prometheus, Grafana, Splunk, etc.
  • Demonstrated knowledge of scripting and software development for automation and self-healing of multi-cloud environments
  • Excellent end-to-end technical understanding of core infrastructure, cloud services, platforms, and microservices
  • Ability to triage effectively and to distinguish symptoms from causes
  • Identify and drive continuous improvement efforts to reduce waste (eliminate, automate or streamline)
  • Influence the design of system architecture and tactical solutions
  • Familiarity with log-centric tooling, and the ability to produce time-series data and reusable dashboards for use both during and after an event

Responsibilities

  • Triage site-impacting production issues by quantifying impact, severity and urgency, analyzing systems for quick remediation, engaging the right teams for recovery, and focusing on immediate restoration of large-scale enterprise systems
  • Use various tools to analyze monitoring graphs and alerts and identify the systems causing production impact
  • Design and implement JavaScript for the integration of the alerting tool with service API endpoints
  • Work with business partners to identify and document critical applications
  • Participate in the design of a minimum operating environment for a computer-based facility
  • Monitor site reliability conditions and new reliability requirements
  • Develop enterprise monitoring and use tooling software solutions to improve visibility, proactively detect issues, and restore system availability
  • Design and develop solutions for widespread internal communications for cloud applications support or workflows for infrastructure availability issues
  • Streamline the deployment process and own it as a single team
  • Participate in rotating on-call duties and work across different time zones with a multi-national team
  • Perform timely root cause analysis of production issues
  • Develop reusable tooling and processes to drive and improve customer experience and lower operational costs
  • Help teams build highly observable and resilient systems
  • Collaborate with developers to capture requirements and understand pain points
  • Build reusable tools, libraries, and dashboards that can be used across DevOps/SRE teams

Preferred Qualifications

  • Master’s degree in Computer Science, Computer Engineering, Computer Information Systems, Software Engineering, or related area and 1 year’s experience in software engineering or related area
  • Background in creating inclusive digital experiences, demonstrating knowledge in implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, assistive technologies, and integrating digital accessibility seamlessly.