Senior – Software Engineer – SRE/DevOps
| Field | Details |
|---|---|
| Company | Walmart |
| Location | Bentonville, AR, USA |
| Salary | $90,000 – $180,000 |
| Type | Full-Time |
| Degrees | Bachelor’s, Master’s |
| Experience Level | Senior |
Requirements
- Bachelor’s degree in Computer Science, Engineering or related discipline
- 3 years of hands-on experience in SRE, operations, and development with JavaScript, Java, RESTful services, Git, Maven, Jenkins, DevOps, containerization, Docker, Kubernetes, Azure, Google Cloud, Kafka, Azure Cosmos, Azure SQL, Mega cache, CI/CD, Prometheus, Grafana, Splunk, etc.
- Demonstrated knowledge of scripting and software development for automation and self-healing of multi-cloud environments
- Excellent end-to-end technical understanding of core infrastructure, cloud services, platforms, and microservices
- Ability to triage effectively: detect issues and distinguish symptom from cause
- Identify and drive continuous improvement efforts to reduce waste (eliminate, automate or streamline)
- Influence the design of system architecture and tactical solutions
- Familiarity with log-centric tooling; ability to produce time-series data and reusable dashboards for use both during and after an event
Responsibilities
- Triage site-impacting production issues by quantifying impact, severity and urgency, analyzing systems for quick remediation, engaging the right teams for recovery, and focusing on immediate restoration of large-scale enterprise systems
- Use various tools to analyze monitoring graphs and alerts and identify the systems causing production impact
- Design and implement JavaScript to integrate alerting tools with service API endpoints
- Work with business partners to identify and document critical applications
- Participate in the design of a minimum operating environment for a computer-based facility
- Monitor site reliability conditions and new reliability requirements
- Develop enterprise monitoring and use software tooling to improve visibility, proactively detect issues, and restore system availability
- Design and develop solutions for widespread internal communications covering cloud application support and workflows for infrastructure availability issues
- Streamline the deployment process and own that responsibility as a single team
- Participate in rotating on-call duties and work across different time zones with a multi-national team
- Perform timely root cause analysis of production issues
- Develop reusable tooling and processes to drive and improve customer experience and lower operational costs
- Help teams build highly observable and resilient systems
- Collaborate with developers to capture requirements and understand pain points
- Build reusable tools, libraries, and dashboards that can be used across DevOps/SRE teams
Preferred Qualifications
- Master’s degree in Computer Science, Computer Engineering, Computer Information Systems, Software Engineering, or a related area, and 1 year of experience in software engineering or a related area
- Background in creating inclusive digital experiences, demonstrating knowledge in implementing Web Content Accessibility Guidelines (WCAG) 2.2 AA standards, assistive technologies, and integrating digital accessibility seamlessly.