Site Reliability Engineer III
Company | JPMorgan Chase |
---|---|
Location | Chicago, IL, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Mid Level, Senior |
Requirements
- Bachelor’s degree in computer science.
- Formal training or certification in software engineering concepts, plus 3+ years of experience (or equivalent expertise) troubleshooting, resolving, and maintaining information technology services.
- Demonstrated knowledge of applications or infrastructure in a large-scale technology environment, both on premises and in the public cloud, with specific experience in AWS.
- Experience with observability and monitoring tools and techniques.
- Experience with the AWS cloud platform and its integration with Datadog.
- Exposure to processes within the scope of the Information Technology Infrastructure Library (ITIL) framework.
- Proficiency in AWS services such as EC2, S3, RDS, Lambda, VPC, and CloudFormation.
- Experience with monitoring and logging tools like CloudWatch, AWS X-Ray, and third-party solutions.
- Understanding of networking concepts, including DNS, TCP/IP, VPN, and load balancing.
- Familiarity with DevOps practices and tools like Jenkins, Docker, Kubernetes, and CI/CD pipelines.
- Knowledge of scripting languages such as Python, Bash, or PowerShell.
- Experience with database management and SQL.
- Experience scripting with Node.js.
- Hands-on experience with Datadog, including the ability to create custom dashboards.
- AWS Certified Solutions Architect or AWS Certified SysOps Administrator certification.
Responsibilities
- Provide end-to-end application or infrastructure service delivery to enable successful business operations of the firm.
- Collaborate across partner teams to establish and maintain Service Level Objectives (SLOs), Service Level Indicators (SLIs), and error budgets for key production services, and proactively resolve issues before they impact customers.
- Perform essential day-to-day duties around Incident, Problem (RCA), Change, and Event (monitoring/alerting) management.
- Support the day-to-day maintenance of the firm’s systems to ensure operational stability and availability.
- Assist in monitoring production environments for anomalies and address issues using standard observability tools.
- Develop and maintain alerting mechanisms to promptly detect and respond to incidents. Collaborate with cross-functional teams to resolve issues and minimize downtime.
- Design and implement Datadog instrumentation strategies to monitor application performance, infrastructure health, and user experience. Ensure comprehensive coverage and accurate data collection.
- Identify issues that require escalation or communication, and provide solutions to business and technology stakeholders.
- Analyze complex situations and trends to anticipate and solve incident, problem, and change management issues in support of full-stack technology systems, applications, or infrastructure.
Preferred Qualifications
- Experience with one or more general-purpose programming languages and/or automation scripting.
- Good working understanding of public cloud; AWS certification is a plus.
- Experience working in banking and payment domains such as FX cross-currency, high- and low-value payments, SWIFT, real-time payments, collections, check clearing, tax payments, and partner bank systems.
- Ability to collaborate well and build effective relationships with diverse groups across geographies.
- Adaptable, with the ability to multitask and handle complex requirements.
- AWS Certified Developer preferred.