Incident Response Manager

Company: Toyota
Location: Plano, TX, USA
Salary: Not provided
Type: Full-Time
Degrees: Not specified
Experience Level: Senior, Expert or higher

Requirements

  • 8+ years of experience managing major incidents in high-stakes, always-on environments.
  • Proven ability to lead multiple incidents simultaneously and influence diverse teams toward resolution.
  • Strong full-stack technical background, including cloud platforms like AWS or Azure.
  • Solid infrastructure knowledge across physical, virtual, and containerized systems.
  • Analytical skills with data tools (SQL, Dynatrace, or similar).
  • Calm under pressure with strong task management and decision-making skills.
  • Excellent communicator, able to explain technical issues clearly to all audiences.

Responsibilities

  • Lead Incident Response: Act as Incident Commander during major incidents, directing cross-functional teams to restore services swiftly.
  • Drive Technical Resolution: Leverage tools like Splunk, SQL, and cloud monitoring to identify root causes and guide remediation.
  • Own Communications: Deliver timely, clear updates to internal stakeholders and users during incidents.
  • Problem Management: Lead post-incident analysis (RCA) and drive long-term fixes to prevent repeat issues.
  • Automation: Build and deploy scripts (Python, Bash) to enhance detection, response, and reporting.
  • Data-Driven Insights: Analyze incident trends and response effectiveness to inform leadership and guide improvements.
  • Continuous Improvement: Partner with teams to refine tools, playbooks, and processes that enhance system resilience.
  • On-Call Leadership: Serve as an escalation point during critical outages.

Preferred Qualifications

  • Familiarity with ITIL processes (certification a plus).
  • Hands-on experience with scripting/automation (Python, Ruby, JavaScript, or shell).
  • Experience crafting user-facing incident comms (e.g., status pages, notifications).
  • Understanding of distributed systems and interdependent architecture.
  • A track record of improving incident response operations in high-availability environments.