Lead Software Engineer

Company	The Walt Disney Company
Location	Seattle, WA, USA; Burbank, CA, USA; Celebration, FL, USA
Salary	$152,200 – $213,900
Type	Full-Time
Experience Level	Senior, Expert or higher

Requirements

7+ years of experience in software development, with at least 3 years focused on service reliability, data management, and distributed systems.
Expert-level coding skills in Python, with a deep understanding of performance and resource optimization, including time and space complexity.
Strong experience with AWS services such as Lambda, ECS Fargate, S3, VPC, Kinesis, EventBridge, and related components.
Proven experience working with serverless (AWS Lambda) and containerized (Docker, ECS, EKS) environments.
Proficiency with infrastructure-as-code (IaC) tools such as Terraform and AWS CDK for automating infrastructure management.
Extensive experience with CI/CD pipelines, automated testing, and general DevOps practices.
Hands-on experience with observability and monitoring tools such as Datadog, AppDynamics, New Relic, or similar suites.
Strong proponent of Agile methodologies and of applying DevOps principles to drive continuous improvement.

Responsibilities

Establish key service-level indicators (SLIs) and continuously monitor them to ensure system reliability, availability, and performance. Proactively develop alerts and automated responses to prevent service degradation or outages.
Drive complex projects across multiple teams and disciplines, ensuring high availability, resilience, and minimal downtime of services.
Conduct deep analysis of system issues, identify root causes, and define actionable strategies for remediation. Lead post-mortem analysis and continuous improvement efforts.
Focus on the high availability, scalability, and performance of services in production environments, ensuring they meet business and customer needs.
Lead efforts to ensure that services are properly scaled for current and future workloads. Engage in capacity planning to optimize resource utilization.
Maintain detailed documentation and create robust runbooks for incident management and troubleshooting, ensuring smooth responses to service disruptions.
Utilize tools such as Terraform and AWS CDK to manage and automate infrastructure as code (IaC) in a cloud-native environment.
Oversee observability and monitoring across the platform (AWS, serverless, containers, Snowflake, etc.), ensuring actionable insights are available for operational teams.
Work closely with developers to ensure that applications are designed for service reliability, scalability, and maintainability.
Drive infrastructure changes and service deployment using GitOps practices to ensure consistency and traceability in deployments.
Write clean, performant, and well-documented application code with a focus on reliability and service availability.
Build and maintain automated deployment pipelines and tooling for monitoring and platform testing.
Provide technical guidance and mentorship to junior engineers and assist in technical decision-making and code reviews.

Preferred Qualifications

Certifications in AWS, Snowflake, or relevant service reliability tools.
In-depth experience with observability suites and Application Performance Management (APM) tooling such as Datadog, AppDynamics, or New Relic.