Lead Software Engineer

Company: The Walt Disney Company
Location: Seattle, WA, USA; Burbank, CA, USA; Celebration, FL, USA
Salary: $152,200 – $213,900
Type: Full-Time
Experience Level: Senior, Expert or higher

Requirements

  • 7+ years of experience in software development, with at least 3 years focused on service reliability, data management, and distributed systems.
  • Expert-level coding skills in Python, with a deep understanding of performance and resource optimization, including space and time complexity.
  • Strong experience with AWS services such as Lambda, ECS Fargate, S3, VPC, Kinesis, EventBridge, and related components.
  • Proven experience working with serverless (AWS Lambda) and containerized (Docker, ECS, EKS) environments.
  • Proficiency with IaC tools such as Terraform and AWS CDK to automate infrastructure management.
  • Extensive experience with CI/CD pipelines, automated testing, and general DevOps practices.
  • Hands-on experience with observability and monitoring tools such as Datadog, AppDynamics, New Relic, or similar suites.
  • Strong proponent of Agile methodologies and of applying DevOps principles for continuous improvement.

Responsibilities

  • Establish key service-level indicators (SLIs) and continuously monitor them to ensure system reliability, availability, and performance. Proactively develop alerts and automated responses to prevent service degradation or outages.
  • Drive complex projects across multiple teams and disciplines, ensuring high availability, resilience, and minimal downtime of services.
  • Conduct deep analysis of system issues, identify root causes, and define actionable strategies for remediation. Lead post-mortem analysis and continuous improvement efforts.
  • Focus on the high availability, scalability, and performance of services in production environments, ensuring they meet business and customer needs.
  • Lead efforts to ensure that services are properly scaled for current and future workloads. Engage in capacity planning to optimize resource utilization.
  • Maintain detailed documentation and create robust runbooks for incident management and troubleshooting, ensuring smooth responses to service disruptions.
  • Use tools such as Terraform and AWS CDK to manage and automate infrastructure as code (IaC) in a cloud-native environment.
  • Oversee observability and monitoring across the platform (AWS, serverless, containers, Snowflake, etc.), ensuring actionable insights are available for operational teams.
  • Work closely with developers to ensure that applications are designed for service reliability, scalability, and maintainability.
  • Drive infrastructure changes and service deployment using GitOps practices to ensure consistency and traceability in deployments.
  • Write clean, performant, and well-documented application code with a focus on reliability and service availability.
  • Build and maintain automated deployment pipelines and tooling for monitoring and platform testing.
  • Provide technical guidance and mentorship to junior engineers and assist in technical decision-making and code reviews.

Preferred Qualifications

  • Certifications in AWS, Snowflake, or relevant service reliability tools.
  • In-depth experience with observability suites and Application Performance Management (APM) tooling such as Datadog, AppDynamics, and New Relic.