Lead Software Engineer
Company | The Walt Disney Company |
---|---|
Location | Seattle, WA, USA; Burbank, CA, USA; Celebration, FL, USA |
Salary | $152,200 – $213,900 |
Type | Full-Time |
Degrees | |
Experience Level | Senior, Expert or higher |
Requirements
- 7+ years of experience in software development, with at least 3 years focused on service reliability, data management, and distributed systems.
- Expert-level coding skills in Python, with a deep understanding of performance and resource optimization, including considerations of space and time complexity.
- Strong experience with AWS services such as Lambda, ECS Fargate, S3, VPC, Kinesis, EventBridge, and related components.
- Proven experience working with serverless (AWS Lambda) and containerized (Docker, ECS, EKS) environments.
- Proficiency with IaC tools such as Terraform and AWS CDK to automate infrastructure management.
- Extensive experience with CI/CD pipelines, automated testing, and general DevOps practices.
- Hands-on experience with observability and monitoring tools such as DataDog, AppDynamics, New Relic, or similar suites.
- Strong proponent of Agile methodologies and of applying DevOps principles for continuous improvement.
Responsibilities
- Establish key service-level indicators (SLIs) and continuously monitor them to ensure system reliability, availability, and performance. Proactively develop alerts and automated responses to prevent service degradation or outages.
- Drive complex projects across multiple teams and disciplines, ensuring high availability, resilience, and minimal downtime of services.
- Conduct deep analysis of system issues, identify root causes, and define actionable strategies for remediation. Lead post-mortem analysis and continuous improvement efforts.
- Focus on the high availability, scalability, and performance of services in production environments, ensuring they meet business and customer needs.
- Lead efforts to ensure that services are properly scaled for current and future workloads. Engage in capacity planning to optimize resource utilization.
- Maintain detailed documentation and create robust runbooks for incident management and troubleshooting, ensuring smooth responses to service disruptions.
- Utilize tools such as Terraform and AWS CDK to manage and automate infrastructure as code (IaC) in a cloud-native environment.
- Oversee observability and monitoring across the platform (AWS, serverless, containers, Snowflake, etc.), ensuring actionable insights are available for operational teams.
- Work closely with developers to ensure that applications are designed for service reliability, scalability, and maintainability.
- Drive infrastructure changes and service deployment using GitOps practices to ensure consistency and traceability in deployments.
- Write clean, performant, and well-documented application code with a focus on reliability and service availability.
- Build and maintain automated deployment pipelines and tooling for monitoring and platform testing.
- Provide technical guidance and mentorship to junior engineers and assist in technical decision-making and code reviews.
Preferred Qualifications
- Certifications in AWS, Snowflake, or relevant service reliability tools.
- In-depth experience with observability suites and Application Performance Management (APM) tooling such as DataDog, AppDynamics, or New Relic.