Senior Cloud Ops Engineer
| Field | Details |
|---|---|
| Company | GoFundMe |
| Location | San Francisco, CA, USA |
| Salary | $156,000 – $234,000 |
| Type | Full-Time |
| Degrees | Bachelor’s |
| Experience Level | Senior |
Requirements
- Bachelor’s degree in Computer Science or a related field, or 8+ years of equivalent practical experience.
- Minimum of 6 years of experience designing and managing scalable, cloud-based infrastructure, preferably in SaaS environments.
- Deep technical expertise with a strong foundation in computer science, sharp engineering skills, and a commitment to delivering high-quality solutions.
- Expert-level knowledge of AWS cloud services, container technologies like Docker and Kubernetes, and Infrastructure as Code (IaC) tools like Terraform and CloudFormation.
- Proficiency in software architecture, including asynchronous event-driven architecture and microservices.
- Experience in performance and reliability testing using tools like Artillery, k6, or similar frameworks.
- Experience in defining, monitoring, and managing Service Level Indicators (SLIs) and Service Level Objectives (SLOs) to ensure the cloud infrastructure consistently meets performance and availability targets.
- Proven expertise in disaster recovery planning and execution, including developing and implementing robust strategies to maintain business continuity and achieve rapid recovery in the event of an outage.
- Hands-on experience with application performance management (APM) tools like New Relic, Datadog, or Splunk.
- Advanced scripting and development skills in Bash, PHP, and Node.js.
- Skilled in managing distributed data systems, troubleshooting complex issues under high load, and designing for high transaction volumes.
- Knowledge of compliance standards and regulations, including PCI, SOC 2, and GDPR.
Responsibilities
- Design and implement robust, fault-tolerant cloud solutions to process billions of dollars annually, ensuring scalability, resilience, and compliance.
- Share expertise and foster a culture of continuous improvement, innovation, and learning within the team, contributing to technical mentorship and knowledge sharing.
- Participate in strategic decisions regarding cloud architecture, influencing the adoption of best practices and cutting-edge technologies.
- Work collaboratively to enhance system performance, observability, and reliability across the infrastructure, focusing on improving real-time monitoring and logging for operational excellence.
- Lead initiatives to improve infrastructure resiliency, using tools like AWS Resilience Hub and AWS Fault Injection Simulator to test and enhance system robustness.
- Drive application resilience by designing and executing load tests, simulating infrastructure faults, and analyzing results to improve fault tolerance.
- Incorporate scalability and performance testing as integral parts of service design, ensuring services meet reliability and performance goals under high transaction volumes.
- Embed testing phases within CI/CD pipelines to promote shift-left performance testing practices, improve efficiency, and reduce development cycle times.
- Contribute to implementing and analyzing DORA (DevOps Research and Assessment) metrics to enhance the efficiency and effectiveness of the development lifecycle.
- Participate in an on-call rotation to promptly address and resolve critical incidents, ensuring continuous operational excellence and rapid recovery during outages.
Preferred Qualifications
- AWS cloud certifications.
- Experience with fault-tolerant system design, large-scale distributed systems, and high-transaction environments.
- Familiarity with tools and processes for infrastructure resiliency and fault injection testing.