Principal Site Reliability Engineer - Prisma Access

Must be a US Citizen to be considered
7+ years of experience in Infrastructure, SRE, or DevOps roles required
BS or MS in Computer Science or a related field, or equivalent professional or military experience, required
4+ years of experience with AWS and GCP and expertise in their architecture, services, advanced cloud networking, and PKI concepts
Expertise in troubleshooting and resolving cloud infrastructure and service issues, identifying root causes, and devising effective solutions for high-volume transactions
Proficiency with Python and shell scripting for automation; Golang is a plus
Proficiency in Infrastructure as Code (IaC) with Terraform and Helm, leveraging AI tools for development
Solid experience with Kubernetes, container networking, and container workloads
Strong Linux administration skills
Proficiency with CI/CD pipelines, GitOps principles, GitLab, and Jenkins
Excellent written and verbal communication skills, with the ability to collaborate effectively and rally support across teams
Self-disciplined, self-managed, and highly driven with a strong sense of ownership and urgency
Ability to adapt quickly to evolving cloud technologies, security threats, and advancements through continuous learning
Ability to understand and address customer needs effectively and to provide root cause analysis (RCA) to customers
Understanding of how technical decisions impact the business, and ability to align cloud operations with business goals

Design, build, and operate reliable, secure cloud infrastructure across multi-cloud environments
Ensure applications are production-ready, scalable, and resilient, collaborating closely with developers, researchers, data scientists, and security experts
Develop expertise in new technologies and rapidly integrate them into our existing infrastructure, embracing continuous learning and the adoption of AI tools
Develop tools and automation frameworks, championing Infrastructure as Code (IaC) and Monitoring as Code (MaC) principles
Automate robust deployments and orchestrate end-to-end monitoring and alerting solutions
Participate in on-call rotations with SRE and Dev teams to support critical business and production systems
Lead root cause analysis of critical business and production issues, driving improvements and preventing recurrence
Contribute to the success of SRE and DevOps initiatives, understanding the business impact of technical decisions and aligning them with business goals

No preferred qualifications provided.