Principal Performance Engineer – Cortex Cloud
Company | Palo Alto Networks |
---|---|
Location | Santa Clara, CA, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s, Master’s |
Experience Level | Expert or higher |
Requirements
- 10+ years of experience in software engineering or performance engineering, with a strong focus on testing and optimizing distributed cloud-native systems
- Proven track record in building and executing performance testing strategies in complex, large-scale environments
- Strong programming and scripting skills in Python (preferred), along with experience using performance tools like JMeter, Locust, or similar
- Expertise in performance profiling, diagnostics, and tuning across microservices architectures
- Deep understanding of cloud platforms (AWS, GCP, Azure), Kubernetes orchestration, and cloud-native service architectures
- Hands-on experience with observability and monitoring tools such as Prometheus, Grafana, and OpenTelemetry
- Strong knowledge of CI/CD systems and infrastructure automation, including GitLab, Jenkins, or similar
- Excellent analytical, debugging, and troubleshooting skills at the system and application levels
- Exceptional communication skills with the ability to influence architecture and drive performance culture across engineering
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field
Responsibilities
- Design and implement end-to-end performance testing strategies for distributed cloud-native systems, ensuring scalability, reliability, and responsiveness
- Build and maintain robust, reusable performance test frameworks and pipelines using tools like JMeter, Locust, or custom Python-based solutions
- Develop and execute load, stress, soak, and failover tests to simulate real-world usage patterns, edge cases, and peak load scenarios
- Identify system bottlenecks, resource contention, and inefficiencies across services, infrastructure, and code; work cross-functionally to drive resolution
- Collaborate with Product and Customer Success teams to understand key customer workflows and usage patterns, translating them into performance test scenarios
- Integrate performance tests into CI/CD pipelines and staging environments to enable continuous performance validation and pre-release gatekeeping
- Define and track key performance metrics (e.g., latency, throughput, system resource usage) and build dashboards using Prometheus, Grafana, or other observability platforms
- Perform deep-dive analysis of performance test results, system telemetry, and application profiling data (e.g., flame graphs, heap dumps)
- Advocate for performance-first design principles across engineering teams; influence architectural decisions to improve system efficiency and testability
- Contribute to chaos engineering and fault injection efforts to validate system resilience under adverse conditions
- Lead incident retrospectives related to performance degradation and provide guidance for proactive tuning and improvements
- Thrive in a fast-paced environment, owning performance initiatives from inception through implementation and continuous iteration
Preferred Qualifications
- Experience with JVM tuning or Go profiling
- Experience in chaos engineering, fault injection, and resilience testing
- Familiarity with capacity planning, system tuning, and infrastructure sizing for cloud-native applications
- Strong knowledge of database performance, caching strategies, and distributed systems principles
- Prior experience leading performance initiatives in multi-cloud or hybrid-cloud environments