Principal Performance Engineer – Cortex Cloud
Company | Palo Alto Networks |
---|---|
Location | Santa Clara, CA, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s, Master’s |
Experience Level | Expert or higher |
Requirements
- 10+ years of experience in software engineering or performance engineering, with a strong focus on testing and optimizing distributed cloud-native systems.
- Proven track record in building and executing performance testing strategies in complex, large-scale environments.
- Strong programming and scripting skills in Python (preferred), along with experience using performance tools like JMeter, Locust, or similar.
- Expertise in performance profiling, diagnostics, and tuning across microservices architectures.
- Deep understanding of cloud platforms (AWS, GCP, Azure), Kubernetes orchestration, and cloud-native service architectures.
- Hands-on experience with observability and monitoring tools such as Prometheus, Grafana, and OpenTelemetry.
- Strong knowledge of CI/CD systems and infrastructure automation, including GitLab, Jenkins, or similar.
- Excellent analytical, debugging, and troubleshooting skills at the system and application levels.
- Exceptional communication skills with the ability to influence architecture and drive performance culture across engineering.
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- Experience in chaos engineering, fault injection, and resilience testing.
- Familiarity with capacity planning, system tuning, and infrastructure sizing for cloud-native applications.
- Strong knowledge of database performance, caching strategies, and distributed systems principles.
- Prior experience leading performance initiatives in multi-cloud or hybrid-cloud environments.
Responsibilities
- Design and implement end-to-end performance testing strategies for distributed cloud-native systems, ensuring scalability, reliability, and responsiveness.
- Build and maintain robust, reusable performance test frameworks and pipelines using tools like JMeter, Locust, or custom Python-based solutions.
- Develop and execute load, stress, soak, and failover tests to simulate real-world usage patterns, edge cases, and peak load scenarios.
- Identify system bottlenecks, resource contention, and inefficiencies across services, infrastructure, and code; work cross-functionally to drive resolution.
- Collaborate with Product and Customer Success teams to understand key customer workflows and usage patterns, translating them into performance test scenarios.
- Integrate performance tests into CI/CD pipelines and staging environments to enable continuous performance validation and pre-release gatekeeping.
- Define and track key performance metrics (e.g., latency, throughput, system resource usage) and build dashboards using Prometheus, Grafana, or other observability platforms.
- Perform deep-dive analysis of performance test results, system telemetry, and application profiling data (e.g., flame graphs, heap dumps).
- Advocate for performance-first design principles across engineering teams; influence architectural decisions to improve system efficiency and testability.
- Contribute to chaos engineering and fault injection efforts to validate system resilience under adverse conditions.
- Lead incident retrospectives related to performance degradation and provide guidance for proactive tuning and improvements.
- Thrive in a fast-paced environment, owning performance initiatives from inception through implementation and continuous iteration.
Preferred Qualifications
- Experience with Java (JVM) tuning or Golang profiling is a plus.