Technical Program Manager III – Incidents and Availability Analysis – Data Centers
| Company | Google |
|---|---|
| Location | Sunnyvale, CA, USA |
| Salary | $156,000 – $229,000 |
| Type | Full-Time |
| Degrees | Bachelor’s |
| Experience Level | Senior |
Requirements
- Bachelor’s degree or equivalent practical experience.
- 5 years of experience in program or technical project management.
- Experience in data center infrastructure or other mission critical operations.
- Experience in Root Cause Analysis (RCA), incident investigation, or statistical analysis.
Responsibilities
- Maintain the incident and root cause databases for Google’s data center infrastructure and log new incidents in them.
- Develop analytical dashboards and reports, analyses, models, data pipelines, advanced analytical toolkits, or decision-support systems to improve data center availability.
- Perform availability analytics and build decision-support models; communicate results, methods, and findings to leadership and influence leads to act on them; and evaluate trends in the incident and root cause databases to identify common themes among recent incidents for further action or escalation.
- Apply predictive (e.g., regression, clustering, and classification) and statistical analysis techniques to reveal hidden patterns, detect anomalies, and gain statistical insights from the incident and root cause data.
- Create executive presentations of findings and recommended follow-up actions for Central Operations, DCOps, and leadership (e.g., program reviews and Quarterly Business Reviews).
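The anomaly-detection work described above can start as simply as a z-score screen over incident counts. The following is a minimal Python sketch under stated assumptions, not Google’s method: the helper name `zscore_anomalies`, the sample weekly counts, and the 2.0 threshold are all hypothetical. It flags any period whose count lies more than `threshold` population standard deviations from the mean.

```python
from statistics import mean, pstdev


def zscore_anomalies(counts, threshold=2.0):
    """Return the indices of counts whose |z-score| exceeds threshold.

    Hypothetical helper for illustration; uses the population standard deviation.
    """
    mu = mean(counts)
    sigma = pstdev(counts)
    if sigma == 0:
        # All values identical: nothing stands out.
        return []
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


# Hypothetical weekly incident counts for one site (illustrative data only).
weekly_incidents = [3, 4, 2, 5, 3, 4, 15, 3, 4, 2]
print(zscore_anomalies(weekly_incidents))  # → [6]
```

A single large spike also inflates the standard deviation, so on short series a robust variant (for example, median and median absolute deviation) may be a better first pass.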
Preferred Qualifications
- Master’s degree in Mechanical, Electrical, Control, or Systems Engineering.
- 5 years of experience managing cross-functional or cross-team projects.
- Experience with SQL for querying datasets and with Python or R for statistical analysis and data manipulation.
- Experience with data center operations, incident management processes, and operational metrics (e.g., availability SLAs, MTTR, MTBF).
- Experience with analytical techniques such as predictive modeling (e.g., linear and logistic regression, decision trees, random forests), anomaly detection methods (e.g., z-score analysis, density-based clustering), and inferential statistics (e.g., hypothesis testing, confidence intervals, regression analysis).
- Ability to travel up to 20% as needed.
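For the operational metrics named above, a common textbook relationship is availability = MTBF / (MTBF + MTTR). The sketch below computes all three from a list of outage durations over a fixed observation window. It assumes every outage is one failure followed by one repair and that the system is up for the rest of the window; the function name `reliability_metrics` and the sample data are hypothetical.

```python
def reliability_metrics(repair_hours, window_hours):
    """Compute MTBF, MTTR, and availability over one observation window.

    Hypothetical helper. Assumes each entry in repair_hours is one outage
    (one failure plus its repair) and the system is up the rest of the time.
    """
    failures = len(repair_hours)
    if failures == 0:
        # No outages observed: MTBF is unbounded and availability is 100%.
        return {"mtbf": float("inf"), "mttr": 0.0, "availability": 1.0}
    downtime = sum(repair_hours)
    uptime = window_hours - downtime
    mtbf = uptime / failures      # mean time between failures (hours)
    mttr = downtime / failures    # mean time to repair (hours)
    availability = mtbf / (mtbf + mttr)  # equivalent to uptime / window_hours
    return {"mtbf": mtbf, "mttr": mttr, "availability": availability}


# Hypothetical 30-day window (720 hours) with three outages.
metrics = reliability_metrics([2.0, 0.5, 1.5], window_hours=720)
print(metrics)
```

Under these assumptions availability reduces to uptime divided by window length; an SLA that counts downtime differently would need its own definition reflected in the inputs.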
Benefits
No information provided on Benefits.