Cloud Operations Engineer – Monitoring Lead
Company | Extreme Networks |
---|---|
Location | North York, Toronto, ON, Canada |
Salary | $120,000 – $130,000 |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Senior, Expert or higher |
Requirements
- Bachelor's degree in a technical field required; Computer Science or Engineering preferred.
- 8+ years of progressive experience in Cloud Operations, DevOps, or Site Reliability Engineering roles, with a strong focus on monitoring.
- Deep expertise with at least one major public cloud platform (AWS, Azure, or Google Cloud Platform).
- Proven experience as a technical lead or senior contributor in a monitoring-focused role.
- Working knowledge of container-based architecture and deployment (Docker, Kubernetes).
- Extensive experience with various monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, ELK Stack, vendor-specific monitoring solutions).
- Excellent problem-solving, analytical, and troubleshooting skills.
- Working knowledge of Elasticsearch, PostgreSQL, Redis, Apache Ignite, Kafka, and RabbitMQ.
- Comfortable working within a distributed team spread across multiple time zones.
Responsibilities
- Lead the design, implementation, and continuous improvement of our end-to-end monitoring and alerting framework for cloud infrastructure (AWS, Azure, GCP), applications, and services.
- Define key performance indicators (KPIs), service level indicators (SLIs), and service level objectives (SLOs) for critical systems.
- Evaluate, select, and integrate monitoring tools (e.g., Prometheus, Grafana, Datadog, Splunk, CloudWatch, Azure Monitor, GCP Operations Suite) to meet evolving needs.
- Develop and implement automation scripts and tools (e.g., in Python, Bash, or PowerShell) to streamline monitoring deployment, configuration, and incident remediation.
- Build and maintain dashboards, alerts, and reports that provide actionable insights into system performance, health, and availability.
- Analyze monitoring data to identify performance bottlenecks, resource inefficiencies, and potential cost optimization opportunities.
- Collaborate with engineering teams to implement performance improvements and cost-saving measures.
- Create and maintain comprehensive documentation for monitoring systems, procedures, and best practices.
- Proactively identify areas for improvement in our cloud operations and monitoring capabilities.
- Provide 24/7 support for cloud services.
- Participate in cloud security and compliance implementation.
Preferred Qualifications
No preferred qualifications provided.