Lead Cloud Engineer – Kafka
Company | S&P Global |
---|---|
Location | Calgary, AB, Canada |
Salary | Not Provided |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Senior, Expert or higher |
Requirements
- 8+ years of relevant experience combined with a bachelor’s degree. Master’s degree is preferred.
- 3-5 years of experience working with Messaging Platforms in a production environment.
- Strong knowledge of Kafka architecture, including brokers, topics, partitions, and replicas.
- Proficiency in configuring, deploying, and managing Kafka clusters in cloud and on-premises environments.
- Experience with AWS services such as EC2, S3, RDS, Elastic Beanstalk, Elastic Load Balancing, Route 53, VPC, IAM, CloudFront, and CloudWatch.
- Experience with automation tools such as CloudFormation, Ansible, and Terraform.
- Proficiency in Java, Scala, or Python for Kafka-related development tasks is a plus.
- Familiarity with DevOps practices, including CI/CD pipelines, monitoring, and logging.
- Strong problem-solving skills and the ability to troubleshoot complex issues in a distributed environment.
- Excellent communication and collaboration skills to work effectively with cross-functional teams and stakeholders.
Responsibilities
- Ensure that the messaging platform functions successfully, safely, and efficiently.
- Increase productivity, decrease downtime, and support the mission of the Market Intelligence division.
- Fix technical issues, maintain infrastructure, provide on-call support, and make enhancements as needed by business clients.
- Install, configure, and maintain Kafka clusters and associated infrastructure across multiple regions and accounts.
- Monitor and troubleshoot technical issues, taking corrective action as required.
- Develop and implement backup and disaster recovery plans for Kafka clusters.
- Configure and manage Kafka topics, partitions, and consumer groups.
- Manage access controls, security, and authentication protocols for Kafka clusters.
- Design, develop, and deploy scalable, reliable, and secure AWS solutions.
- Optimize AWS platform performance, scalability, and cost-efficiency.
- Develop and maintain blueprints and design documents for platform architecture.
- Automate cloud operations using Infrastructure as Code (IaC) tools like Terraform, Ansible, or AWS CloudFormation.
- Use scripting languages (Python, Bash, PowerShell) to optimize and automate workflows.
- Implement security best practices to safeguard cloud resources and ensure compliance with organizational and regulatory standards.
- Develop risk assessments and disaster recovery plans, and support execution during disaster recovery exercises.
- Partner with product and platform owners to deliver innovative solutions that empower business applications with scalable infrastructure.
- Work with cross-functional teams to evaluate solutions and recommend strategies for modernizing and consolidating legacy platforms.
- Provide operations support for event streaming platforms, ensuring Kafka cluster health and stability. Handle incident management, request fulfilment, and escalation.
- Analyze logs to debug issues and performance bottlenecks. Collaborate with client teams to troubleshoot platform issues, consumer lag, and data replication inconsistencies.
- Perform routine maintenance tasks, including log retention management and partition rebalancing.
Preferred Qualifications
- Work experience in an agile environment using agile methodologies.
- Work experience with observability tools such as Grafana, Prometheus, and Splunk.
- Work experience with any of the Linux products. Good knowledge of shell or Python scripting.
- Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).