Lead Cloud Engineer – Kafka

Company: S&P Global
Location: Calgary, AB, Canada
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Senior, Expert or higher

Requirements

  • 8+ years of relevant experience combined with a bachelor’s degree. Master’s degree is preferred.
  • 3-5 years of experience working with messaging platforms in a production environment.
  • Strong knowledge of Kafka architecture, including brokers, topics, partitions, and replicas.
  • Proficiency in configuring, deploying, and managing Kafka clusters in cloud and on-premises environments.
  • Experience with AWS services such as EC2, S3, RDS, Elastic Beanstalk, Elastic Load Balancer, Route 53, VPC, IAM, CloudFront, and CloudWatch.
  • Experience with automation tools such as CloudFormation, Ansible, and Terraform.
  • Proficiency in Java, Scala, or Python for Kafka-related development tasks is a plus.
  • Familiarity with DevOps practices, including CI/CD pipelines, monitoring, and logging.
  • Strong problem-solving skills and the ability to troubleshoot complex issues in a distributed environment.
  • Excellent communication and collaboration skills to work effectively with cross-functional teams and stakeholders.

Responsibilities

  • Ensure that the messaging platform functions successfully, safely, and efficiently.
  • Increase productivity, decrease downtime, and support the mission of the Market Intelligence division.
  • Fix technical issues, maintain infrastructure, provide on-call support, and make enhancements as needed by business clients.
  • Install, configure, and maintain Kafka clusters and associated infrastructure across multiple regions and accounts.
  • Monitor and troubleshoot technical issues and take corrective action as required.
  • Develop and implement backup and disaster recovery plans for Kafka clusters.
  • Configure and manage Kafka topics, partitions, and consumer groups.
  • Manage access controls, security, and authentication protocols for Kafka clusters.
  • Design, develop, and deploy scalable, reliable, and secure AWS solutions.
  • Optimize AWS platform performance, scalability, and cost-efficiency.
  • Develop and maintain blueprints and design documents for platform architecture.
  • Automate cloud operations using Infrastructure as Code (IaC) tools like Terraform, Ansible, or AWS CloudFormation.
  • Use scripting languages (Python, Bash, PowerShell) to optimize and automate workflows.
  • Implement security best practices to safeguard cloud resources and ensure compliance with organizational and regulatory standards.
  • Develop risk assessments and disaster recovery plans, and support execution during disaster recovery exercises.
  • Partner with product and platform owners to deliver innovative solutions that give business applications scalable infrastructure.
  • Work with cross-functional teams to evaluate solutions and recommend strategies for modernizing and consolidating legacy platforms.
  • Provide operations support for event streaming platforms, ensuring Kafka cluster health and stability. Handle incident management, request fulfilment, and escalation.
  • Analyze logs to debug issues and performance bottlenecks. Collaborate with client teams to troubleshoot platform issues, consumer lag, and data replication inconsistencies.
  • Perform routine maintenance tasks, including log retention management and partition rebalancing.

Preferred Qualifications

  • Experience working in an agile environment and with agile methodologies.
  • Work experience with observability tools such as Grafana, Prometheus, and Splunk.
  • Work experience with any Linux distribution. Good knowledge of shell or Python scripting.
  • Familiarity with containerization and orchestration tools (e.g., Docker, Kubernetes).