Tech Lead Platform Engineer – AI & ML Ops

Company: Enable
Location: Toronto, ON, Canada
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Senior, Expert or higher

Requirements

  • 8+ years of experience in platform, DevOps, or cloud engineering, including at least 3 years in SaaS environments.
  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Expertise in architecting, deploying, and managing cloud-native applications on AWS, GCP, or Azure.
  • Strong experience with Kubernetes, serverless computing, and container orchestration.
  • Proficiency in modern Infrastructure-as-Code (Terraform, Pulumi) and CI/CD tools (GitHub Actions, ArgoCD).
  • Experience in distributed systems, service mesh technologies (Istio, Linkerd), and event-driven architectures.
  • Hands-on experience with databases (SQL, NoSQL, vector DBs), ensuring high availability and performance.
  • Deep understanding of SaaS security principles, identity management, and compliance frameworks (SOC 2, ISO 27001).
  • Strong programming and automation skills in Python, SQL, and Bash.
  • Strong experience with microservices architecture and API development.
  • Familiarity with MLOps frameworks and ML model operationalization.

Responsibilities

  • Design, build, and optimize a cloud-native, multi-tenant SaaS platform that scales with Enable’s rapid growth.
  • Develop and maintain core infrastructure components, including compute, networking, observability, and CI/CD pipelines.
  • Implement best practices for cloud cost optimization, security, and system reliability.
  • Enhance API gateways, identity management, and service orchestration for seamless integration across services.
  • Architect and manage scalable AI/ML pipelines for model training, deployment, and monitoring.
  • Develop and maintain MLOps workflows using tools like Kubeflow and MLflow.
  • Optimize ML model inference for real-time and batch processing in a production SaaS environment.
  • Collaborate with data scientists and ML engineers to streamline model lifecycle management.
  • Implement and refine Infrastructure-as-Code (IaC) practices using Terraform or Pulumi.
  • Build self-healing, automated monitoring solutions for system health, application performance, and security.
  • Improve CI/CD processes to support high-velocity engineering teams with minimal operational overhead.
  • Establish robust logging, tracing, and metrics collection for visibility into SaaS application performance.
  • Work closely with engineering teams to ensure platform capabilities support product innovation and reliability.
  • Partner with security teams to ensure the platform meets SaaS security and compliance standards.
  • Implement monitoring, logging, and alerting solutions to track model and system performance in line with best practices.
  • Define and document platform engineering best practices to elevate team-wide capabilities.
  • Mentor junior and mid-level engineers, fostering a culture of technical excellence and continuous learning.

Preferred Qualifications

  • Experience building and maintaining large-scale data processing pipelines.
  • Expertise in observability tools such as Prometheus, Grafana, OpenTelemetry, or ELK stack.
  • Familiarity with real-time data processing and messaging platforms (Kafka, Pub/Sub, Kinesis).
  • Background in high-availability, globally distributed architectures.
  • Experience working on cutting-edge AI and MLOps challenges at the intersection of ML, engineering, and cloud infrastructure.
  • Certifications in cloud computing (e.g., AWS Certified DevOps Engineer, Azure DevOps Engineer).
  • Contributions to open-source DevOps, platform, or MLOps tooling.