Tech Lead Platform Engineer – AI & ML Ops
Company | Enable |
---|---|
Location | Toronto, ON, Canada |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s |
Experience Level | Senior, Expert or higher |
Requirements
- 8+ years of experience in platform, DevOps, or cloud engineering, including at least 3 years in SaaS environments.
- Bachelor’s degree in Computer Science, Engineering, or a related field.
- Expertise in architecting, deploying, and managing cloud-native applications on AWS, GCP, or Azure.
- Strong experience with Kubernetes, serverless computing, and container orchestration.
- Proficiency in modern Infrastructure-as-Code (Terraform, Pulumi) and CI/CD tools (GitHub Actions, ArgoCD).
- Experience in distributed systems, service mesh technologies (Istio, Linkerd), and event-driven architectures.
- Hands-on experience with databases (SQL, NoSQL, vector DBs), ensuring high availability and performance.
- Deep understanding of SaaS security principles, identity management, and compliance frameworks (SOC 2, ISO 27001).
- Strong programming and automation skills in Python, SQL, and Bash.
- Strong experience with microservices architecture and API development.
- Familiarity with MLOps frameworks and ML model operationalization.
Responsibilities
- Design, build, and optimize a cloud-native, multi-tenant SaaS platform that scales with Enable’s rapid growth.
- Develop and maintain core infrastructure components, including compute, networking, observability, and CI/CD pipelines.
- Implement best practices for cloud cost optimization, security, and system reliability.
- Enhance API gateways, identity management, and service orchestration for seamless integration across services.
- Architect and manage scalable AI/ML pipelines for model training, deployment, and monitoring.
- Develop and maintain MLOps workflows using tools like Kubeflow and MLflow.
- Optimize ML model inference for real-time and batch processing in a production SaaS environment.
- Collaborate with data scientists and ML engineers to streamline model lifecycle management.
- Implement and refine Infrastructure-as-Code (IaC) practices using Terraform or Pulumi.
- Build self-healing, automated monitoring solutions for system health, application performance, and security.
- Improve CI/CD processes to support high-velocity engineering teams with minimal operational overhead.
- Establish robust logging, tracing, and metrics collection for visibility into SaaS application performance.
- Work closely with engineering teams to ensure platform capabilities support product innovation and reliability.
- Partner with security teams to meet SaaS security and compliance standards and follow established best practices.
- Implement monitoring, logging, and alerting solutions to track model and system performance in line with best practices.
- Define and document platform engineering best practices to elevate team-wide capabilities.
- Mentor junior and mid-level engineers, fostering a culture of technical excellence and continuous learning.
Preferred Qualifications
- Experience building and maintaining large-scale data processing pipelines.
- Expertise in observability tools such as Prometheus, Grafana, OpenTelemetry, or ELK stack.
- Familiarity with real-time data processing and messaging platforms (Kafka, Pub/Sub, Kinesis).
- Background in high-availability, globally distributed architectures.
- Experience tackling cutting-edge AI and MLOps challenges at the intersection of ML, engineering, and cloud infrastructure.
- Certifications in cloud computing (e.g., AWS Certified DevOps Engineer, Azure DevOps Engineer).
- Contributions to open-source DevOps, platform, or MLOps tooling.