Senior Software Development Engineer, MLOps

Company: Workday
Location: Toronto, ON, Canada; Beaverton, OR, USA; Atlanta, GA, USA; Vancouver, BC, Canada
Salary: $132,800 – $199,200
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Senior

Requirements

  • Solid experience as a Software Development Engineer in the ML domain: 5+ years’ experience with a Master’s or higher, or 6+ years with a Bachelor’s, in Computer Science, Computer Engineering, or equivalent
  • 4 years’ experience designing, implementing, and maintaining robust MLOps services for deploying, monitoring, and scaling machine learning development, primarily using Kubeflow or similar platforms
  • Professional experience building web applications and microservices, and designing APIs
  • Solid understanding of how to implement and manage CI/CD workflows to automate testing, integration, and delivery of machine learning components
  • Experience in supporting large Kubernetes networks in production
  • 6 or more years of cloud programming experience, preferably in Python or Go
  • Experience running and maintaining ML platforms such as Databricks, SageMaker, and/or Vertex AI

Responsibilities

  • Work with multi-functional teams to deliver scalable, secure and reliable solutions
  • Build an MLOps platform, primarily using Kubeflow and other ML ecosystem frameworks and services, to provide a unified ML development experience
  • Engage effectively with data scientists, ML engineers, PMs, and architects to elaborate requirements and drive technical solutions
  • Own and develop cloud-based services from end to end including infrastructure as code
  • Design and build software solutions for efficient organization, storage and retrieval of data to enable substantial scale
  • Apply an understanding of cloud computing and security to build robust cloud infrastructure and solutions for ML teams
  • Build systems and dashboards to monitor service and ML health
  • Lead architecture reviews, code reviews, and technology evaluations
  • Research, evaluate, prototype and drive adoption of new ML tools with reliability and scale in mind

Preferred Qualifications

  • Implementation and operation of distributed systems
  • Stay abreast of industry trends and emerging technologies, providing recommendations for continuous improvement of our DevOps and machine learning practices
  • Troubleshoot and resolve performance bottlenecks, system outages, and other operational issues in collaboration with the ML engineering teams
  • Ensure the security and compliance of machine learning platforms, implementing best practices for encryption, data protection and access controls
  • Optimize public cloud-based infrastructure (AWS, GCP) to support the computational requirements of machine learning workloads
  • Experience managing relevant tools such as Databricks and SageMaker for efficient computation and management of large-scale data lakes
  • Experience with data and/or ML systems, with the ability to think across layers of the stack
  • Develop and maintain monitoring and alerting systems for proactively identifying and addressing issues within the machine learning infrastructure
  • Experience leading or mentoring other team members