Intermediate or Senior MLOps Engineer
Company | Workday |
---|---|
Location | Beaverton, OR, USA; Boulder, CO, USA; Atlanta, GA, USA |
Salary | $128,800 – $193,200 |
Type | Full-Time |
Degrees | Bachelor’s, Master’s |
Experience Level | Mid Level, Senior |
Requirements
- 5 or more years of proven industry experience.
- Bachelor’s and/or Master’s degree in Computer Science or Computer Engineering.
- Design, implement, and maintain robust MLOps services for deploying, monitoring, and scaling machine learning development and data engineering primarily with Kubeflow.
- Stay abreast of industry trends and emerging technologies, providing recommendations for continuous improvement of our DevOps and machine learning practices.
- Troubleshoot and resolve performance bottlenecks, system outages, and other operational issues in collaboration with the ML engineering teams.
- Optimize public cloud-based infrastructure (AWS, GCP) to support the computational requirements of machine learning workloads.
- Implement and manage CI/CD workflows to automate testing, integration, and delivery of machine learning components.
- Ensure the security and compliance of machine learning platforms, implementing best practices for encryption, data protection and access controls.
- Professional experience building web applications and microservices, and in API design.
- Experience in supporting large Kubernetes networks in production.
- 5 or more years of cloud programming experience, preferably in Python or Go.
- Experience running and maintaining Databricks, SageMaker, and Apache Spark as a service.
Responsibilities
- Work with multi-functional teams to deliver scalable, secure and reliable solutions.
- Build an MLOps platform, primarily using Kubeflow along with other ML ecosystem tools and services, for a unified ML development experience.
- Communicate effectively with data scientists, ML engineers, PMs, and architects to elaborate requirements and drive technical solutions.
- Own and develop cloud-based services end to end, including infrastructure as code.
- Design and build software solutions for efficient organization, storage and retrieval of data to enable substantial scale.
- Apply an understanding of cloud computing and security to build robust cloud infrastructure and solutions for ML teams.
- Build systems and dashboards to monitor service & ML health.
- Lead architecture reviews, code reviews, and technology evaluations.
- Research, evaluate, prototype and drive adoption of new ML tools with reliability and scale in mind.
Preferred Qualifications
- Experience implementing and operating distributed systems, and with the full software development lifecycle: conceiving, specifying, designing, programming, documenting, testing, and fixing bugs in applications, frameworks, or other software components.
- Experience managing tools such as Databricks and SageMaker for efficient computation and management of large-scale data lakes.
- Experience with data and/or ML systems, with the ability to think across layers of the stack.
- Develop and maintain monitoring and alerting systems for proactively identifying and addressing issues within the machine learning infrastructure.
- Experience leading or mentoring other team members and proven team collaboration experience, including an understanding of group dynamics, effective communication strategies, and conflict resolution techniques, and the ability to foster a positive and inclusive team environment.