Senior Machine Learning Ops Engineer

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
At least 5 years of experience in software engineering, MLOps, or ML infrastructure roles.
Strong proficiency in Python and relevant ML engineering tooling for dependency management, packaging, testing, linting, and deployment (e.g., Poetry, pytest, Pylint).
Hands-on experience with ML workflow orchestration and experiment-tracking tools (e.g., MLflow, Kubeflow, Airflow, SageMaker, Weights & Biases).
Expertise in designing and managing CI/CD pipelines for ML applications using GitHub Actions, Jenkins, or similar tools.
Experience with cloud-based ML infrastructure (e.g., AWS, GCP, Azure) and containerized deployments using Docker and Kubernetes.
A strong sense of ownership and a commitment to quality and engineering best practices in ML production environments.

Design, implement, and optimize scalable MLOps infrastructure for data ingestion, model training, evaluation, and inference.
Develop and maintain CI/CD pipelines for automating ML workflows, including training, validation, and deployment of ML models.
Build robust containerization and orchestration strategies for ML artifacts and services using Docker and Kubernetes.
Automate monitoring, logging, and alerting for ML models in production to ensure reliability and performance.
Establish and enforce best practices for ML model versioning, governance, and reproducibility using tools such as MLflow or Kubeflow.
Collaborate with data scientists, ML engineers, and DevOps teams to streamline the transition of ML models from research to production.
Contribute to regulatory documentation and compliance processes (e.g., FDA, IDE) to support ML model deployment in regulated environments.

Familiarity with deep learning frameworks like TensorFlow and PyTorch, particularly in the context of deployment and optimization.
Experience with medical imaging applications and regulatory compliance requirements.
Knowledge of microservices and API development using REST, gRPC, and frameworks such as FastAPI.
Understanding of distributed computing frameworks such as Ray or Spark for ML scaling.