Lead MLOps Engineer (4–5 Years Experience)
Role: Lead MLOps Engineer
Experience: 4–5 Years
Employment Type: Full-Time
About the Role
We are looking for a highly hands-on Lead MLOps Engineer who can both build production-grade ML systems and mentor junior engineers.
This is not a pure architecture role. You will actively contribute to delivery while defining engineering standards, building reusable frameworks, and grooming a team of junior AI engineers.
The ideal candidate is someone who has shipped ML systems to production, understands cloud deeply, and is comfortable operating in fast-paced delivery environments.
What You Will Do
Hands-on Delivery (Primary Responsibility)
Design and deploy end-to-end ML pipelines in production
Build scalable training and inference workflows
Implement CI/CD pipelines for ML systems
Deploy models using containerized environments (Docker, Kubernetes)
Work across cloud platforms (GCP / Azure preferred)
Troubleshoot production issues, including latency, drift, and infrastructure bottlenecks
Optimize cost and tune performance
You will be expected to write production-quality Python code daily.
Architecture & Standards
Define MLOps best practices across projects
Set up experiment tracking and model registry (MLflow or equivalent)
Define repository structure and branching strategies
Establish monitoring and alerting mechanisms
Implement model performance and data drift monitoring
Create reusable deployment templates
Team Mentorship & Grooming
Mentor 4–5 junior AI engineers
Conduct code reviews and enforce coding standards
Help juniors move from prototype-level code to production-grade systems
Define an internal learning roadmap for MLOps maturity
Required Skills & Qualifications
Core Engineering
4–5 years of experience in ML Engineering / MLOps
Strong hands-on expertise in Python
Deep understanding of the ML lifecycle (training, evaluation, deployment, monitoring)
Experience with PyTorch / TensorFlow or similar frameworks
Strong debugging and performance optimization skills
Cloud & Infrastructure
Hands-on experience with GCP and/or Azure
GCP: Vertex AI, GKE, Cloud Run, Cloud Storage
Azure: Azure ML, AKS, Azure DevOps
Experience working across multiple cloud environments
Containerization with Docker
Kubernetes-based deployment
Infrastructure-as-Code exposure (Terraform preferred)
DevOps & Automation
CI/CD pipeline implementation (GitHub Actions / Azure DevOps / Cloud Build)
Experience with MLflow (experiment tracking & model registry)
Git-based workflows
Monitoring tools (Prometheus, Grafana, cloud-native monitoring)
What We Are Specifically Looking For
Has taken ML systems from notebook to production independently
Is comfortable handling ambiguous problem statements
Can debug failing pipelines quickly
Has worked in real production environments (not just PoCs)
Can balance delivery pressure with long-term engineering standards
What This Role Is NOT
Not a research-only ML role
Not a slide-driven architecture role
Not a pure DevOps role without ML understanding
This is a delivery-first leadership role.
Preferred / Good to Have
Experience with LLMOps or GenAI pipelines
Experience building APIs (FastAPI preferred)
Understanding of security best practices in ML systems