Job Description
Position: AI DevOps / AI Platform Engineering Lead
Location: Kolkata (On-site)
Experience Required: 8–15 Years
Employment Type: Full-Time
About the Role: We are looking for an experienced AI DevOps / AI Platform Engineering Lead to drive our AI infrastructure, platform engineering, and automation initiatives. This role will be instrumental in enabling the organization’s AI transformation journey by building scalable, reliable, and production-ready platforms for AI and machine learning workloads. The ideal candidate will bring strong expertise in cloud infrastructure, Kubernetes, DevOps automation, platform engineering, and AI/MLOps ecosystems, along with the ability to lead teams and drive engineering excellence.
Key Responsibilities:
AI Platform Engineering & Infrastructure:
- Design, build, and manage scalable AI-ready cloud infrastructure.
- Enable reliable deployment and operationalization of AI/ML workloads in production environments.
- Support AI model deployment, inference services, and platform automation initiatives.
- Drive adoption of MLOps and LLMOps best practices across engineering teams.
DevOps & Cloud Engineering:
- Architect and manage cloud-native platforms on AWS.
- Design and implement Infrastructure as Code (IaC) using Terraform and related tools.
- Build and optimize CI/CD and GitOps workflows for application and AI deployments.
- Improve platform scalability, reliability, security, and operational efficiency.
Kubernetes & Platform Operations:
- Manage and optimize production-scale Kubernetes environments.
- Design resilient, highly available, and observable platform architectures.
- Implement container orchestration, workload scheduling, and automation frameworks.
- Ensure platform readiness for AI and data-intensive workloads.
Observability & Reliability:
- Implement monitoring, logging, tracing, and alerting solutions.
- Drive SRE practices, incident management, and production troubleshooting.
- Establish platform reliability metrics, operational standards, and governance frameworks.
Leadership & Collaboration:
- Lead and mentor DevOps and platform engineering teams.
- Drive DevOps transformation and automation initiatives across the organization.
- Collaborate closely with Engineering, AI, Data Science, and Product teams.
- Provide technical guidance on infrastructure, scalability, and deployment strategies.
Mandatory Skills & Qualifications:
- 8–15 years of experience in DevOps, Cloud Infrastructure, Platform Engineering, or related domains.
- Strong hands-on experience with AWS Cloud Services.
- Expertise in Kubernetes and containerized application deployments.
- Strong proficiency in Terraform and Infrastructure as Code practices.
- Experience implementing CI/CD pipelines and GitOps workflows.
- Exposure to MLOps, LLMOps, or AI infrastructure environments.
- Strong understanding of platform engineering principles and cloud-native architectures.
- Experience with monitoring, observability, and production operations.
- Proven ability to lead teams and drive engineering initiatives.
- Strong troubleshooting and problem-solving skills in production environments.
Preferred Skills:
- AI Infrastructure and LLM deployment experience.
- MLOps / LLMOps platform implementation.
- Experience with SageMaker, Kubeflow, MLflow, Vertex AI, or similar platforms.
- Exposure to LangChain, LangGraph, RAG, Vector Databases, or Agentic AI ecosystems.
- Experience running AI workloads on Kubernetes.
- Understanding of GPU infrastructure and inference workloads.
- Experience with Prometheus, Grafana, OpenTelemetry, ELK, Datadog, or similar observability tools.
- Familiarity with DevSecOps and cloud security best practices.
Key Competencies:
- Platform Engineering Mindset
- AI Infrastructure Enablement
- Cloud Architecture & Automation
- Leadership & Team Management
- Production Reliability & Scalability
- Strategic Problem Solving
- Stakeholder Management
- Continuous Improvement & Innovation
Success Metrics / KPIs:
- Successful operationalization of AI workloads in production.
- Platform reliability, uptime, and performance improvements.
- Deployment automation and CI/CD maturity.
- Reduction in deployment and infrastructure-related incidents.
- Infrastructure scalability and cost optimization.
- Adoption of MLOps/LLMOps best practices.
- Team productivity and engineering efficiency improvements.
Pay: ₹800,000.00 - ₹2,200,000.00 per year
Ability to commute/relocate:
- Kolkata, West Bengal: Able to commute reliably or planning to relocate before starting work (Required)
Experience:
- DevOps: 5 years (Required)
License/Certification:
- AWS Certification (Preferred)
Work Location: In person