About the company:
It is an AI-native digital engineering company focused on building scalable enterprise AI, cloud, and digital transformation solutions using its proprietary GalentAI platform.
Experience: 6-12 Years
Work Mode: Remote
Notice Period: Immediate to 30 Days
Job Summary:
We are looking for a hands-on Senior DevOps Engineer to build and manage scalable multi-cloud AI SaaS applications and platforms. The role involves designing and managing cloud infrastructure across AWS, Azure, and GCP; supporting AI/LLM-based applications; and owning Kubernetes operations, Infrastructure-as-Code, CI/CD automation, observability, security, and platform reliability.
Key Responsibilities
- Design, deploy, and manage infrastructure across AWS, Azure, and GCP
- Manage Kubernetes environments including EKS, AKS, and GKE
- Build and maintain Infrastructure-as-Code using Terraform, Pulumi, or CDKTF
- Implement CI/CD and GitOps pipelines using GitHub Actions, Jenkins, ArgoCD, or GitLab CI
- Handle monitoring, observability, logging, and incident management
- Ensure cloud security, IAM, secrets management, compliance, and governance
- Support microservices architecture and platform engineering initiatives
- Optimize platform scalability, reliability, and cloud cost management
- Collaborate with architects, developers, and security teams for platform modernization
Required Experience
- 6-10 years of experience in DevOps / SRE / Platform Engineering
- Minimum 3 years of experience managing production cloud-native and containerized workloads
- Strong hands-on experience across AWS, Azure, and GCP
- Experience running multi-tenant SaaS platforms and AI/ML production workloads
- Hands-on experience in microservices architecture, platformization, and scalable product environments
- Strong experience with Docker, Kubernetes, Helm, and container orchestration
- Expertise in CI/CD, GitOps, Infrastructure Automation, and Cloud Operations
- Strong scripting/programming skills in Python or TypeScript
- Experience with monitoring tools such as Prometheus, Grafana, ELK, Datadog, or OpenTelemetry
- Good understanding of cloud networking, IAM, security, and secrets management
- Experience supporting SOC 2 / ISO 27001 compliance environments is preferred
Preferred Skills
- Multi-cloud certifications (AWS/Azure/GCP)
- Experience with AI/LLM platforms, Bedrock, Vertex AI, or Azure OpenAI
- Exposure to MongoDB, Redis, Neo4j, Kafka, Pinecone, or similar technologies
- Knowledge of FinOps, cost optimization, and cloud governance practices