Cloud Specialist - CloudOps
Experience: 5.1 to 8 years
Education: Any graduate
Role Overview
We are seeking a highly skilled and adaptable Senior Cloud Operations Engineer with 5+ years of experience to join our dynamic CloudOps team. In this role, you will be responsible for ensuring the reliability, scalability, and security of our cloud infrastructure.
We are looking for a specialist in at least one major cloud platform (AWS, GCP or Azure) who is enthusiastic about cross-skilling into the others. As a core member of our 24x7 operations team, you will champion a "code-first" mindset, using automation to eliminate manual intervention and drive operational excellence.
Key Responsibilities
- Cloud Operations & Management: Monitor, maintain, and optimize core cloud infrastructure across AWS, GCP or Azure, ensuring high availability and performance.
- Infrastructure as Code (IaC): Design, build and maintain production-grade infrastructure utilizing Terraform as the primary provisioning tool.
- Cross-Cloud Collaboration: Actively learn and cross-skill into alternate cloud platforms (e.g., transitioning or expanding from AWS to GCP/Azure) to support our multi-cloud strategy.
- 24x7 Operations & On-Call Support: Participate in a rolling 24x7 on-call shift rotation to provide critical incident response, troubleshooting and production support.
- Automation-First Culture: Constantly identify manual operational bottlenecks and eliminate them through scripting (Python, Bash) and CI/CD pipelines.
- Observability & Monitoring: Maintain and enhance enterprise monitoring, logging, and alerting systems (e.g., Datadog, Prometheus, ELK Stack) to detect infrastructure degradation before it affects production.
Required Skills & Qualifications
- Experience: 5-8 years of professional experience in Cloud Operations, DevOps or Site Reliability Engineering (SRE).
- Cloud Expertise: Deep, hands-on architectural and operational knowledge of at least one of the following:
  - AWS (EC2, S3, RDS, EKS, IAM, CloudWatch)
  - Google Cloud / GCP (GCE, GCS, GKE, Cloud Logging, IAM)
  - Microsoft Azure (AKS, Azure VMs, Azure Storage, Entra ID, Azure Monitor)
- Infrastructure as Code: Advanced, production-proven experience with Terraform (modules, state management, workspaces) is a strict requirement.
- Operating Systems & Scripting: Strong proficiency in Linux/Unix administration and in scripting or programming languages (Python, Go, or Bash) for infrastructure automation.
- Flexibility: Explicit willingness and aptitude to learn and support alternative cloud platforms as business needs evolve.
- Availability: Absolute readiness to work in a 24x7 operational environment, including night shifts, weekends and on-call rotations.
Preferred & Added Advantages
- AI & AIOps Exposure: Experience or a strong interest in incorporating AI/ML tools into operations (e.g., predictive analytics, automated anomaly detection, AI-driven incident correlation, or managing infrastructure supporting AI workloads).
- Containerization: Hands-on experience with Docker and Kubernetes (EKS/GKE/AKS) orchestration.
- Certifications: Valid professional-level certifications in AWS, GCP, Azure, or HashiCorp Terraform are highly desirable.