The CloudOps Engineer is responsible for the day-to-day operations, reliability, performance, and cost optimization of cloud environments across infrastructure, applications, and data platforms. This role ensures secure, scalable, and highly available cloud operations by implementing automation, monitoring, and governance aligned with organizational standards.
- Manage and monitor cloud infrastructure, applications, and platform services to ensure high availability and performance
- Lead incident management, perform root cause analysis (RCA), and drive problems to resolution
- Ensure uptime, reliability, and performance using SRE practices (SLI/SLO/SLA)
- Handle on-call support and production issues
- Operate and maintain cloud environments (dev/test/stage/prod)
- Manage subscriptions/accounts, RBAC, IAM roles, and access controls
- Maintain network configurations, VMs, containers, storage, and platform services
- Support deployment pipelines and environment provisioning
- Develop and maintain Infrastructure-as-Code (Terraform/Bicep/CloudFormation)
- Automate deployment, scaling, patching, and configuration management
- Support CI/CD pipelines and ensure smooth release management
- Implement auto-scaling and self-healing mechanisms
- Implement and manage monitoring tools, alerts, dashboards, and logging frameworks
- Ensure proactive detection of issues using metrics, logs, and traces
- Optimize system performance and reduce downtime
- Enforce security best practices (IAM, encryption, network security)
- Ensure compliance with organizational policies and regulatory requirements
- Implement backup, disaster recovery (DR), and business continuity planning (BCP)
- Monitor cloud spend and enforce resource tagging and budget controls
- Support cloud migration activities (rehost, replatform) from an operations perspective
- Validate deployment readiness, rollback strategies, and runbooks
- Ensure smooth transition to production and post-go-live support
- Work closely with DevOps, development, architecture, and security teams
- Improve operational efficiency through automation and optimization
- Document runbooks, SOPs, and operational procedures
- 5–10 years in Cloud Operations / DevOps / SRE roles
- Hands-on experience with Azure / AWS / GCP cloud platforms
- Strong knowledge of:
  - Infrastructure-as-Code (Terraform/Bicep/CloudFormation)
  - CI/CD tools (Azure DevOps, Jenkins, GitHub Actions)
  - Monitoring tools (CloudWatch, Azure Monitor, Prometheus, Grafana)
- Experience with:
  - Containers (Docker, Kubernetes)
  - Networking, IAM, and security best practices
- Familiarity with incident management, DR/BCP, and cost optimization