CloudOps Engineer – L2 (3–5 Years)
About Comprinno
Comprinno is a NASSCOM-incubated company headquartered in Bangalore, with offices in Pune, Coimbatore, and the United States. We specialize in cloud transformation, DevOps, infrastructure automation, and managed cloud operations, enabling organizations to build scalable, secure, and high-performing cloud environments on AWS.
Our flagship SaaS platform, Tevico, is an intelligent cloud governance and observability platform that helps enterprises improve reliability, optimize costs, strengthen security, and automate operational workflows across AWS environments.
As an AWS Advanced Consulting Partner, Comprinno helps customers migrate, modernize, secure, and manage cloud environments while adopting emerging technologies such as Data Analytics, AI, and Generative AI.
About the Role
We are seeking an experienced CloudOps Engineer – L2 to join our Managed Services team.
This role is responsible for managing day-to-day AWS cloud operations 24/7 in rotational shifts, resolving escalated incidents, maintaining infrastructure reliability, implementing monitoring and automation solutions, and ensuring adherence to customer SLAs and operational standards.
The ideal candidate will have strong AWS administration experience, excellent troubleshooting skills, and a proactive mindset toward automation, operational excellence, and customer success.
This role serves as the escalation point for L1 engineers and plays a critical role in ensuring stable, secure, and optimized cloud environments for our managed services customers.
Key Responsibilities
Cloud Infrastructure Operations
- Monitor and manage AWS cloud environments to ensure availability, reliability, and performance.
- Provision, modify, and maintain AWS resources including:
  - EC2
  - RDS
  - S3
  - EKS
  - IAM
  - VPC
  - Load Balancers
  - Route 53
  - IaC (Terraform, CloudFormation & CDK)
- Perform infrastructure administration activities through approved change management processes.
- Execute OS patching, system upgrades, and maintenance activities across Linux and Windows environments.
- Ensure infrastructure remains compliant with customer-defined operational baselines.
Incident & Problem Management
- Act as the primary escalation point for incidents raised by L1 engineers.
- Perform advanced troubleshooting across cloud infrastructure, networking, operating systems, and application dependencies.
- Manage incidents according to SLA commitments and severity classifications.
- Conduct root cause analysis (RCA) and implement corrective and preventive actions.
- Participate in post-incident reviews and continuous improvement initiatives.
- Coordinate with cloud engineering, security, and application teams during major incidents.
Monitoring & Observability
- Configure and manage monitoring solutions using:
  - Amazon CloudWatch
  - Prometheus
  - Grafana
  - Tevico
  - OpenSearch
- Create dashboards, alerts, and operational visibility reports.
- Monitor infrastructure health, API availability, application performance, and service dependencies.
- Support observability initiatives leveraging Tevico and cloud-native monitoring platforms.
- Continuously improve monitoring coverage and reduce alert fatigue.
AWS Cost Optimization & FinOps
- Support AWS cost governance and optimization initiatives.
- Analyze Cost & Usage Reports (CUR), billing trends, and resource utilization patterns.
- Identify opportunities for:
  - Rightsizing
  - Storage Optimization
  - Reserved Instances
  - Savings Plans
  - Resource Cleanup
- Collaborate with customers and internal teams to improve cloud efficiency and reduce waste.
Backup, Recovery & Resilience
- Implement and maintain backup strategies using:
  - AWS Backup
  - AMI Automation
  - Snapshot Management
- Monitor backup success and recovery readiness.
- Support disaster recovery testing and business continuity exercises.
- Validate recovery procedures and document outcomes.
Security & Compliance Operations
- Implement and maintain cloud security controls and operational security best practices.
- Monitor findings from:
  - Amazon GuardDuty
  - AWS Security Hub
  - IAM Access Analyzer
- Support vulnerability remediation and patch compliance activities.
- Perform access reviews and RBAC management.
- Assist with Cloud Security Posture Management (CSPM) reporting and compliance assessments.
Automation & Operational Excellence
- Automate repetitive operational activities using:
  - AWS Systems Manager
  - Lambda
  - Python
  - Bash
  - PowerShell
- Develop and maintain operational scripts and automation workflows.
- Implement auto-remediation and self-healing mechanisms where applicable.
- Contribute to Infrastructure as Code (IaC) initiatives using Terraform and CloudFormation.
- Drive continuous improvement through automation and process optimization.
Documentation & Knowledge Management
- Maintain and improve:
  - Runbooks
  - Standard Operating Procedures (SOPs)
  - Knowledge Base Articles
  - Operational Documentation
- Contribute to customer reports including:
  - SLA Reports
  - Security Reports
  - Cost Optimization Reports
  - Operational Health Reports
- Provide technical guidance and mentoring to L1 engineers.
- Participate in internal knowledge-sharing and capability-building initiatives.
Required Qualifications & Skills
- Bachelor's degree in Computer Science, Information Technology, Engineering, or related disciplines.
- 3–5 years of experience in Cloud Operations, Infrastructure Management, Site Reliability Engineering (SRE), or Managed Services.
- Strong hands-on experience with AWS cloud services.
- Strong understanding of:
  - Linux Administration
  - Windows Administration
  - Networking
  - DNS
  - TCP/IP
  - VPNs
  - Load Balancing
- Experience managing production cloud environments.
- Understanding of ITIL-based Incident, Problem, Change, and Service Management processes.
- Experience with monitoring and observability platforms.
- Hands-on experience with:
  - Kubernetes core skills
  - Amazon EKS
  - Containers
  - DevOps Practices
- Scripting experience using:
  - Python
  - Bash
  - PowerShell
- Strong troubleshooting and root-cause analysis skills.
- Excellent communication and documentation abilities.
- Ability to work independently and manage multiple operational priorities.
Certification Requirements
Mandatory (One of the Following)
- AWS Certified CloudOps Engineer – Associate (formerly SysOps Administrator – Associate)
- AWS Certified Solutions Architect – Associate
Preferred
- AWS Certified DevOps Engineer – Professional
- AWS Certified Security – Specialty
- AWS Certified Advanced Networking – Specialty
- ITIL Foundation Certification
- Kubernetes Certifications (CKA/CKAD)
What We're Looking For
- Strong operational ownership mindset.
- Calm and structured approach during incident management.
- Passion for automation and operational excellence.
- Customer-first attitude with strong SLA awareness.
- Continuous learning mindset.
- Strong collaboration and team-player attitude.
- Ability to identify improvement opportunities proactively.
What Success Looks Like
- Consistently meets SLA and operational targets.
- Successfully resolves L2 incidents with minimal escalation.
- Improves operational efficiency through automation.
- Maintains high infrastructure availability and reliability.
- Delivers meaningful cost optimization recommendations.
- Produces high-quality documentation and operational reports.
- Contributes to team capability development and knowledge sharing.
- Demonstrates readiness to progress toward CloudOps Engineer – L3.
Why Join Comprinno
- Work with one of the leading AWS Managed Service and Consulting Partners in APJ.
- Gain hands-on exposure to large-scale enterprise AWS environments.
- Work on CloudOps, Security, Reliability, FinOps, and Automation initiatives.
- Learn from AWS-certified architects and cloud experts.
- Accelerate your growth through certifications, mentorship, and real-world customer engagements.
- Contribute to Tevico, our cloud governance and observability platform.
- Join a culture that values ownership, innovation, operational excellence, and continuous learning.
Pay: Up to ₹1,000,000.00 per year
Work Location: In person