Job Description: Technical Operations Manager
Role Overview
We are seeking an experienced Cloud Infrastructure Shift Supervisor to lead our 24x7 operations team. You will ensure maximum uptime, rapid incident resolution, and seamless shift handovers. This role balances real-time technical incident oversight with people management.
Responsibilities
- Shift Leadership: Manage a rotating team of cloud engineers to ensure continuous 24x7 coverage.
- Incident Management: Act as Incident Resolution Owner for Severity-1 and Severity-2 infrastructure outages.
- Operations Oversight: Monitor system health dashboards (Datadog, New Relic) and prioritize the ticketing queue.
- SLA Compliance: Ensure the team meets or exceeds Service Level Agreements (SLAs) for response and resolution times.
- Shift Handovers: Execute rigorous, documented handovers to incoming shifts so that no open incidents or pending actions are lost between shifts.
- Team Development: Mentor junior engineers, conduct performance reviews, and manage shift scheduling to prevent burnout.
Required Technical Skills
- Cloud Platforms: Proven experience managing infrastructure on AWS, Azure, or GCP.
- Monitoring & Logging: Proficiency with tools such as Splunk, Datadog, Prometheus, or the ELK stack.
- Ticketing Systems: Advanced knowledge of ITSM platforms such as Jira Service Management (formerly Jira Service Desk) or ServiceNow.
- OS & Scripting: Familiarity with Linux/Windows administration and basic scripting (Bash, Python).
- Data Pipelines: Strong understanding of the deployment, maintenance, and monitoring of data pipelines.
Required Soft Skills
- Crisis Management: Ability to remain calm and direct technical teams during high-pressure outages.
- Communication: Clear written and verbal communication for status page updates and stakeholder alerts.
- Problem Solving: Strong analytical skills to spot patterns in recurring infrastructure alerts.
- Flexibility: Willingness to work a rotating schedule, including nights, weekends, and holidays.
Experience & Qualifications
- Bachelor's degree in Computer Science or IT, or equivalent practical experience.
- 7+ years of experience in a Cloud Support, NOC, or Site Reliability Engineering (SRE) environment.
- 3+ years of experience in a team lead, supervisor, or senior engineer capacity.
- Relevant certifications (e.g., AWS SysOps Administrator, Azure Administrator, ITIL Foundation) are a plus.
Note:
By submitting your application, you consent to being contacted by our Talent Acquisition team via phone call, email, SMS, WhatsApp, or other communication channels regarding your application and relevant career opportunities.