Role & Responsibilities
- Monitor production infrastructure, platform health, and application uptime across 24x7 environments, ensuring SLA adherence and rapid incident response.
- Detect, triage, and escalate incidents in real time, coordinating with engineering and DevOps teams to drive resolution within defined RTO and RCA timelines.
- Manage and respond to alerts from monitoring tools (Grafana, PagerDuty, Datadog, or equivalent), distinguishing signal from noise and reducing MTTR.
- Execute routine operational tasks including deployments, configuration changes, log analysis, and scheduled maintenance activities.
- Maintain and improve runbooks, escalation playbooks, and incident documentation to build institutional knowledge and reduce repeat issues.
- Collaborate with the engineering team on observability improvements, including alert tuning, dashboard creation, and proactive capacity monitoring.
- Participate in post-incident reviews and contribute to root cause analysis and preventive action planning.
Ideal Candidate
- Strong NOC / Infrastructure Operations Engineer profile with 24x7 monitoring and incident-response experience
- Mandatory (Experience) – Must have 2+ years of experience in a NOC (Network Operations Center) engineer, infrastructure operations, DevOps support, or product support role
- Mandatory (Tech skill 1) – Must have a hands-on understanding of networking, Linux systems and cloud infrastructure
- Mandatory (Tech skill 2) – Must have proficiency with monitoring and observability tools such as Grafana, Prometheus, Datadog, Graylog or similar
- Mandatory (Tech skill 3) – Must have hands-on experience in real-time incident detection, triage and escalation, coordinating resolution within defined RTO/RCA timelines and reducing MTTR
- Mandatory (Tech skill 4) – Must have working knowledge of containerization and orchestration technologies (Docker and Kubernetes preferred)
- Mandatory (Tech skill 5) – Must be comfortable scripting in Bash, Python or similar for automation and operational tasks
- Mandatory (Communication) – Must have strong communication skills and be able to write clear incident updates and escalation notes under pressure
- Mandatory (Note 1) – Must be comfortable with shift-based scheduling, including nights and weekends
- Preferred (Education) – Bachelor's degree in Computer Science
- Preferred (DevOps) – Familiarity with CI/CD pipelines and DevOps workflows
- Preferred (Process) – Exposure to runbooks, escalation playbooks, post-incident reviews and root cause analysis
Pay: ₹217,595.91 - ₹1,200,000.00 per year
Application Question(s):
- Are you comfortable with 24x7 rotational shifts, including nights and weekends?
- If not in Gurgaon, are you open to relocation to Gurgaon for this opportunity?
- What is your notice period in days? (30)
- What is your current CTC in LPA?
- What is your expected CTC in LPA? (12)
- Are you comfortable working at the Gurgaon (Sector 58) location?
Experience:
- Overall: 2 years (Required)
- NOC / Infrastructure Operations: 2 years (Required)
Work Location: In person