Major Incident Manager
Lucknow, Uttar Pradesh
Job Summary
The Major Incident Manager is responsible for overseeing and actively managing the Incident and Problem Management processes within Technical Operations.
This role primarily focuses on rapid service restoration during production incidents, while ensuring that underlying issues are addressed through effective Problem Management practices. The position also involves managing the problem and ticket queues to drive long-term operational stability and service improvement.
The role ensures adherence to standardized Incident Management practices, including incident classification, escalation protocols, and communication frameworks. It involves conducting Root Cause Analysis (RCA) for Major and high-priority incidents, with a focus on both technical resolution and process improvement.
The candidate must be able to perform effectively under pressure in business-critical environments, with a strong end-user and business focus. Excellent communication, coordination, and stakeholder management skills are essential, along with the ability to drive resolution across cross-functional teams without direct authority.
A sound understanding of high-availability environments is required to effectively lead incident triage, drive resolution, and support ongoing problem management initiatives.
Key Responsibilities
Own and lead Major/high-priority incident bridge calls, directing all responding resources including internal teams, vendors, and stakeholders.
Guide incident responders through structured and analytical troubleshooting approaches.
Activate and coordinate appropriate technical resources to ensure timely service restoration.
Provide regular, clear, and timely communication updates to stakeholders during Major incidents.
Manage the Problem Management queue, ensuring root causes are identified and addressed.
Conduct and drive Root Cause Analysis (RCA), assign action items, and ensure timely closure.
Track and validate the effectiveness of mitigation and preventive actions after the RCA is complete.
Drive continuous improvement across Incident and Problem Management processes.
Improve system availability and service stability through proactive problem identification and resolution.
Provide on-call/off-hours support as part of the incident management rotation.
Monitor and report on KPIs related to Incident and Problem Management effectiveness.
Review planned Requests for Change (RFCs) to assess potential risks and identify impact areas.
Skill Requirements
ITIL certification (good to have)
Cross-skilled knowledge of ITSM processes: Incident, Problem, Knowledge, and Change (expected)
Strong analytical, organizational, and problem-solving skills
Proven expertise in Root Cause Analysis methodologies (5 Whys, Why Tree, etc.)
Basic technical understanding of systems such as UNIX, Windows, networks, messaging, databases, and backup solutions (preferred)
Exposure to enterprise applications such as SAP, Agile, and BI tools (preferred)
Excellent communication skills, with the ability to explain technical issues to non-technical stakeholders
Other Requirements
Ability to coordinate cross-functional teams to resolve major issues without direct authority
Strong capability to analyse and troubleshoot application and infrastructure issues
Collaborative mindset with ability to work within teams and share knowledge effectively
Experience working in a customer-centric, enterprise IT environment
Prior experience supporting enterprise-scale infrastructure or applications