Experience: 10+ Years
Location: Mumbai
Job Summary:
We are seeking an experienced Disaster Recovery (DR) Manager (L3) to lead end-to-end DR strategy, execution, governance, and continuous improvement across enterprise applications and infrastructure. The role ensures resilience, compliance, and rapid recovery through effective planning, orchestration of DR drills, and alignment with regulatory standards. The candidate will work closely with cross-functional teams to ensure high availability and business continuity.
Key Responsibilities:
Define and drive Disaster Recovery strategy, governance, and execution across applications and infrastructure.
Plan and execute DR drills (failover/failback) and manage large-scale DR testing programs.
Ensure alignment with defined RPO/RTO objectives and optimize recovery timelines.
Prepare DR testing schedules and oversee execution and reporting.
Perform root cause analysis (RCA) for DR incidents and implement corrective actions.
Collaborate with application, infrastructure, and network teams for DR readiness.
Manage escalations and coordinate with OEM/vendors for issue resolution.
Maintain DR dashboards, metrics, and reporting for leadership and compliance.
Define and maintain DR runbooks, SOPs, policies, and recovery procedures.
Ensure resilience across compute, storage, network, virtualization, database, and backup layers.
Align DR processes with regulatory frameworks (RBI, ISO standards).
Lead DR restoration during crisis events and ensure 24×7 readiness.
Manage onboarding of new applications into DR frameworks.
Track DR readiness gaps and implement continuous improvement initiatives.
Automate DR processes and optimize operations through tools and workflows.
Coordinate with cybersecurity, risk, and business teams for DR governance.
Conduct DR calendar planning, runbook validation, and readiness assessments.
Qualifications and Skills:
Technical Skills (Must Have):
Strong understanding of Disaster Recovery processes, RPO, and RTO strategies
Experience in DR drill execution, failover/failback coordination, and incident tracking
Hands-on experience with DR and backup technologies
Strong knowledge of DR lifecycle, replication, and data center infrastructure
Experience in managing DR operations for large enterprises or financial institutions
Expertise in DR tools like KRO and Perpetuity
Knowledge of cloud platforms – Amazon Web Services (AWS), Microsoft Azure (Azure), and hybrid environments
Strong understanding of infrastructure components (compute, storage, network, virtualization, database)
Experience in automation and DR orchestration
Strong analytical and problem-solving skills for high-impact situations
Secondary Skills (Nice to Have):
Knowledge of monitoring tools like MOM, SCOM
Understanding of core banking systems, digital channels, and payment infrastructure
Experience working in regulatory-driven environments
Exposure to cybersecurity and risk management frameworks
Certifications:
VMware / Microsoft Windows / Hyper-V certifications – Preferred
ISO 22301 / ISO 27001 knowledge – Preferred
Behavioral Skills:
Strong leadership and stakeholder management skills
Ability to handle crisis situations and high-pressure environments
Excellent communication and documentation skills
Strong ownership and accountability
Proactive and structured problem-solving mindset
Personal Attributes:
Accepts responsibility
Positive attitude and integrity
Organized and proactive approach
Strong judgement and decision-making skills
Openness to learning and collaboration