Senior Database Operations Engineer (Senior DB Ops)
Location: Offshore (India Development Centers)
Experience: 7–13 Years
Role Overview
We are seeking a highly skilled Senior Database Operations Engineer to drive database reliability, performance, automation, and operational excellence across enterprise-scale cloud and hybrid environments.
In this role, you will own the operational health of critical database platforms, including SQL Server, Azure SQL, PostgreSQL, and Azure Cosmos DB, ensuring high availability, scalability, security, and resilience. You will partner closely with Infrastructure, Application Development, Security, and DevOps teams to support mission-critical systems while continuously improving operational efficiency through automation and modern SRE practices.
The ideal candidate combines deep database administration expertise with strong cloud, automation, monitoring, and troubleshooting capabilities, along with a passion for reducing operational toil and improving platform reliability.
Key Responsibilities
Database Reliability & Operations
- Own the availability, reliability, performance, and scalability of database environments across Production, UAT, QA, and Development.
- Administer and support SQL Server, Azure SQL Database, Azure Cosmos DB, PostgreSQL, and other Azure-native database services.
- Design, implement, and maintain robust backup, restore, and recovery strategies aligned with business RPO/RTO requirements.
- Perform proactive database maintenance including index optimization, statistics management, integrity checks, and capacity planning.
- Manage database patching, upgrades, migrations, and lifecycle management activities.
- Ensure database security through RBAC, access controls, auditing, and compliance best practices.
High Availability & Disaster Recovery
- Design, implement, and support High Availability (HA) and Disaster Recovery (DR) solutions.
- Manage technologies such as Always On Availability Groups, replication, failover clustering, and cloud-native resiliency mechanisms.
- Conduct periodic DR testing and validate recovery readiness across environments.
Performance Engineering
- Diagnose and resolve performance bottlenecks, blocking, deadlocks, long-running queries, and replication latency issues.
- Partner with application teams to optimize database design, query performance, indexing strategies, and workload efficiency.
- Analyze database growth trends and recommend scaling strategies to support business expansion.
Automation & DevOps
- Drive an automation-first culture by eliminating repetitive operational tasks through scripting and infrastructure automation.
- Develop and maintain automation using PowerShell, Python, Azure Automation Runbooks, Terraform, and Ansible.
- Support CI/CD pipelines, database deployments, release automation, and Infrastructure-as-Code initiatives using Azure DevOps.
- Implement self-healing and operational efficiency improvements to reduce manual intervention.
Monitoring, Observability & Incident Management
- Implement proactive monitoring, ing, and observability solutions using tools such as SentryOne (SQL Sentry), Dynatrace, and Azure-native monitoring services.
- Establish operational dashboards, health reporting, and performance baselines.
- Participate in incident response, root cause analysis (RCA), and post-incident reviews.
- Serve as a key contributor during major production incidents and on-call support rotations.
Collaboration & Leadership
- Collaborate with Infrastructure, Security, Application Development, and Cloud Engineering teams to ensure platform stability and operational excellence.
- Mentor junior engineers and promote database operational best practices.
- Contribute to operational standards, runbooks, documentation, and continuous improvement initiatives.
Required Technical Skills
Database Platforms
- Microsoft SQL Server
- Azure SQL Database
- Azure Cosmos DB
- PostgreSQL
Database Administration
- Installation, configuration, and patch management
- Backup, recovery, and disaster recovery planning
- High Availability architectures and support
- Performance tuning and query optimization
- Indexing and statistics management
- Replication setup and troubleshooting
- Capacity planning and database growth forecasting
- Security administration, RBAC, auditing, and compliance
Automation & DevOps
- PowerShell scripting
- Python automation
- Azure DevOps pipelines
- Azure Automation Runbooks
- Terraform and Infrastructure-as-Code (IaC)
- Ansible configuration management
Monitoring & Observability
- SQL Sentry / SentryOne
- Dynatrace
- Database performance monitoring
- ing, reporting, and observability practices
- Blocking, deadlock, and replication latency analysis
Infrastructure & Cloud
- Azure Virtual Machines, Networking, Storage, Firewall, and IIS
- Windows Server Administration
- Basic Linux Administration
- Cloud-native database services and architectures
Troubleshooting
- Cross-platform troubleshooting across databases, infrastructure, and applications
- Root cause analysis and incident resolution
- Basic understanding of .NET applications and database interactions
Required Competencies
- Strong analytical and problem-solving skills
- Operational ownership mindset with a focus on reliability and resilience
- Ability to manage production incidents under pressure
- Excellent communication and stakeholder management skills
- Strong collaboration across distributed global teams
- Experience working in Agile, DevOps, or Site Reliability Engineering (SRE) environments
- Ability to support teams operating in the EST time zone
Preferred Qualifications
- Experience supporting enterprise financial services, banking, or payment platforms
- Knowledge of Site Reliability Engineering (SRE) concepts including SLIs, SLOs, Error Budgets, and Toil Reduction
- Experience with ServiceNow, PagerDuty, and operational workflow platforms
- Exposure to Kubernetes and containerized environments
- Experience building operational dashboards using Power BI
- Familiarity with CI/CD governance, release management, and change control processes
- Understanding of cloud-native observability platforms and AI-assisted operational tooling
Certifications (Preferred)
- Azure certifications: AZ-104, AZ-305, DP-300
- Terraform Certification: HashiCorp Terraform Associate
sql server,data definition language,data modeling,query optimization,database migration,data integration,data manipulation language,data validation