Number of Positions:
: 2
Primary Skills:
: PostgreSQL, Java Spring Boot, Incident Management, REST APIs
Job Description:
Role Summary
Own end-to-end production stability, from hotfix delivery to root-cause closure. Serve as the final escalation point for critical incidents, using deep system knowledge to diagnose quickly and ship safe fixes under pressure.
Key Responsibilities
- Production Hotfixes: Own the full diagnosis-to-deployment lifecycle; coordinate emergency releases with zero ambiguity.
- Deep-Dive Troubleshooting: Investigate complex, intermittent, and data-related issues across services, databases, and infra.
- SLA Ownership: Ensure P1/P2 tickets are resolved within agreed SLAs; escalate proactively when needed.
- Code & Patch Delivery: Write, review, and deploy targeted hotfix code to production-grade quality standards.
- RCA & Post-Mortems: Document thorough root-cause analyses with structured preventive action items.
- Change Advisory: Evaluate risk of hotfixes; liaise with release and CAB processes.
- Runbooks & SOPs: Maintain and improve operational runbooks for known failure patterns.
- Java / Spring Boot or equivalent backend stack
- SQL: complex query debugging, execution plan analysis
- Log analysis
- REST API troubleshooting & integration debugging
- Git-based hotfix & branching workflows
- Linux / bash: process, memory, I/O diagnostics
- Incident & change management (ITIL awareness)
- Cloud infra basics: Azure / AWS monitoring tools
- Microservices & event-driven architecture (ASB / Kafka)
- Container platforms: Docker / Kubernetes
- CI/CD pipelines: GitHub Actions / Jenkins
- Performance profiling & thread-dump analysis
- APM tools: Dynatrace, New Relic, App Insights
- Payment domain / PCI compliance awareness
- Scripting: Python or shell for automation