Key Responsibilities / Deliverables
Perform advanced troubleshooting and incident resolution for production platform issues.
Investigate incidents using logs, monitoring dashboards, and database queries.
Perform root cause analysis (RCA) for recurring or high-severity incidents.
Monitor and analyze system performance metrics and application logs.
Diagnose issues across application, middleware, data, and integration layers.
Support database troubleshooting and data validation activities.
Coordinate with application development teams and infrastructure teams for issue
resolution.
Maintain and enhance knowledge base articles, troubleshooting guides, and SOP
documentation.
Provide technical guidance and mentorship to L1 support engineers.
Participate in incident response and escalation management processes.
Develop a strong architectural understanding of the platform components and data flows.
Identify platform improvement opportunities, performance bottlenecks, and operational
risks.
Ensure platform availability and operational continuity for financial communication systems.
Be available 24x7 to support the platform.
Provide on-call and weekend support as needed.
Work Experience & Skillset
4+ years of experience in TechOps roles.
Strong hands-on experience with Linux / Unix system troubleshooting.
Strong knowledge of SQL queries and database troubleshooting (MySQL, PostgreSQL,
MongoDB).
Experience working with log monitoring and analysis platforms such as ELK/CloudWatch.
Hands-on experience with monitoring tools like Grafana and observability platforms.
Knowledge of REST APIs, API troubleshooting, and HTTP diagnostics.
Experience with Amazon Managed Workflows for Apache Airflow (MWAA), Kafka, etc.
Familiarity with distributed systems, event streaming platforms (Kafka), or data
platforms.
Advanced exposure to cloud environments such as AWS.
Hands-on experience with Kubernetes/EKS and cluster management services.
Good scripting knowledge (Shell/Python).
Exposure to financial services, banking platforms, or high-volume transactional systems
is highly desirable.
Knowledge of Jenkins for running and monitoring pipelines during deployments.