Greetings from TVS Electronics!
Role Summary
We are looking for a highly experienced Application Monitoring / Observability Engineer to manage and optimize enterprise monitoring across applications, infrastructure, databases, and URL uptime. The role requires hands-on expertise in incident scrubbing, monitoring optimization, APM, and alert tuning, along with the ability to recommend implementation changes that improve the existing monitoring landscape.
The candidate should have strong experience in monitoring large-scale production environments, reducing alert noise, improving incident quality, and driving operational excellence.
Experience Required: 4 to 8 years.
Environment Details
- 24×7 Production Environment
- 400+ Servers
- 150+ URL/Uptime Monitors
- 80+ Applications
- Monitoring Tool: Site24x7
Key Responsibilities
- Monitor and manage enterprise application and infrastructure environments.
- Perform incident scrubbing and reduce alert noise by identifying false positives, duplicate alerts, and recurring issues.
- Review and optimize existing monitoring setup to improve alert quality and reduce incident volume.
- Configure, fine-tune, and maintain monitoring thresholds, escalation rules, and alerting mechanisms.
- Use APM (Application Performance Monitoring) to identify performance bottlenecks, slow transactions, API failures, and dependency issues.
- Monitor URL uptime, availability, SSL, response times, and service health.
- Analyze recurring incidents and drive permanent fixes through root cause analysis (RCA) and problem management.
- Collaborate with application, database, infrastructure, and cloud teams for issue resolution.
- Provide implementation recommendations and monitoring improvement plans.
- Support onboarding and implementation of new monitoring tools such as PRTG.
- Create dashboards, reports, and monitoring health metrics for stakeholders.
- Participate in the 24×7 production support environment and the incident management process.
Mandatory Skills
- Strong hands-on experience in Application Monitoring / Infrastructure Monitoring.
- Experience with monitoring tools such as Site24x7, Dynatrace, Datadog, PRTG, or Prometheus/Grafana.
- Strong knowledge of APM (Application Performance Monitoring).
- Experience in incident management, problem management, and RCA.
- Hands-on experience in alert tuning, threshold optimization, and incident reduction.
- Experience with monitoring large enterprise environments (servers, applications, databases).
- Knowledge of URL monitoring and synthetic monitoring.
- Exposure to cloud monitoring (AWS / OCI / Azure).
- Strong troubleshooting and analytical skills.
Preferred Skills
- Experience in handling or coordinating technical monitoring/support teams.
- Strong stakeholder management and cross-functional coordination skills.
- Experience leading incident bridges/war rooms for production issues.
- Ability to drive monitoring improvements and operational excellence across teams.
Pay: ₹800,000.00 - ₹1,000,000.00 per year
Work Location: In person