Trend Analysis & Problem Identification
Identify recurring incident patterns, anomalies, and signs of alert fatigue that may indicate deeper systemic issues.
Collaborate with L2/L3 teams to review telemetry data and recommend improvements to alert thresholds, rules, and policies.
Provide insights that support proactive issue prevention, noise reduction, and overall monitoring refinement.
Platform Management & Optimization
Develop, update, and maintain dashboards that reflect real‑time system health, performance metrics, and service behavior.
Support the ongoing adoption and optimization of Dynatrace, enhancing dashboarding and visualization capabilities for cloud and on‑prem observability.
Assist in routine platform checks, ensuring monitoring tools remain accurate, stable, and aligned with business and operational requirements.
Leadership & Collaboration
Organize the team's work, including planning, task breakdown, and ensuring priorities are clear.
Provide structured, timely updates to leadership on progress, risks, blockers, team capacity, and delivery timelines.
Work closely with application teams, SRE groups, and infrastructure operations during incident triage, investigations, and routine monitoring reviews.
Ensure clear, timely, and effective communication with stakeholders during service-impacting events, providing status updates and context as needed.
Ensure adherence to engineering best practices, drive operational excellence, and maintain accountability for team delivery outcomes.
Support platform stability and availability through adherence to lifecycle maintenance, patching schedules, and vulnerability management processes.
Contribute to the improvement of monitoring workflows, alert routing logic, runbook effectiveness, and incident management practices.
Innovation & AI Enablement
Assist in exploring and adopting AI-driven capabilities that improve observability, automate root‑cause identification, and reduce manual effort.
Contribute to internal knowledge sharing by documenting best practices, playbooks, AI reference materials, and usage guidelines (e.g., Copilot tips).
Collaboration & Leadership Support
Partner with cross-functional teams to align monitoring practices with evolving business needs and operational priorities.
Drive end-to-end delivery of monitoring initiatives, from requirements gathering and planning through execution oversight and delivery validation.
Coordinate cross‑team dependencies, ensure timelines are met, and proactively remove blockers for the team.
Provide subject‑matter support for ITSM processes, including incident, problem, and change management discussions.