Job Summary
Serve as an Infra Dev Specialist with a primary focus on LogicMonitor and Splunk, designing, implementing, and optimizing enterprise monitoring and observability solutions for a global organization. Collaborate with cross-functional teams to enhance platform reliability, automate workflows, and deliver actionable insights that improve service uptime and operational excellence. The role follows a hybrid work model with rotational shifts.
Responsibilities
Design and implement scalable monitoring solutions using LogicMonitor to ensure high availability and performance of critical infrastructure across data center and cloud environments
Configure, maintain, and optimize Splunk-based observability, including data onboarding, dashboards, and alerts, to provide timely and actionable insights for incident response and problem management
Develop automation scripts and reusable components for monitoring configuration, deployment, and maintenance to reduce manual effort and improve consistency across environments
Collaborate with application, infrastructure, and security teams to define monitoring requirements and translate them into LogicMonitor and Splunk configurations that align with enterprise standards
Troubleshoot complex issues in monitoring pipelines, including data collection, alert noise, and dashboard accuracy, while driving sustainable fixes and improvements
Implement and refine alerting strategies to minimize false positives and ensure rapid detection of real incidents, thereby improving mean time to detect and mean time to resolve
Document monitoring architectures, runbooks, and standard operating procedures clearly and comprehensively to support knowledge sharing and operational continuity across rotational shifts
Coordinate with service owners during platform changes and releases to validate monitoring coverage and ensure that new services are onboarded into LogicMonitor and Splunk in a timely manner
Analyze trends in infrastructure performance and event data to identify capacity risks, reliability gaps, and opportunities for optimization that support business continuity and cost efficiency
Support compliance and audit activities by ensuring that monitoring configurations, logs, and dashboards adhere to internal policies and external regulatory expectations
Work in a hybrid and rotational shift model to provide continuous coverage for monitoring operations and incident support while collaborating effectively across time zones and regions
Contribute to continuous improvement initiatives by evaluating new features of LogicMonitor and Splunk, proposing enhancements, and driving small proof-of-concept efforts that deliver measurable value
Engage with vendor support and internal stakeholders to resolve advanced product issues and to stay aligned with recommended practices for LogicMonitor and Splunk deployment and usage
Qualifications
Possess seven to eight years of hands-on experience in infrastructure monitoring, with strong and demonstrable expertise in implementing and administering LogicMonitor in enterprise environments
Demonstrate advanced proficiency in Splunk, including data ingestion, configuration of forwarders, creation of searches, dashboards, and alerts, and use of Splunk apps relevant to infrastructure observability
Apply a solid understanding of network, server, database, and cloud infrastructure concepts to design meaningful metrics and alerts that reflect real service health and user experience
Use scripting experience in languages such as PowerShell or Python to automate monitoring tasks, integrate external systems, and streamline routine operational activities
Exhibit strong analytical and problem-solving skills, with the ability to interpret large volumes of monitoring data and logs to derive clear root causes and propose effective remediation options
Communicate clearly in verbal and written form to collaborate with distributed teams, document procedures, and influence stakeholders on monitoring best practices without formal authority
Adapt effectively to a hybrid work model and rotational shifts while maintaining high standards of reliability, accountability, and focus on service level objectives
Certifications Required
Preferred certifications include LogicMonitor Certified Professional and Splunk Certified Power User or Splunk Certified Administrator.