- Key Responsibilities
- Provide L2/L3 production support for big data platforms and applications, ensuring high availability and performance
- Monitor, troubleshoot, and resolve issues related to big data jobs, data pipelines, and batch workloads using ESP or similar schedulers
- Manage incidents, problems, and service requests through ServiceNow, adhering to defined SLAs and escalation procedures
- Apply ITIL best practices for incident, problem, change, and release management within the big data environment
- Perform root cause analysis for recurring issues and implement permanent corrective actions to improve platform stability
- Collaborate with data engineering, infrastructure, and application teams to deploy fixes, enhancements, and configuration changes
- Participate in change advisory processes, assess risks, and support planned maintenance and releases on big data systems
- Maintain and update support documentation, runbooks, and knowledge base articles for common issues and procedures
- Proactively identify performance bottlenecks, capacity risks, and operational gaps, and recommend improvements
- Provide on-call support as required, including support during off-hours for critical incidents and scheduled activities
- Minimum Qualifications
- Bachelor's degree in Engineering, Computer Science, or related field (B.Tech or equivalent)
- 5-8 years of hands-on experience in production support or operations for big data platforms or large-scale data systems
- Strong working knowledge of big data ecosystems and related operational concepts (job monitoring, data pipelines, batch processing)
- Proven experience working within ITIL frameworks, including incident, problem, and change management processes
- Practical experience using ServiceNow or similar ITSM tools for ticketing, workflow, and SLA management
- Experience supporting and monitoring workloads scheduled through ESP or equivalent enterprise schedulers
- Solid troubleshooting skills, with the ability to analyze logs, identify patterns, and resolve production issues under time pressure
- Good-to-have skills
- Monitoring tools (e.g., Splunk, Dynatrace, AppDynamics)
- Shell scripting
- SQL
- Linux/Unix administration
- Cloud platforms (AWS, Azure, GCP)
- Knowledge of more than one technology
- Basics of Architecture and Design fundamentals
- Knowledge of Testing tools
- Knowledge of agile methodologies
- Understanding of Project life cycle activities on development and maintenance projects
- Understanding of one or more estimation methodologies
- Knowledge of quality processes
- Basics of business domain to understand the business requirements
- Analytical abilities, strong technical skills, and good communication skills
- Good understanding of the technology and domain
- Ability to demonstrate a sound understanding of software quality assurance principles, SOLID design principles, and modelling methods
- Awareness of latest technologies and trends
- Excellent problem-solving, analytical, and debugging skills