Position: Site Reliability Engineer (SRE)
Experience Required: 4-6 years
Location: Hyderabad
Work Schedule: 24x7 support on a rotational roster
Key Responsibilities:
* Provide 24x7 operational support to ensure the availability and reliability of critical systems and applications.
* Monitor, troubleshoot, and resolve issues across infrastructure and applications using EFK/ELK stacks.
* Develop, deploy, and maintain Java Spring Boot-based applications for enhanced system reliability.
* Perform database querying and optimization to support application performance.
* Collaborate with cross-functional teams to implement automation solutions for incident management and system reliability improvements.
* Create and maintain technical documentation for processes and troubleshooting guides.
Required Skills:
* Linux Administration – System performance tuning, resource monitoring
* Docker – Containerization, image management
* Scripting – Shell scripting or Python for automation
* Database (DB) – Basic SQL commands and database troubleshooting
* Ansible – Automation and configuration management
* Networking & Servers – Nginx setup, network flow understanding
* Monitoring & Logging – Fluentd, Elasticsearch, Kibana for log analysis
* CI/CD Pipeline – Jenkins for deployment automation
Qualifications:
* Bachelor’s degree in Computer Science, Engineering, or a related field.
* 4+ years of experience in a similar role, with a strong focus on system reliability and operational support.
Other Details:
* Comfortable working in a 24x7 support environment with rotational shifts.
* Strong problem-solving skills and a proactive approach to incident resolution.