Bengaluru, Karnataka
Job Summary
Role Summary
The SRE Coach is responsible for driving the adoption, maturity, and governance of Site Reliability Engineering (SRE) practices across enterprise platforms. This role acts as a transformation leader, mentor, and advisor to engineering, operations, and application teams, ensuring reliability, scalability, and operational excellence across heterogeneous technology landscapes.
Key Responsibilities
- Define and drive the enterprise SRE strategy, standards, and operating model
- Coach and mentor SRE Practitioners, Operations SMEs, and Application Architects
- Establish and govern SLOs, SLIs, Error Budgets, and reliability KPIs
- Lead production readiness reviews, reliability assessments, and SRE maturity models
- Guide teams on observability, monitoring, logging, and alerting
- Drive performance engineering, capacity planning, and resilience design
- Define and institutionalize Chaos Engineering practices
- Partner with Architecture, Platform, Cloud, Security, and Delivery teams
- Influence leadership on reliability vs delivery velocity trade-offs
- Develop and maintain SRE playbooks, runbooks, SOPs, and accelerators
- Enable adoption of Infrastructure as Code, CI/CD, and automation-first operations
- Support hybrid, cloud, SaaS, COTS, enterprise, and legacy platforms
Must Have
- 10+ years of experience in SRE, Production Engineering, or Platform Engineering
- Strong hands-on experience across applications, infrastructure, and cloud
- Deep understanding of:
o MTTR, MTBF, MTTD
o SLO / SLI / Error Budgets
o Availability, resiliency, self-healing systems
- Strong experience with observability and monitoring tools (e.g., Dynatrace, Splunk)
- Expertise in incident management, RCA, and postmortems
- Experience with container platforms and orchestration
- Strong understanding of Infrastructure as Code and automation
- Excellent communication, mentoring, and stakeholder management skills
Good to Have
- Internal or external SRE certification
- Experience across diverse enterprise platforms including:
o .NET, Java EE, MS Azure, Angular, Android/iOS
o Oracle, MS SQL, Power BI, MS BI
o SAP, Salesforce, Siebel, MS Dynamics
o SaaS and COTS platforms
o Mainframe systems
o Industrial platforms such as Siemens, ABB, AutoCAD
Key Responsibilities
1. Oversee quality assurance processes, ensuring adherence to coding standards and implementation of best practices, and perform value creation and KM activities.
2. Ensure process improvement and compliance, participate in technical design discussions, and review technical documents.
3. Shape the overall project strategy, working closely with stakeholders to define project scope, objectives, and deliverables, and track the schedule to ensure on-time delivery to the defined quality standards.
4. Work closely with the development team and on-site engineers to understand technical requirements and to resolve technical issues.
5. Identify and flag potential risks and issues that may impact project timelines or quality, develop mitigation strategies and contingency plans to address them, and provide regular project updates to key stakeholders.