- We are looking for a skilled Principal Software Engineer (Lead role) with a strong background in DevOps and platform engineering to join our Application Observability team.
- This team plays a critical role in managing stateful services within the Service Reliability and Observability (SRO) department.
- The SRO department provides innovative observability solutions and standardised methods to enhance the efficiency and reliability of IT systems, simplifying tasks for both infrastructure and software engineers.
- The Application Observability (AppO) team focuses on driving observability forward using tools such as OpenTelemetry, Elastic Stack, Prometheus, and Grafana.
- What you will be doing
- As part of the Application Observability (AppO) team, your responsibilities will include:
- Defining and refining monitoring and alerting rules, both for the team and organisation-wide
- Working together with other teams (Platform and Observability Backend) to enhance performance and fulfil user stories
- Leading projects, such as Grafana's migration from on-premises data centers to AWS, by planning, defining requirements, supervising, and implementing
- Improving the deployment of services using Git workflows and ArgoCD
- Proposing and validating performance and user experience improvements for AppO services
- Addressing issues, implementing preventive measures, and managing postmortems and related improvement tasks
- Analysing performance, identifying anomalies, and defining, documenting, and implementing corrective measures
- Ensuring compliance with the SLA
- Additionally, you will participate in the on-call rotation for team services, which requires the ability to resolve issues using runbooks and knowledge of technologies such as Elasticsearch, Thanos, Kafka, OpenTelemetry, Grafana, and Docker.
- Exposure to three key domains:
- DevOps
- Platform Engineering
- Application Observability
- Technology -> DevOps -> Site Reliability Engineering (SRE)
- Good knowledge of software configuration management systems
- Strong business acumen, strategy, and cross-industry thought leadership
- Awareness of the latest technologies and industry trends
- Logical thinking and problem-solving skills, along with an ability to collaborate
- Knowledge of two or three industry domains
- Understanding of the financial processes for various types of projects and the various pricing models available
- Client-interfacing skills
- Knowledge of SDLC and agile methodologies
- Project and team management