Project Role : DevOps Engineer
Project Role Description : Responsible for building and setting up new development tools and infrastructure utilizing knowledge in continuous integration, delivery, and deployment (CI/CD), Cloud technologies, Container Orchestration and Security. Build and test end-to-end CI/CD pipelines, ensuring that systems are safe against security threats.
Must have skills : Site Reliability Engineering
Good to have skills : NA
Minimum 3 year(s) of experience is required
Educational Qualification : 15 years of full-time education
Summary:
As a Site Reliability Engineer (SRE), you will be responsible for ensuring that systems are stable, scalable, and highly available, bridging the gap between business application development and IT operations. Your typical day will involve applying software engineering principles to automate operations, enhance observability, manage incidents, and optimize system performance. You will focus on designing and maintaining production systems aligned with defined Service Level Objectives (SLOs) and error budgets. The role emphasizes preventing downtime, improving system resilience, and accelerating delivery velocity through automation, fault tolerance, and performance engineering.
Roles & Responsibilities:
- Expected to perform independently and become an SME.
- Actively participate in and contribute to team discussions.
- Contribute solutions to work-related problems.
- Monitor and optimize system uptime, latency, and throughput to meet SLOs, as measured by Service Level Indicators (SLIs).
- Lead incident response, manage escalations, conduct root cause analysis (RCA), and drive postmortem reviews.
- Develop and maintain CI/CD pipelines while eliminating manual toil through automation and scripting.
- Implement monitoring, logging, and tracing frameworks to ensure real-time observability of distributed systems.
- Conduct capacity planning, resource forecasting, and infrastructure scaling to handle surge conditions.
- Partner with development teams to enable safe feature releases with automated testing and rollback mechanisms.
- Implement disaster recovery strategies, multi-region resilience, and chaos testing for business continuity.
- Drive continuous process improvement using post-incident analytics and data-driven insights.
- Collaborate with product, design, ML, and DevOps teams to build intelligent workflows and enhanced user experiences.
- Implement Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or Pulumi, alongside delivery platforms such as Azure DevOps.
- Ensure cloud infrastructure security, compliance, and performance optimization for highly available systems.
Professional & Technical Skills:
- Must-Have Skills: Proficiency in Site Reliability Engineering principles and practices.
- Good-to-Have Skills: Experience with cloud platforms such as AWS, Azure, or Google Cloud Platform (GCP).
- Strong hands-on expertise in Kubernetes and Infrastructure as Code tools like Terraform.
- Expertise in scripting and programming languages such as Python, Go, Bash, or JavaScript for automation and tooling.
- Deep understanding of Linux systems, networking, and distributed architectures.
- Experience with observability solutions such as Prometheus, Grafana, Datadog, CloudWatch, or New Relic.
- Familiarity with incident management and alerting platforms like PagerDuty or xMatters.
- Proficiency in CI/CD frameworks such as Jenkins, GitHub Actions, or GitLab CI.
- Working knowledge of security, compliance, and performance optimization in cloud-native environments.
Additional Information:
- The candidate should have a minimum of 3 years of experience in Site Reliability Engineering or related roles.
- This position is based at our Bengaluru office.
- 15 years of full-time education is required.
- Relevant certifications such as AWS Certified Solutions Architect Professional, Microsoft Certified: Azure Solutions Architect Expert, Google Professional Cloud Architect, Certified Kubernetes Administrator (CKA), HashiCorp Certified: Terraform Associate, or DevOps Engineer certifications are preferred.
- The candidate needs to be AI Ready.