SRE - AWS/Azure

TECH NEXT
Remote

Job details

Full-time | Contractual / Temporary
₹70,000 - ₹95,000 a month
1 day ago

Qualifications

CI/CD
Cloud infrastructure
Azure
Go
Incident management
Kubernetes
DevOps
AWS
Analysis skills
Docker
Continuous improvement
Terraform
Continuous integration
Scripting
Scalability
Linux
Root cause analysis
Communication skills
Python

Full job description

Site Reliability Engineer (SRE) – 8+ Years of Experience

Location: Remote

Employment Type: Contract / Full-Time

Immediate Joiners Preferred

About the Role

We are looking for a highly skilled Site Reliability Engineer (SRE) with 8+ years of experience to enhance the reliability, scalability, performance, and operational excellence of our mission-critical platforms. The ideal candidate will have strong expertise in cloud infrastructure, automation, observability, incident management, and platform engineering.

This role offers the opportunity to work on large-scale production environments, drive automation initiatives, and partner closely with engineering teams to build resilient and highly available systems.

Key Responsibilities

Define, measure, and manage Service Level Indicators (SLIs), Service Level Objectives (SLOs), and Error Budgets to improve system reliability.

Lead production incident response, troubleshooting, root cause analysis, and post-incident reviews.

Design and implement automation solutions to reduce operational overhead and improve engineering efficiency.

Build and enhance observability frameworks, including monitoring, logging, alerting, and distributed tracing.

Drive performance tuning, capacity planning, resilience engineering, and reliability initiatives.

Implement Infrastructure as Code (IaC) and deployment automation using modern DevOps practices.

Collaborate with development and platform teams to improve production readiness, scalability, and availability.

Support and optimize cloud-native architectures and containerized environments.

Required Qualifications

8+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, or related roles.

Strong programming and scripting skills in Python and/or Go.

Advanced Linux administration and troubleshooting expertise.

Hands-on experience with Kubernetes and Docker.

Strong knowledge of Terraform and Infrastructure as Code (IaC) practices.

Experience working with cloud platforms such as AWS, Azure, or GCP.

Expertise in observability tools, including Prometheus, Grafana, the ELK Stack, and OpenTelemetry.

Experience designing and maintaining CI/CD pipelines and automation frameworks.

Strong background in incident management, root cause analysis, and production support.

Preferred Qualifications

Kubernetes and/or cloud platform certifications.

Experience with Chaos Engineering and resilience testing.

Platform Engineering experience.

Experience supporting large-scale, high-availability production environments.

What We're Looking For

Strong analytical and problem-solving skills.

Experience managing critical production systems and participating in on-call rotations.

Excellent communication and stakeholder management abilities.

Passion for automation, reliability, operational excellence, and continuous improvement.

Experience:

Site Reliability Engineer (SRE): 8 years (Required)

Work Location: Remote

Pay: ₹70,000.00 - ₹95,000.00 per month