Site Reliability Engineer (SRE)

Techdome
Hyderabad, Telangana


Job details

2 days ago

Qualifications

CI/CD
Cloud infrastructure
Azure
Go
Kubernetes
Ansible
Software deployment
DevOps
Git
Bash (Unix shell)
AWS
Docker
Distributed systems
Terraform
Continuous integration
Scripting
GitHub
Linux
AI
Jenkins
GitLab
Python

Full job description

Site Reliability Engineer (SRE) | DevOps Engineer

Location: Hyderabad / Indore

Experience: 3–6 Years

About the Role

Techdome is hiring a Site Reliability Engineer (SRE) to build, operate, and continuously improve highly available, secure, and scalable cloud infrastructure across our healthcare, fintech, AI, and SaaS products.

This role goes beyond traditional DevOps. You'll own production environments, improve system reliability, automate operations, build resilient deployment pipelines, manage incidents, and deliver zero-downtime releases using strategies such as Blue-Green and Rolling Deployments.

If you're passionate about automation, cloud infrastructure, Kubernetes, observability, and AI-powered operations, we'd love to hear from you.

Key Responsibilities

Maintain the availability, reliability, scalability, and performance of production systems.
Manage and optimize production environments across cloud platforms.
Design and automate deployment pipelines using CI/CD best practices.
Implement Blue-Green and Rolling deployment strategies to achieve zero-downtime releases.
Build Infrastructure as Code using Terraform and Ansible.
Implement observability using Prometheus, Grafana, ELK, Datadog, OpenTelemetry, and centralized logging.
Define and maintain SLIs, SLOs, and Error Budgets.
Lead production incident management, Root Cause Analysis (RCA), and post-incident reviews.
Perform cloud cost optimization and capacity planning.
Automate operational workflows using scripting and AI-powered tooling.
Participate in on-call rotations and production support.

Required Skills

3+ years of experience as a Site Reliability Engineer, DevOps Engineer, Platform Engineer, or Cloud Engineer.
Hands-on experience with AWS, Azure, or GCP.
Strong experience with Docker and Kubernetes.
Expertise in Terraform, Ansible, or other Infrastructure as Code tools.
Experience building CI/CD pipelines using Jenkins, GitHub Actions, GitLab CI, or similar.
Strong Linux administration, networking, and distributed systems knowledge.
Programming or scripting experience in Python, Go, or Bash.
Experience managing large-scale production environments.
Understanding of deployment strategies including Blue-Green, Canary, and Rolling Deployments.

Preferred Skills

Experience with AI tools such as GitHub Copilot, Claude, Cursor, ChatGPT, or similar developer productivity tools.
Experience building AI-powered operational workflows for monitoring, alert triage, incident summarization, or automation.
Experience in FinTech, Payments, Healthcare, or other high-availability environments.
Knowledge of SRE principles including SLOs, SLIs, Error Budgets, Chaos Engineering, and Reliability Engineering.

Why Join Techdome

Work on real-world AI, Healthcare, Payments, and SaaS products.
Own critical production infrastructure from Day 1.
Build systems that support thousands of users and business-critical workflows.
Collaborate directly with founders and senior engineering leadership.
AI-first engineering culture with modern tooling and automation.
Fast decision-making, genuine ownership, and accelerated career growth.

Our Hiring Process — As Fast As We Are

We value your time and move quickly.

Step 1: AI Interview on our in-house platform, JustInterview.ai (JIA)

Step 2: Technical Discussion & 1:1 with Leadership

Meet the decision-makers and discuss real engineering challenges. If it's the right fit, you'll get a decision quickly.
