AWS Observability Engineer

Tata Consultancy Services - Delhi


Job details

5 days ago

Qualifications

CI/CD
Azure
Node.js
Software troubleshooting
Kubernetes
DevOps
Git
AWS
Bachelor's degree
Terraform
Continuous integration
SDKs
GitHub
S3
Linux
Jenkins
GitLab
Identity & access management

Full job description

Role Overview

We are looking for a highly motivated Observability Engineer to design, implement, and operate end-to-end observability solutions for modern, cloud-native platforms.

The role focuses on building and maintaining metrics, events, logs, and traces (MELT) pipelines using industry-standard tools, and on ensuring high system reliability, performance, and visibility.

You will work closely with SRE, DevOps, Platform, and Application teams to improve system monitoring, troubleshoot production issues, and drive a culture of operational excellence.

________________________________________

Key Responsibilities

Observability Platform Engineering

Design, implement, and maintain observability platforms using OpenTelemetry, Prometheus, Grafana, Loki, and Tempo

Build scalable pipelines for metrics, logs, and distributed traces

Define and enforce observability standards across teams

Monitoring & Alerting

Create and maintain SLOs, SLIs, and alerting strategies

Design actionable alerts that reduce noise and prevent alert fatigue

Configure dashboards, alerts, and runbooks for production systems

Kubernetes & Cloud Observability

Implement observability for Kubernetes (EKS/GKE/AKS) workloads

Enable pod-level, node-level, and cluster-level visibility

Integrate observability with cloud services (AWS/GCP/Azure)

Incident Response & Troubleshooting

Support production incident investigations using logs, metrics, and traces

Perform root cause analysis (RCA) and post-incident reviews

Improve MTTR by enhancing observability coverage

Automation & Optimization

Automate observability deployment using Helm, Terraform, or GitOps

Optimize cost and performance of telemetry pipelines

Improve data retention, sampling, and aggregation strategies

Collaboration & Enablement

Partner with development teams to onboard applications onto the observability platform

Provide guidance on instrumentation best practices

Document observability architectures and operational playbooks

________________________________________

Required Skills

Core Technical Skills

Strong understanding of observability concepts (metrics, logs, traces)

Hands-on experience with:

OpenTelemetry (SDKs, Agents, Gateways)

Prometheus (scraping, recording rules, alerts)

Grafana (dashboards, alerts, correlations)

Loki or other log aggregation systems

Tempo / Jaeger for distributed tracing

Cloud & Platform

Experience with Kubernetes

Experience running workloads on AWS (preferred) or other clouds

Familiarity with AWS services (EKS, EC2, IAM, S3, load balancers)

DevOps & SRE Tooling

CI/CD pipelines (GitHub Actions, Jenkins, GitLab)

Infrastructure as Code (Terraform / CloudFormation)

Linux and networking fundamentals

Location

Delhi

Job Function

TECHNOLOGY

Role

Developer

Job Id

416121

Desired Skills

Desired Candidate Profile

Qualifications: Bachelor of Technology
