Job Title:
SRE / DevOps Engineer
Job Location:
Bengaluru, Hyderabad & Chennai
Key Responsibilities:
Infrastructure Automation (Python-driven)
- Develop and maintain Python scripts and tools to automate:
  - Provisioning (VMs, containers, cloud resources)
  - Configuration management
  - System health checks and maintenance tasks
- Build reusable automation frameworks and APIs
- Reduce manual ops work through end-to-end automation pipelines
- Create internal Python-based tools to improve developer productivity
CI/CD Pipeline Engineering
- End-to-end ownership of CI/CD pipelines, deployments, and production releases
- Automate build, test, and deployment workflows
- Integrate Python automation for test orchestration and deployment validation
Site Reliability Engineering (SRE) Practices
- Define and manage SLIs, SLOs, and SLAs
- Improve system reliability, scalability, and uptime
- Handle incident management, root cause analysis (RCA), and postmortems
Monitoring, Logging & Observability
- Implement monitoring solutions such as Prometheus, Grafana, ELK, Datadog, and Splunk
- Use Python for custom metrics collection, log parsing and analytics, and alert automation
Cloud & Infrastructure Management
- Manage cloud platforms (AWS, Azure, or GCP), including compute, storage, networking, and serverless services
- Implement Infrastructure as Code (IaC) using Terraform, CloudFormation, or ARM templates
Experience & Mandatory Skills:
5 to 8 years of overall experience, with a strong command of Python automation
- Cloud: Microsoft Azure
- IaC: Terraform
- CI/CD: GitHub, GitHub Actions, Octopus Deploy
- Containers & Orchestration: Kubernetes, AKS
- MLOps: Kubeflow, KServe, Istio, Evidently AI
- Monitoring: ELK Stack, Prometheus, Grafana
- Web/Hosting: IIS (Windows Server)
- Database: SQL Server
- Scripting: Python, Bash, PowerShell
- Ownership of monitoring, alerting, and platform reliability
- Responsibility for cloud infrastructure lifecycle management
- Active contribution to platform modernization and migration initiatives
- Support of ML platform operations and Kubernetes ecosystem
- Direct involvement in customer onboarding and production support