EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multi-national teams, contribute to a myriad of innovative projects that deliver the most creative and cutting-edge solutions, and have an opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
We are seeking an experienced Azure Site Reliability Engineer (SRE) to design, implement, and maintain highly available, scalable, and secure cloud infrastructure on Microsoft Azure. The ideal candidate will have strong expertise in cloud operations, automation, monitoring, incident management, and DevOps practices.
Responsibilities
- Manage and support Azure cloud infrastructure and services
- Ensure platform reliability, availability, performance, and scalability
- Implement Infrastructure as Code (IaC) using Terraform, Bicep, or ARM templates
- Automate operational tasks using PowerShell, Python, Bash, or Azure CLI
- Configure and manage CI/CD pipelines using Azure DevOps, GitHub Actions, or Jenkins
- Monitor applications and infrastructure using Azure Monitor, Log Analytics, Application Insights, Grafana, and Prometheus
- Troubleshoot production issues and perform root cause analysis (RCA)
- Implement backup, disaster recovery, and business continuity solutions
- Collaborate with development and operations teams to improve system reliability
- Participate in on-call support and incident management activities
Requirements
- 4+ years of experience in Site Reliability Engineering or a related field
- Strong experience with Microsoft Azure services, including Azure Virtual Machines, Azure Kubernetes Service (AKS), and Azure App Service
- Expertise in Azure Storage, Azure networking (VNet, NSG, Load Balancer, Application Gateway), and Microsoft Entra ID (formerly Azure AD)
- Background in Kubernetes (including AKS) and containerization technologies such as Docker
- Hands-on proficiency in Terraform, ARM templates, or Bicep
- Strong knowledge of Linux and/or Windows administration
- Familiarity with CI/CD tools such as Azure DevOps, GitHub Actions, or Jenkins
- Competency in monitoring, logging, and observability tools
- Understanding of security best practices and cloud governance
- Scripting skills in PowerShell, Python, or Bash
- Proficient communication skills in English (B2 level or higher)
Nice to have
- Experience with SRE principles and SLI/SLO/SLA implementation
- Knowledge of Chaos Engineering and Reliability Engineering practices
- Microsoft Azure certifications, such as Azure Administrator Associate, Azure DevOps Engineer Expert, or Azure Solutions Architect Expert
- Familiarity with ServiceNow, ITIL processes, and cloud cost optimization
We offer
- Opportunity to work on technical challenges that may have an impact across geographies
- Vast opportunities for self-development: an online university, global knowledge-sharing opportunities, and learning through external certifications
- Opportunity to share your ideas on international platforms
- Sponsored Tech Talks & Hackathons
- Unlimited access to LinkedIn Learning
- Possibility to relocate to any EPAM office for short- and long-term projects
- Focused individual development
- Benefit package:
  - Health benefits
  - Retirement benefits
  - Paid time off
  - Flexible benefits
- Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)