EPAM is a leading global provider of digital platform engineering and development services. We are committed to having a positive impact on our customers, our employees, and our communities. We embrace a dynamic and inclusive culture. Here you will collaborate with multinational teams, contribute to a wide range of innovative projects that deliver creative, cutting-edge solutions, and have the opportunity to continuously learn and grow. No matter where you are located, you will join a dedicated, creative, and diverse community that will help you discover your fullest potential.
We are seeking a Senior GCP SRE - ELK to join a B2B parts business undergoing commerce and digital transformation. In this client-facing role, you will support the platform Center of Excellence (COE) within a DevOps/SRE function, working with teams across both the Americas and Belgium to build and maintain scalable, secure and reliable cloud infrastructure.
Responsibilities
- Design, implement and manage scalable and secure cloud infrastructure on Google Cloud Platform (GCP)
- Optimize GCP resources for performance, cost and reliability
- Monitor and troubleshoot GCP services to ensure high availability and performance
- Deploy, manage and scale containerized applications using Google Kubernetes Engine (GKE)
- Implement and maintain Kubernetes clusters, ensuring proper configuration and security
- Use Helm for package management and deployment of applications on Kubernetes
- Develop and maintain CI/CD pipelines using tools such as Jenkins, ArgoCD and GitLab to automate application deployment and infrastructure provisioning
- Collaborate with development teams to integrate CI/CD practices into the software development lifecycle
- Implement monitoring solutions using Prometheus, Grafana and Elasticsearch to track application performance and system health
- Set up logging and tracing mechanisms to facilitate troubleshooting and performance optimization
- Administer and optimize Confluent Kafka clusters deployed across both SaaS and on-premises environments to ensure high availability, fault tolerance and continuous data streaming
- Develop and maintain automation scripts and tools for deployment, scaling and monitoring of Kafka services
- Investigate and resolve operational incidents, minimizing downtime and service disruption
- Collaborate with cross-functional teams to gather requirements and provide technical guidance on cloud and containerization best practices
- Document architecture, processes and procedures to ensure knowledge sharing and compliance with best practices
Requirements
- 4-10 years of experience in a DevOps or SRE role
- Production expertise in Google Cloud Platform (GCP)
- Proficiency in Kubernetes (GKE, Helm)
- Background in managing Confluent Kafka Platform across SaaS and on-premises instances
- Competency in CI/CD tools (Jenkins, ArgoCD)
- Skills in logging, metrics and tracing (Prometheus, Elasticsearch, Grafana)
- Familiarity with Kibana
- Strong English communication skills for a client-facing role with stakeholders in both the Americas and Belgium
We offer
- Opportunity to work on technical challenges that may have an impact across geographies
- Vast opportunities for self-development: online university, global knowledge-sharing opportunities, and learning through external certifications
- Opportunity to share your ideas on international platforms
- Sponsored Tech Talks & Hackathons
- Unlimited access to LinkedIn Learning solutions
- Possibility to relocate to any EPAM office for short and long-term projects
- Focused individual development
- Benefit package:
  - Health benefits
  - Retirement benefits
  - Paid time off
  - Flexible benefits
- Forums to explore passions beyond work (CSR, photography, painting, sports, etc.)