We are looking for a highly skilled Kubernetes Platform Engineer with 5+ years of hands-on experience designing, implementing, and managing Kubernetes-based environments on cloud platforms, specifically Amazon EKS and Google Kubernetes Engine (GKE).
The ideal candidate will build and operate scalable, secure, and highly resilient Kubernetes platforms supporting large-scale microservices architectures. This role requires close collaboration with DevOps, Development, Security, and SRE teams to ensure high availability, performance, and operational excellence.
The candidate should be adaptable, open to handling diverse cloud and DevOps initiatives, and willing to work in rotational shifts (morning/day/evening/night).
Key Responsibilities
Kubernetes Platform Engineering
- Design, deploy, and manage Kubernetes clusters on AWS and GCP
- Set up and manage:
  - EKS and/or GKE clusters
  - Node groups and autoscaling policies
  - Cluster networking and ingress controllers
- Implement namespace segregation and resource quotas
- Manage cluster upgrades, patching, and lifecycle management
Infrastructure Provisioning & Automation
- Provision infrastructure using Infrastructure-as-Code tools:
  - Terraform (preferred)
  - CloudFormation
- Automate cluster provisioning and environment setup
- Implement automated scaling using:
  - Cluster Autoscaler
  - Horizontal Pod Autoscaler
  - Vertical Pod Autoscaler (preferred)
Cloud Platform Management (AWS & GCP)
Hands-on experience managing:
AWS:
- EC2, EKS, VPC, IAM, ELB (ALB/NLB), EBS, CloudWatch
GCP:
- GKE, Compute Engine, VPC, IAM, Load Balancing
Responsibilities include:
- VPCs, subnets, route tables
- Load balancers (ALB, NLB, GCP Load Balancer)
- Security groups and firewall rules
- IAM roles, service accounts, and access policies
Microservices Platform Operations
- Support large-scale microservices deployments
- Optimize resource utilization and cluster performance
- Troubleshoot pod, node, networking, and application issues
- Manage rolling deployments and zero-downtime upgrades
Observability & Monitoring
- Implement and manage monitoring tools:
  - Prometheus
  - Grafana
  - GCP Operations Suite
- Configure dashboards, alerts, and incident monitoring
- Perform root cause analysis and incident resolution
Security & Compliance
- Implement Kubernetes security best practices:
  - RBAC
  - Network policies
  - Pod security standards
- Configure secrets management:
  - AWS Secrets Manager
  - GCP Secret Manager
- Ensure compliance with enterprise security standards
CI/CD & DevOps Integration
- Integrate Kubernetes with CI/CD pipelines:
  - Jenkins
  - GitLab CI / GitHub Actions
- Support container build and deployment workflows
- Manage container registries (ECR, GCR, Artifact Registry)
- Work with Docker and Helm
Disaster Recovery & High Availability
- Design and implement HA Kubernetes architectures
- Implement backup and recovery strategies
- Participate in DR drills and recovery validation
Required Technical Skills
Kubernetes & Containerization
- 5+ years of hands-on Kubernetes experience
- Strong experience with EKS and/or GKE
- Experience managing production clusters
- Strong knowledge of:
  - Pods, Deployments, StatefulSets
  - Services, Ingress
  - ConfigMaps, Secrets
Infrastructure as Code
- Terraform (preferred)
- CloudFormation
Monitoring & Logging
- Prometheus
- Grafana
- ELK Stack or cloud-native logging
Preferred Skills
- Experience supporting environments with 100+ microservices
- Experience with Service Mesh (Istio, Linkerd)
- Experience with Kafka, Redis, or Solr environments
- Multi-environment setup (Dev, QA, Prod)
- Production incident handling (SRE practices)
Soft Skills
- Strong troubleshooting and analytical skills
- Ability to work in production-critical environments
- Strong documentation capability
- Good communication and collaboration skills
- Flexible and adaptable mindset
Preferred Certifications
One or more of:
- Certified Kubernetes Administrator (CKA)
- AWS Certified Solutions Architect
- AWS Certified DevOps Engineer
- Google Professional Cloud DevOps Engineer
First 90 Days – Success Indicators
- Set up and manage Kubernetes environments
- Improve cluster reliability and scalability
- Implement monitoring and alerting frameworks
- Support application deployments
- Optimize infrastructure performance