We are looking for a Site Reliability Engineer with hands-on experience in Kubernetes, multi-cloud deployments, CI/CD automation, and cloud cost optimization. This role suits someone who enjoys solving complex infrastructure challenges, improving system reliability, and scaling cloud-native applications. The ideal candidate combines strong engineering fundamentals with a passion for automation, performance, and operational excellence.
Responsibilities
- Manage, optimize, and troubleshoot Kubernetes clusters (K3s, EKS, AKS) in production.
- Ensure the performance, reliability, and scalability of cloud-native applications.
- Deploy and manage applications across multi-cloud environments (AWS, Azure).
- Design and maintain CI/CD pipelines to support automated deployments and testing.
- Implement cloud cost optimization strategies using AWS Reserved Instances and Savings Plans.
- Analyze cloud usage patterns across services such as S3, ECS, KMS, ECR, CloudFront, and ELB.
- Manage SSL certificates using OpenSSL and AWS Certificate Manager (ACM).
- Configure and administer Kong API Gateway for API routing, security, and traffic management.
- Manage SSO integrations using SAML and Azure AD to ensure secure and seamless access.
Required Skills & Qualifications
- 3–4 years of experience as an SRE or in a similar infrastructure/DevOps role.
- Strong hands-on experience with Kubernetes (K3s, EKS, AKS), Helm, and kubectl.
- Solid understanding of AWS services, including CloudFront, ELB, ECS, and related components.
- Experience with CI/CD tools such as GitLab CI or Bitbucket Pipelines.
- Proven expertise in AWS cost management, including Reserved Instances and Savings Plans.
- Proficiency in SSL certificate management using OpenSSL and AWS ACM.
- Experience configuring and managing Kong API Gateway.
- Strong troubleshooting, analytical thinking, and problem-solving skills.