Lead Site Reliability Engineer

Luxoft - Bengaluru, Karnataka

Qualifications

CI/CD
Cloud infrastructure
Cassandra
Incident management
PCI
Kubernetes
DevOps
English
Build automation
Microservices
AWS
PostgreSQL
Distributed systems
Terraform
Continuous integration
APIs
ISO 27001
Kafka
Jenkins
Identity & access management

Full job description

Project description

Luxoft is partnering with a next-generation digital bank, built from the ground up to deliver seamless, secure, and scalable financial services. Our platform is cloud-native, API-first, and focused on reliability, speed, and security. We are growing fast and looking for top-tier Site Reliability / Ops Engineers to join our core team and help run and scale our infrastructure.
As a Site Reliability Engineer, you will be responsible for maintaining and scaling our core infrastructure, ensuring our banking services remain available, secure, and performant. You will work closely with development, product, and security teams to automate operations, manage cloud infrastructure, and uphold high availability standards.

Responsibilities

Ownership

Lead the design, operation, and continuous improvement of cloud infrastructure, Kubernetes platforms, and reliability practices across production environments.

Direct and develop a team of 3-5 engineers, combining mentoring with clear delivery ownership, coaching, and performance leadership.

Establish and drive standards for observability, deployment safety, incident management, self-service platform capabilities, and reusable golden-path engineering practices.

Build automation across infrastructure provisioning, CI/CD workflows, and operational processes to improve consistency, resilience, and delivery efficiency.

Collaboration

Partner with engineering, product, platform, and security teams to improve reliability, scalability, and secure-by-default operations.

Align stakeholders on platform standards, operational readiness, and adoption of engineering practices, using strong documentation and influence rather than relying only on formal authority.

Provide clarity and direction in complex environments by balancing delivery needs, team development, and cross-functional priorities.

Solutioning

Solve complex reliability and infrastructure problems by balancing availability, security, performance, cost, and delivery speed.

Guide technical decisions across AWS, multi-cluster Kubernetes, blue-green deployments, service mesh, and distributed production systems.

Define and operationalize SLOs, SLIs, error budgets, monitoring, alerting, and post-incident improvement practices.

Support resilient production systems through strong debugging, fault-tolerant design, and practical security and compliance controls.

Skills

Must have

12+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, Cloud Infrastructure, or related production engineering roles.

2+ years operating at Staff Engineer, Lead Engineer, or equivalent senior technical level.

2+ years supporting production-grade microservices environments at scale.

Strong hands-on expertise with AWS, Kubernetes, multi-cluster operations, Terraform, Helm, kubectl, CI/CD, and tools such as Jenkins.

Strong experience with observability and incident management tooling such as Prometheus, Grafana, and OpenSearch.

Experience building self-service platform capabilities, reusable platform standards, and scalable operational practices.

Strong understanding of Zero Trust architecture, OAuth2, ZTNA, IAM, secrets management, certificates, and access controls.

Experience working in regulated or high-control environments with standards such as PCI DSS, ISO 27001, and MAS TRM.

Experience supporting distributed systems and data platforms, including microservices reliability, PostgreSQL, Kafka, Cassandra, and fault-tolerant architectures.

Strong leadership, decision-making, stakeholder influence, and technical documentation skills.

Success KPIs

Production platforms meet agreed reliability, availability, and recovery targets.

Deployment and operational workflows become more automated, more repeatable, and lower risk.

Platform standards and self-service practices are adopted across teams.

Recurring incidents and operational toil are reduced through better engineering design and automation.

Team capability, ownership, and execution quality improve through effective people leadership.

The role delivers visible business and organizational impact, not only technical output.

Other

Languages

English: C2 Proficient

Seniority

Lead

Bengaluru, India

Req. VR-123080

DevOps

BCM Industry

02/06/2026
