12+ years of experience in Site Reliability Engineering, DevOps, Platform Engineering, Cloud Infrastructure, or related production engineering roles.
2+ years operating at Staff Engineer, Lead Engineer, or equivalent senior technical level.
2+ years supporting production-grade microservices environments at scale.
Strong hands-on expertise with AWS, Kubernetes, multi-cluster operations, Terraform, Helm, kubectl, and CI/CD tooling such as Jenkins.
Strong experience with observability tooling such as Prometheus, Grafana, and OpenSearch, as well as incident management tooling.
Experience building self-service platform capabilities, reusable platform standards, and scalable operational practices.
Strong understanding of Zero Trust architecture, OAuth2, ZTNA, IAM, secrets management, certificates, and access controls.
Experience working in regulated or high-control environments governed by standards and frameworks such as PCI DSS, ISO 27001, and MAS TRM.
Experience supporting distributed systems and data platforms, including microservices reliability, PostgreSQL, Kafka, Cassandra, and fault-tolerant architectures.
Strong leadership, decision-making, stakeholder influence, and technical documentation skills.
Production platforms meet agreed reliability, availability, and recovery targets.
Deployment and operational workflows become more automated, more repeatable, and lower risk.
Platform standards and self-service practices are adopted across teams.
Recurring incidents and operational toil are reduced through better engineering design and automation.
Team capability, ownership, and execution quality improve through effective people leadership.
The role delivers visible business and organizational impact that goes beyond technical delivery.