Overview:
We are looking for a Principal Cloud Architect who operates at the intersection of deep infrastructure engineering, platform reliability, and strategic solution design. This is a high-impact senior individual contributor position: you will be the organisation's foremost expert in diagnosing and resolving complex infrastructure incidents, designing cloud modernisation blueprints, and continuously raising the engineering bar. You will architect across AWS and Azure at an expert level, champion DevOps and SRE culture, lead cloud-native platform decisions, and serve as a technical thought leader on emerging technologies, including AI-driven infrastructure and FinOps practices. This role is both hands-on and strategic. You are expected to write code, build prototypes, own architectural artefacts, and actively mentor senior engineers, while also influencing technology roadmaps and cross-functional engineering decisions at a principal level.
Responsibilities:
1) Troubleshooting & reliability
- Own resolution of critical infra incidents across AWS & Azure
- Lead RCAs and produce actionable post-mortems
- Define and enforce SLOs, SLIs, and error budgets
- Build runbooks, playbooks, and on-call frameworks
2) Cloud architecture
- Design scalable, secure architectures for cloud workloads
- Architect hybrid and multi-cloud connectivity models
- Create reference architectures and golden paths
- Lead architectural reviews and produce ADRs
3) Infra modernisation
- Drive migration from legacy to cloud-native systems
- Champion IaC adoption at scale (Terraform / Bicep)
- Mature Kubernetes platform across EKS and AKS
- Lead FinOps and cloud cost optimisation initiatives
4) DevOps, observability & AI
- Define CI/CD, GitOps, and developer platform standards
- Drive observability using Grafana, Prometheus, OpenTelemetry
- Architect AI/ML-ready infra and integrate AIOps tooling
- Mentor engineers and influence the technology roadmap
Qualifications:
Must have
Expert in AWS and Azure architecture, networking, security
Deep Kubernetes knowledge (EKS, AKS, RBAC, service mesh)
Strong cloud networking (VPC/VNet, BGP, Private Link, ZTA)
IaC at scale: Terraform, Pulumi, or CloudFormation/Bicep
SRE practices: SLO/SLI, error budgets, chaos engineering
Observability stack: Grafana, Prometheus, OpenTelemetry
Scripting in Python and Shell/Bash
Config management with Ansible (AWX/Tower)
Good to have
AI/ML infra
AIOps
FinOps tools
Databricks / Kafka
Go / TypeScript
Edge computing
Experience
11+ years in infra / cloud engineering (8+ in architecture)
Led modernisation programmes end-to-end
Owned P0/P1 incident resolution at scale
Degree in CS/IT or equivalent practical experience
Preferred certifications
AWS Solutions Architect - Professional
AZ-305
CKA / CKS
Terraform Associate
AI-102
FinOps Certified Practitioner