About the Role
We are looking for a Cloud Infrastructure & Agentic Architect (f/m/d) to own the technical foundation of our cloud platform. You will maintain our service catalog, design architectural blueprints, establish the observability standard, and — most distinctively — bring hands-on LLM-driven tooling skills to a team actively shaping what the next generation of cloud operations looks like.
This is a senior individual-contributor role reporting to the Infrastructure Lead, with direct CIO-adjacent visibility. You will sit at the intersection of platform engineering and technology strategy, contributing to platform roadmap decisions — not just executing them. The team is small, ownership is total, and iteration cycles are short.
Time allocation (approximate):
~40% LLM-driven tooling & workflow design | ~30% architecture & catalog governance | ~20% observability | ~10% collaboration & documentation
Key Responsibilities
Service Catalog & Platform Governance
- Build and maintain a minimalistic, opinionated service catalog of approved cloud components across Google Cloud and Azure
- Apply a serverless-first, PaaS-first philosophy: challenge complexity and push back on unnecessary infrastructure sprawl
- For every approved catalog entry, deliver production-ready configuration: IaC, security baseline, observability hooks, and runbook
- Scrutinise every new component request and justify its addition in terms of cost, operational overhead, and platform alignment
Cloud Architecture
- Define, document, and govern cloud architectural blueprints across the networking, compute, storage, and data layers
- Design event-driven pipelines that trigger AI-assisted validation, drift detection, and deployment gates
- Serve as the technical authority on platform design decisions within your domain
LLM-Driven Operations
- Implement LLM-based automated deployment capabilities using tools such as OpenCode, Claude Code, or equivalent frameworks
- Design and operate infrastructure workflows augmented by AI agents — from deployment validation to configuration drift detection
- Stay ahead of the market in AI-assisted tooling and bring relevant innovations into the platform
Observability
- Define the observability standard: structured logging, distributed tracing, alerting, dashboards, and SLO/SLA frameworks
- Establish platform-level KPIs and ensure consistent adoption across engineering teams
Collaboration
- Partner with DevOps and Engineering teams to embed platform standards into delivery pipelines
- Partner with SecOps to integrate all security controls and requirements
- Document everything: blueprints, ADRs, runbooks, onboarding guides