JOB TITLE :: Senior Software Engineer - AI Platform Engineer
JOB TYPE :: Full-time
JOB LOCATION :: India
About the Role
We’re embedding AI across the entire software development lifecycle to reduce the time it takes to understand, triage, and resolve problems - from planning and code, through CI/CD, to production systems.
The AI Foundation team builds the platform infrastructure that makes AI-assisted software delivery possible inside CloudBees (and our CloudBees Unify product). Our systems ingest signals from pipelines, pull requests, source code, tests, and incidents, and must process them reliably, at scale, in production.
This is a platform engineering role. Your primary identity here is as someone who writes production-quality Go, operates confidently in Kubernetes-native environments, and takes observability seriously as a design constraint, not an afterthought.
You’ll work in an environment where AI tooling is part of how the team operates - from development through to the systems we ship. That means being an effective conductor: knowing how to direct AI tooling, decompose problems for it, and apply the same critical judgement to its output that you would to any other contributor’s work.
What You’ll Do
- Design, build, and operate Go services running on Kubernetes that form the backbone of our agent infrastructure
- Own reliability for production systems - define SLOs, write runbooks, and be accountable when things go wrong
- Instrument everything: structured logging, distributed tracing (OpenTelemetry), and metrics that surface both system health and the behaviour of the AI workflows running on top
- Contribute to Kubernetes-native patterns: operators, CRDs, workload isolation, and resource management for dynamic agent workloads
- Build and maintain CI/CD pipelines that support rapid, safe iteration - using AI tooling as a natural part of that workflow
- Use AI development tooling effectively - generating, reviewing, and steering code with the same rigour you’d apply to any PR
- Decompose and delegate work to AI agents where appropriate, and know when not to - catching drift, validating output, and maintaining ownership of outcomes
- Review AI-generated code with an understanding of its specific failure modes: plausible-looking but subtly wrong logic, missing edge cases, and over-confident implementations
- Contribute to the team’s shared judgement on when new models, frameworks, or tooling change the architecture assumptions we’re building on
- Partner with the broader AI Foundation team to establish platform patterns that other teams adopt
- Participate in architecture reviews and contribute to build vs buy decisions
- Mentor more junior engineers and raise the team’s bar for production readiness and operational discipline
You’ll work closely with ML engineers and agentic systems engineers on the team, providing the platform foundation - reliability, observability, deployment infrastructure - that lets their work reach production safely. This is a genuine collaboration: you’ll bring rigour to how their systems are built and operated, and they’ll bring you along on the AI domain specifics - model behaviour, context design, evaluation approaches - as the work demands. You won’t be expected to arrive knowing that space deeply; you will be expected to engage with it seriously.
What We’re Looking For
- 5+ years of professional software engineering experience, with meaningful time operating at senior level
- Strong Go - idiomatic, well-tested, production-shipped (if you meet all the other requirements but work primarily in a different language, please apply)
- Solid Kubernetes operations experience: you’ve debugged real production incidents, understand the scheduler, and know when a CRD is the wrong answer
- Hands-on observability experience - OpenTelemetry, Prometheus, distributed tracing - and a genuine conviction that unobservable systems are untrustworthy
- Cloud-native background on AWS or GCP: IAM, managed Kubernetes, infrastructure-as-code
- Comfortable working in an AI-augmented development environment - directing tools, critically evaluating their output, and maintaining engineering rigour throughout
- Demonstrated ability to own production systems end-to-end: design, ship, monitor, iterate
- Familiarity with Tekton, Jenkins, Argo Workflows, or similar pipeline infrastructure
- Exposure to how LLM-based systems are structured - enough to reason about them as infrastructure dependencies, not black boxes
- Experience contributing to platform or shared infrastructure used by multiple teams
What Success Looks Like
- Shipping production code to the AI Foundation platform with minimal ramp-up
- Owning at least one service end-to-end: SLOs, alerts, and runbooks included
- Identifying and addressing a meaningful observability gap in an existing workflow
- Delivering a materially improved platform capability that the team relies on
- Contributing patterns or tooling that other engineers adopt
- Being the person teammates reach for on K8s and observability questions
- Leading a significant platform initiative from design through to production reliability
- Influencing how the team approaches build vs buy vs integrate decisions
- Raising the bar on how the team uses AI tooling in its own engineering practice
Why This Role
Platform engineering here isn’t maintenance work. The systems are genuinely hard - dynamic workloads, heterogeneous signal sources, reliability requirements that don’t bend because the underlying models are probabilistic.
You’ll also be engineering in the way most teams will be working in a few years: AI tooling as a normal part of the workflow, with engineers who know how to get the most out of it without losing ownership of what ships.