Senior Software Engineer - AI Platform Engineer

CloudBees - Indi, Karnataka

Job details

Full-time
3 days ago

Qualifications

CI/CD
Go
Law
Kubernetes
AWS
Continuous integration
AI
Jenkins
Identity & access management
SDLC

Full job description

JOB TITLE :: Senior Software Engineer - AI Platform Engineer

JOB TYPE :: Full-time

JOB LOCATION :: India

About the Role

We’re embedding AI across the entire software development lifecycle to reduce the time it takes to understand, triage, and resolve problems - from planning and code, through CI/CD, to production systems.

The AI Foundation team builds the platform infrastructure that makes AI-assisted software delivery possible inside CloudBees (and our CloudBees Unify product). Our systems ingest signals from pipelines, pull requests, source code, tests, and incidents - and need to process them reliably, at scale, in production.

This is a platform engineering role. Your primary identity here is as someone who writes production-quality Go, operates confidently in Kubernetes-native environments, and takes observability seriously as a design constraint, not an afterthought.

You’ll work in an environment where AI tooling is part of how the team operates - from development through to the systems we ship. That means being an effective conductor: knowing how to direct AI tooling, decompose problems for it, and apply the same critical judgement to its output that you would to any other contributor’s work.

What You’ll Do

Platform Engineering

Design, build, and operate Go services running on Kubernetes that form the backbone of our agent infrastructure
Own reliability for production systems - define SLOs, write runbooks, and be accountable when things go wrong
Instrument everything: structured logging, distributed tracing (OpenTelemetry), and metrics that surface both system health and the behaviour of the AI workflows running on top
Contribute to Kubernetes-native patterns: operators, CRDs, workload isolation, and resource management for dynamic agent workloads
Build and maintain CI/CD pipelines that support rapid, safe iteration - using AI tooling as a natural part of that workflow

Engineering in an AI-Native Environment

Use AI development tooling effectively - generating, reviewing, and steering code with the same rigour you’d apply to any PR
Decompose and delegate work to AI agents where appropriate, and know when not to - catching drift, validating output, and maintaining ownership of outcomes
Review AI-generated code with an understanding of its specific failure modes: plausible-looking but subtly wrong logic, missing edge cases, and over-confident implementations
Contribute to the team’s shared judgement on when new models, frameworks, or tooling change the architecture assumptions we’re building on

Technical Collaboration

Partner with the broader AI Foundation team to establish platform patterns that other teams adopt
Participate in architecture reviews and contribute to build vs buy decisions
Mentor more junior engineers and raise the team’s bar for production readiness and operational discipline

Working Across Disciplines

You'll work closely with ML engineers and agentic systems engineers on the team, providing the platform foundation - reliability, observability, deployment infrastructure - that lets their work reach production safely.
This is a genuine collaboration: you'll bring rigour to how their systems are built and operated, and they'll bring you along on the AI domain specifics - model behaviour, context design, evaluation approaches - as the work demands.
You won't be expected to arrive knowing that space deeply; you will be expected to engage with it seriously.

What We’re Looking For

Required

5+ years of professional software engineering experience, with meaningful time operating at a senior level
Strong Go - idiomatic, well-tested, production-shipped (if you meet all the other requirements but your main language is not Go, please apply)
Solid Kubernetes operations experience: you’ve debugged real production incidents, understand the scheduler, and know when a CRD is the wrong answer
Hands-on observability experience - OpenTelemetry, Prometheus, distributed tracing - and a genuine conviction that unobservable systems are untrustworthy
Cloud-native background on AWS or GCP: IAM, managed Kubernetes, infrastructure-as-code
Comfortable working in an AI-augmented development environment - directing tools, critically evaluating their output, and maintaining engineering rigour throughout
Demonstrated ability to own production systems end-to-end: design, ship, monitor, iterate

Preferred

Familiarity with Tekton, Jenkins, Argo Workflows, or similar pipeline infrastructure
Exposure to how LLM-based systems are structured - enough to reason about them as infrastructure dependencies, not black boxes
Experience contributing to platform or shared infrastructure used by multiple teams

What Success Looks Like

First 3 Months

Shipping production code to the AI Foundation platform with minimal ramp-up
Owning at least one service end-to-end: SLOs, alerts, and runbooks included
Identifying and addressing a meaningful observability gap in an existing workflow

3–6 Months

Delivering a materially improved platform capability that the team relies on
Contributing patterns or tooling that other engineers adopt
Being the person teammates reach for on Kubernetes and observability questions

6–12 Months

Leading a significant platform initiative from design through to production reliability
Influencing how the team approaches build vs buy vs integrate decisions
Raising the bar on how the team uses AI tooling in its own engineering practice

Why This Role

Platform engineering here isn’t maintenance work. The systems are genuinely hard - dynamic workloads, heterogeneous signal sources, reliability requirements that don’t bend because the underlying models are probabilistic.

You’ll also be engineering in the way most teams will be working in a few years: AI tooling as a normal part of the workflow, with engineers who know how to get the most out of it without losing ownership of what ships.
