Our Company
At Teradata, we believe that people thrive when empowered with better information. Teradata Autonomous Knowledge Platform activates enterprise intelligence by unifying data, knowledge, and business context to achieve tangible outcomes. With Teradata, organizations can provide agents with full context for impact when it matters. Our solution lets businesses connect and scale on premises, in the cloud, or through a hybrid approach. Teradata delivers real business value with AI.
What You’ll Do
We are looking for an exceptional Senior Staff Cloud & AI Platform Engineer to own, architect, and independently build a next‑generation, enterprise‑scale deployment and orchestration platform. This role goes far beyond traditional CI/CD or infrastructure automation: it sits at the intersection of distributed systems, control‑plane architecture, SRE principles, and applied AI/agentic workflows.
- Lead the design and implementation of a stateful deployment control plane that treats deployments as a continuous convergence problem.
- Enable safe, resumable, deterministic, and SLO‑governed rollouts across multi‑cloud, on‑prem, and customer environments.
- Drive the practical application of AI agents in DevOps by embedding intelligent decision‑making into deployment workflows.
- Apply AI agents for anomaly detection, failure diagnosis, rollout control, and automated mitigation.
- Ensure all AI‑assisted workflows preserve correctness, determinism, and full auditability.
This is a high‑impact individual contributor role requiring deep technical judgment, architectural ownership, and the ability to turn abstract design into production‑grade systems.
Key Responsibilities
- Design and implement a stateful deployment control plane with deterministic, resumable execution and rollback support.
- Build deployment workflows that reconcile desired vs. actual state using durable control loops.
- Support multi‑cloud, on‑prem, and customer environments with a single orchestration framework.
- Implement SLO‑governed promotion gates, error‑budget checks, and bounded retry policies.
- Enable deployment models including self‑service, wave‑based rollouts, blue‑green, managed updates, and maintenance windows.
- Define declarative, extensible workflow schemas with explicit dependencies, retries, and rollback semantics.
- Integrate deployment orchestration with CI/CD systems, artifact manifests, entitlement services, and downstream execution systems.
- Embed AI‑assisted decision logic into deployment workflows for anomaly detection, failure diagnosis, and rollout control.
- Ensure AI‑driven actions remain deterministic, auditable, reversible, and policy‑constrained.
- Develop infrastructure and automation frameworks using Terraform, Python, and cloud‑native services.
- Embed observability and telemetry into workflows to drive automated validation, retries, and rollback decisions.
- Provide hands‑on technical leadership, mentoring engineers and driving platform architecture across DevOps, SRE, and CloudOps teams.
Who You’ll Work With
This is an individual contributor role. You will collaborate with project team members and architects, and report to the Senior Engineering Manager.
What Makes You a Qualified Candidate
- Bachelor’s or Master’s degree in computer science or a related field.
- 10+ years of hands‑on experience in DevOps, platform engineering, cloud infrastructure, or distributed systems.
- Proven experience designing and building stateful systems such as control planes, workflow engines, or orchestration platforms.
- Experience building or operating enterprise‑scale deployment systems across multi‑cloud, hybrid, or on‑prem environments.
- Deep expertise in CI/CD, release engineering, and deployment strategies including wave‑based, blue‑green, and managed rollouts.
- Practical experience applying SRE principles such as SLOs, error budgets, promotion gates, and automated rollback.
- Strong programming skills in Python, with the ability to build long‑running services and automation frameworks.
- Hands‑on experience with Infrastructure as Code, preferably Terraform, and cloud‑native services (AWS, Azure, and/or GCP).
- Experience integrating observability and telemetry (metrics, logs, traces) into automated systems for decision‑making.
- Demonstrated experience building or integrating AI‑assisted or agentic workflows into DevOps or operational systems.
- Ability to design safe, auditable, policy‑constrained automation, especially when applying AI or autonomous decision logic.
- Strong technical judgment and communication skills, with experience leading architecture across teams as an individual contributor.
Why We Think You’ll Love Teradata
We prioritize a people-first culture because we know our people are at the very heart of our success. We embrace a flexible work model because we trust our people to make decisions about how, when, and where they work. We focus on well-being because we care about our people and their ability to thrive both personally and professionally. We are committed to actively working to foster an inclusive environment that celebrates people for all of who they are.