Overview:
As an AI Principal Engineer specializing in Agentic AI enablement, you will lead the design and delivery of production-grade agent capabilities built on the enterprise AI Backbone, across cloud and edge environments and across supply-chain and global functions. You will own end-to-end delivery of key agent modules and integration patterns (MCP/tooling), establish strong evaluation and regression discipline, and drive adoption by partnering with transformation teams, business units (BUs), platform engineering, and enterprise application owners. You will serve as a technical anchor for the workstream: translating ambiguous business workflows into measurable agent outcomes, proactively identifying risks, proposing options and tradeoffs, and ensuring solutions scale across domains.
Responsibilities:
-
Design and architect transformative agent systems that enable organization-wide scaling, establishing new paradigms in agent architecture that become company standards. (Lead/Execute)
-
Pioneer novel agent patterns (tool-use orchestration, multi-agent systems, advanced memory architectures) that dramatically improve performance across the enterprise. (Lead/Execute)
-
Transform ambiguous business problems into elegant technical solutions with 10x efficiency gains through innovative approaches to system design. (Lead)
-
Optimize critical performance metrics beyond standard benchmarks, creating breakthrough improvements (lower 90th-percentile latency, 50%+ better token efficiency, near-perfect tool-call reliability). (Execute/Lead)
-
Establish architectural governance that propagates excellence across teams and projects. (Lead)
-
Design scientifically rigorous evaluation frameworks that uncover non-obvious failure modes and edge cases others miss. (Lead/Execute)
-
Create organization-level evaluation standards and platforms that scale across multiple teams and projects. (Lead)
-
Innovate on automated testing methodologies that dramatically increase code quality while reducing QA overhead. (Execute/Lead)
-
Perform sophisticated statistical analysis of system behaviors to predict quality issues before they manifest. (Execute)
-
Establish early warning systems for emerging failure patterns. (Execute/Lead)
-
Architect intelligent routing systems that autonomously optimize for cost, latency, and quality trade-offs. (Lead/Execute)
-
Pioneer novel approaches to model selection, fine-tuning, and prompt engineering that set new performance standards. (Lead)
-
Create optimization algorithms that continuously improve routing decisions based on real-time feedback loops. (Execute/Lead)
-
Develop proprietary techniques for model evaluation that provide competitive advantage. (Execute/Lead)
-
Design scalable integration architectures that become enterprise standards for AI/app connectivity. (Lead)
-
Create abstraction layers that dramatically simplify how teams connect AI capabilities to enterprise systems. (Execute/Lead)
-
Establish next-generation integration patterns that anticipate future technology directions and enable seamless adoption. (Lead)
-
Develop tooling that accelerates integration velocity across the entire organization. (Execute/Lead)
-
Serve as a technical visionary, elevating the entire AI organization's capabilities through knowledge transfer and mentorship. (Lead)
-
Anticipate industry shifts and position the organization to capitalize on emerging technological opportunities. (Lead)
-
Create internal communities of practice that accelerate knowledge sharing and collective innovation. (Lead)
-
Represent the company's technical excellence externally through publications, speaking engagements, and industry contributions. (Lead)
-
Drive cross-functional initiatives that break down silos and create new organizational capabilities. (Lead/Execute)
Qualifications:
Minimum Qualifications
-
Bachelor’s or Master’s degree in CS/AI/ML, or equivalent experience.
-
Expertise in ML development and engineering for structured and unstructured data.
-
Proven experience shipping LLM/agent solutions to production with measurable quality and sound operational practices.
Required Expertise
-
10+ years of software development experience.
-
Advanced Software Engineering: Python (and Java) mastery with distributed systems expertise; performance optimization (profiling, parallelization); frameworks and libraries (e.g., FastAPI, asyncio, Pydantic)
-
LLM & Agent Systems: Multi-agent orchestration (LangChain, LangGraph, CrewAI); advanced prompt engineering; custom agent memory architectures; model optimization techniques
-
Evaluation Framework Development: Statistical evaluation design (confidence intervals, power analysis); benchmark creation; instrumentation frameworks (e.g., MLflow, Arize); regression testing systems
-
ML Operations: Production deployment pipelines (Docker, Kubernetes, Ray); model registry management; scaled inference optimization; GPU utilization optimization
-
Enterprise Integration: Enterprise connector development; scalable API architectures; data pipeline engineering (Kafka, gRPC, Redis); authorization protocol implementation
-
Observability Engineering: Telemetry system design (Prometheus, OpenTelemetry); automated anomaly detection; distributed tracing; performance dashboarding (Grafana)
-
System Architecture: Microservice design patterns; high-throughput event processing; fault-tolerance implementation; horizontal scaling architectures
-
Technical Leadership: Architecture governance systems; engineering standards development; build-vs-buy evaluation frameworks; technical roadmap creation
Good-to-have Skills
-
Full-stack development experience on a modern stack
-
Modeling user interactions with AI systems; modeling multi-agent behavior loops with tools like Temporal
-
Agentic memory patterns and their use with tools like Mem0 and Temporal
-
Experience with agentic RAG; domain-level semantic layer design with graph and vector databases
Differentiating Competencies Required
-
Ownership: drives outcomes end-to-end for a workstream area (not just tasks)
-
Collaboration & customer focus: influences stakeholders to deliver workflow value and adoption
-
Communication & adaptability: executive-ready clarity on progress, risks, and evaluation evidence
-
Proactiveness & initiative: anticipates constraints, proposes options/tradeoffs early
-
Strategic thinking: contributes to roadmap sequencing and reusable patterns across domains
Key Differentiators:
-
Demonstrates a proven history of creating solutions with order-of-magnitude improvements over standard approaches
-
Possesses a rare combination of deep technical expertise and strategic business understanding
-
Creates solutions that scale beyond their direct involvement (leveraged impact)
-
Consistently elevates the performance of teams and individuals around them
-
Identifies and solves problems others haven't recognized yet
-
Maintains extraordinary productivity while ensuring knowledge transfer
-
Balances technical perfectionism with pragmatic business value
-
Communicates complex technical concepts effectively to both technical and non-technical stakeholders