Responsibilities
Agentic AI Systems
- Design reusable patterns for Agentic AI systems including RAG, Multi-Agent Orchestration, and Human-in-the-loop systems
- Define how different agents communicate, share state, and hand off tasks to one another
- Architect long-term and episodic memory layers using Vector Databases, embedding pipelines, and knowledge graphs
- Decide when to use high-reasoning models vs. worker models to optimise cost and performance
- Predict and control token usage; architect systems with semantic caching to prevent redundant LLM spend
- Set architectural standards for explainability, auditability, and guardrails to prevent hallucinations and bias
- Ensure data governance, privacy compliance, and responsible AI practices across all systems
AI Infrastructure & MLOps
- Design scalable AI infrastructure including model serving, inference architecture, AI microservices, and APIs
- Architect distributed systems supporting AI workloads
- Define MLOps and CI/CD pipelines for AI systems
- Architect containerised and cloud-native deployments; design monitoring and observability for AI services
- Optimise for cost, performance, and scalability across the AI stack
Enterprise AI & Agentic Architecture
- Architect enterprise-scale Agentic AI frameworks using LangGraph, Model Context Protocol (MCP), multi-agent orchestration frameworks, and memory-driven AI systems
- Design and implement RAG pipelines (Hybrid RAG, Graph-RAG), embedding pipelines (open-source and enterprise models), prompt orchestration, guardrails, and fine-tuning pipelines (PEFT, LoRA, domain adaptation)
- Build secure LLM deployments across on-prem, air-gapped, and cloud-agnostic environments
- Define the LLMOps lifecycle, covering evaluation harnesses, hallucination detection, observability (tracing, telemetry), and model governance
- Work hands-on with agentic AI frameworks: LangChain, LlamaIndex, AutoGen, and CrewAI
Data Platform & Lakehouse Engineering
- Design and govern modern data platforms built on Medallion (Bronze-Silver-Gold) architecture with Delta tables and ACID transactional layers
- Architect multi-tenant platforms with cost governance and data mesh or federated data architecture patterns
- Work across the core stack: Databricks, Apache Spark (batch & streaming), Delta Live Tables, Apache Druid, Dremio, Kubeflow Pipelines, Airflow
- Drive schema evolution and versioning, metadata and lineage management, data quality frameworks, dimensional modelling for analytics, and Kafka-based streaming ingestion
Advanced AI/ML & Deep Learning
- Architect ML systems using TensorFlow, PyTorch, scikit-learn, and XGBoost, spanning LSTM, CNN, Transformer, and Vision-Language Model (VLM) architectures
- Design time-series forecasting and anomaly detection solutions for industrial telemetry
Cloud, Infrastructure & DevOps
- Cloud-native AI architecture on Azure and AWS
- Containerisation using Docker and Kubernetes (Helm, Operators)
- Infrastructure as Code using Terraform
- CI/CD for ML pipelines with secure DevSecOps integration
- Hybrid and on-prem deployments under compliance constraints
Databases, Graph & Vector Systems
- RDBMS: PostgreSQL; NoSQL: MongoDB
- Graph Databases: Neo4j for ontology and knowledge graph modelling
- Vector Databases: Pinecone, FAISS, Milvus, and enterprise vector DB solutions
- Context modelling and semantic search frameworks
Requirements
Required Experience
- 15+ years in Data, AI, and Platform Engineering
- 5+ years in an AI Architecture leadership role
- Proven delivery of enterprise-scale AI platforms in production environments
- Experience in industrial or engineering AI ecosystems
- Strong background in distributed systems and scalable data processing
- B.Tech/BE in Computer Science or related field; M.Tech/MS in Data Science or AI preferred