Candidate Skill:
Agentic AI, Multi-Cloud, LLM, RAG, Python, Microservices, Kubernetes, Observability, Security, APIs
Job Description:
We are seeking a Principal Agentic Architect to lead the design and delivery of enterprise-scale agentic AI systems across multi-cloud environments (AWS, Azure, GCP). This role requires deep hands-on expertise in LLMs, RAG, distributed systems, and cloud architecture, along with strong leadership in driving secure, scalable, and observable AI platforms.

Key Responsibilities:

Platform Architecture (Multi-Cloud):
- Design and own the Agentic AI Platform (orchestration, RAG, guardrails, evaluation, observability)
- Define cloud architecture standards (VPC, IAM, networking, secrets, security)
- Implement secure communication (TLS/mTLS, WAF, service mesh)

Agent Design & Orchestration:
- Lead development of complex agent workflows and multi-agent systems
- Define orchestration patterns, state machines, retries, fallbacks, and HITL flows
- Ensure reliable rollout strategies (canary, blue/green, feature flags)

API & Interoperability Standards:
- Define REST APIs and SDK standards (Python/TypeScript)
- Ensure OpenTelemetry-based observability and traceability
- Standardize agent interoperability (MCP, A2A APIs)

RAG & Data Architecture:
- Design RAG pipelines (embeddings, retrieval, chunking, prompt strategies)
- Integrate with enterprise systems (Snowflake, etc.)
- Enforce schema-driven outputs and data contracts

Security, Reliability & Operations:
- Drive SLIs/SLOs, monitoring, cost optimization, and performance tuning
- Implement PII protection, audit logging, and governance frameworks
- Lead incident response, RCA, and disaster recovery strategies

Leadership & Governance:
- Lead architecture reviews and cross-team collaboration
- Provide technical mentorship and best practices
- Define architecture standards, ADRs, and governance processes

Required Skills:
- Strong expertise in Multi-Cloud Architecture (AWS / Azure / GCP)
- Deep experience with Agentic AI systems, LLMs, and orchestration frameworks
- Strong knowledge of RAG pipelines and vector databases (Pinecone, Weaviate, pgvector)
- Experience in microservices, APIs, and SDK development
- Expertise in observability (OpenTelemetry, Grafana, Prometheus)
- Hands-on experience with Docker, Kubernetes, CI/CD pipelines
- Strong understanding of security (IAM, encryption, mTLS, secrets management)
- Experience with evaluation frameworks and AI guardrails
- Strong programming skills in Python

Good to Have:
- Experience with LangGraph, AutoGen, CrewAI, LlamaIndex
- Knowledge of multimodal AI (document/image processing, OCR, extraction)
- Familiarity with policy-as-code, security compliance, and governance frameworks
- Experience in chaos engineering and DR strategies
- Exposure to advanced RAG (hybrid search, reranking, caching)
- Understanding of ML frameworks (PyTorch, Hugging Face)

Soft Skills:
- Strong leadership and stakeholder management
- Excellent problem-solving and strategic thinking
- Ability to drive large-scale architecture decisions