Overview:
We are looking for a hands-on **Generative AI Technical Architect** who will own the end-to-end architecture of enterprise-scale, knowledge-intensive, agentic AI systems. This is a high-impact role focused on building production-grade Retrieval-Augmented Generation (RAG), Corrective/Controllable-Augmented Generation (CAG), multi-agent frameworks, long-term memory systems, NL2SQL engines, and Small Language Model (SLM)-powered edge/agent deployments using modern ecosystems (LangChain, LlamaIndex, CrewAI, AutoGen, Haystack, DSPy, etc.).
Responsibilities:
- Architect and own the enterprise GenAI platform with advanced RAG/CAG pipelines (hybrid search, re-ranking, query rewriting, hypothetical document embeddings (HyDE), parent-child retrieval, knowledge graph + vector fusion).
- Design and scale multi-agent / agentic workflows (reasoning + acting, tool use, multi-agent collaboration, hierarchical agents, long-running agents with persistence).
- Build production-grade long-term and short-term memory systems (vector stores with metadata filtering, session summarization, entity memory, reflection/memory consolidation).
- Lead architecture of enterprise Knowledge Bases (ingestion pipelines, chunking strategies, metadata enrichment, incremental updates, multi-tenant KB isolation).
- Own NL2SQL / Text-to-SQL architecture (schema linking, few-shot prompting, self-correction, execution feedback loops, SQL guardrails, multi-database support).
- Design and deploy Small Language Models (SLMs) for on-device, low-latency, or cost-sensitive agent use cases (Phi-3, Gemma-2B, Mistral-7B, Llama-3.1-8B quantized, TinyLlama, MobileBERT variants).
- Define the standard GenAI framework stack (LangChain / LlamaIndex / LangGraph / CrewAI / AutoGen / Microsoft Semantic Kernel / Haystack / DSPy) and create internal libraries/SDKs for the entire organization.
- Build observability, tracing, and evaluation frameworks for RAG (RAGAS, TruLens, DeepEval), agents (AgentOps), and NL2SQL accuracy.
- Establish governance: prompt injection defense, output sanitization, PII redaction, citation verification, hallucination detection, and enterprise guardrails.
- Lead performance engineering: latency optimization (speculative decoding, caching, batching, query routing), cost optimization (SLM routing, fallback strategies), and multi-region deployment.
- Drive GenAI platform roadmap, conduct architecture reviews, and mentor senior engineers building RAG/agent products.
Requirements:
- 1+ years building and shipping production RAG/CAG systems used by 100K+ daily active users.
- Deep expertise in modern retrieval techniques: dense (bge, e5, sentence transformers), sparse (BM25, SPLADE), late interaction (ColBERT), hybrid search, and re-ranking (cross-encoders, Cohere Rerank, bge-reranker).
- Proven track record designing and scaling agentic systems with tool calling, planning (ReAct, Plan-and-Execute, Reflexion), and multi-agent orchestration.
- Hands-on experience with vector databases at scale (Pinecone, Weaviate, Milvus, Zilliz, Qdrant, pgvector, Redis, Vespa, Elasticsearch with vector support).
- Expertise in LangChain, LangGraph, LlamaIndex, CrewAI, AutoGen, and DSPy, including custom node creation, memory modules, and production deployment patterns.
- Shipped production NL2SQL systems (accuracy >92% on Spider/BIRD-style evaluations over real enterprise schemas).
- Deployed SLMs in production (quantized 4-bit/8-bit, ONNX/TensorRT-LLM export, edge deployment).
- Strong skills in Python, async frameworks, FastAPI, graph databases for knowledge graphs (Neo4j, FalkorDB), and Kubernetes.
Preferred (Significant Advantage):
- Previously defined the GenAI/RAG/agent stack for a unicorn or large enterprise (Jasper, Glean, Adept, Cresta, Moveworks, Salesforce Einstein, Microsoft Copilot team, etc.).
- Contributions to LangChain, LlamaIndex, Haystack, or RAGAS open-source repositories.
- Built enterprise knowledge bases processing 10M+ documents with sub-second retrieval latency.
- Experience with controllable generation (CAG), guided generation (Outlines, Guidance, LMQL), and structured output enforcement.
If you have architected and shipped multiple enterprise RAG, agent, NL2SQL, and memory systems that are live in production today, and you live and breathe LangChain/LlamaIndex every day, this is your role.