Role : AI Engineer - Full Stack AI Backend & Infrastructure
Location : Kharadi, Pune
Duration : Full Time
Experience : 2-3 years (relevant)
CTC : Competitive salary
Role Overview
You will own the complete AI backend end-to-end - from agent architecture and model integration to deployment, monitoring, and continuous improvement. You are the single owner of the AI stack: designing agents, building retrieval pipelines, optimizing costs, deploying to production, and iterating based on real-world usage.
This role sits at the intersection of AI engineering, backend engineering, and DevOps. You won't just write prompts; you will architect systems, ship production code, debug production failures, and make cost-vs-quality tradeoffs that directly impact the business.
1. Multi-Agent System Architecture
- Design and scale a multi-agent orchestration system with 10+ specialized agents
- Implement parallel execution (fan-out/fan-in), conditional routing, and shared state handling
- Build new domain-specific agents with tailored retrieval, prompts, tools, and fallback strategies
- Develop graph-based routing logic for dynamic task handling
2. LLM Integration & Optimization
- Manage a multi-model, multi-provider architecture
- Select models based on cost, latency, and accuracy tradeoffs
- Build and refine prompt systems (few-shot, structured reasoning, domain tuning)
- Optimize token usage, latency, and response quality
3. Retrieval & Knowledge Systems
- Build hybrid search pipelines (vector + keyword + metadata filtering)
- Optimize chunking, indexing, embeddings, and re-ranking strategies
- Ensure high-quality context retrieval for LLM pipelines
4. Backend & Infrastructure
- Develop scalable backend services for AI pipelines
- Implement streaming responses and async processing
- Deploy and manage services using cloud infrastructure
- Maintain CI/CD pipelines and production environments
5. Observability & Reliability
- Build monitoring systems for latency, errors, and model performance
- Track hallucinations, failure cases, and edge scenarios
- Implement fallback systems and reliability layers
6. Cost & Performance Optimization
- Optimize inference costs across models and workflows
- Balance cost vs quality vs latency in production systems
- Continuously improve system efficiency
Required Skills
- Strong experience with backend development (Python preferred)
- Hands-on experience with LLMs and AI systems in production
- Experience with vector databases and retrieval systems
- Understanding of distributed systems and async processing
- Familiarity with cloud platforms (AWS/GCP/Azure)
Nice to Have
- Experience with multi-agent frameworks
- Knowledge of prompt engineering at scale
- Experience with real-time AI applications
- Exposure to monitoring and observability tools
- Strong ownership mindset
- Ability to work across AI, backend, and infrastructure
- Focus on shipping production-ready systems
- Practical problem-solving over theoretical approaches
Pay: Up to ₹600,000.00 per year
Work Location: In person