Senior AI Engineer – LLMs, RAG, and Vector Systems

INOPTRA DIGITAL
Bengaluru, Karnataka

Job details

Full-time

Qualifications

TensorFlow
CI/CD
Law
Computer Science
Kubernetes
PyTorch
Master's degree
OS Kernels
Docker
Bachelor's degree
Distributed systems
Continuous integration
Natural language processing
APIs
Data science
AI
Python

Full job description

Job Title: Senior AI Engineer – LLMs, RAG, and Vector Systems

Experience: 5–10 years in AI/ML (including 3–4 end-to-end AI/LLM project implementations)

Location: Remote

Employment Type: Full-Time

Role Summary

The Senior AI Engineer will lead the design and development of advanced Generative AI systems, including embedding pipelines, vector database architectures, retrieval-augmented generation (RAG) frameworks, model evaluation pipelines, and enterprise-grade LLM integrations. The role requires deep expertise in transformer architectures, in fine-tuning and optimizing LLMs, and in implementing GPU-accelerated AI workloads using PyTorch, TensorFlow, and CUDA. The engineer will collaborate with cross-functional teams to build scalable, secure, and high-performance AI platforms.

Key Responsibilities

LLM & RAG Architecture

Design, build, and optimize end-to-end RAG systems including retrievers, rankers, context assembly, and generative components.
Develop and fine-tune LLMs (open-source and proprietary) for domain-specific use cases.
Implement prompt engineering, prompt orchestration, and guardrails for enterprise applications.
Create and optimize embedding generation workflows using transformer-based models.

Vector Database & Retrieval Systems

Architect high-performance vector search solutions using vector databases and libraries (e.g., FAISS, Pinecone, Weaviate, Milvus, pgvector).
Implement indexing strategies, approximate nearest neighbor (ANN) algorithms, sharding, and scaling approaches for large embedding stores.
Optimize latency, tune relevance, and ensure the reliability of retrieval pipelines.

Evaluation & Monitoring Pipelines

Build automated evaluation frameworks for RAG/LLM pipelines using metrics such as faithfulness, relevance, hallucination rate, and latency.
Operationalize model monitoring, drift detection, feedback loops, and continuous improvement workflows.
Integrate human-in-the-loop (HITL) evaluation mechanisms for production AI systems.

ML Engineering & Orchestration

Develop scalable embedding and model-serving pipelines using Airflow, Kubeflow, Ray, or similar orchestration frameworks.
Optimize model performance on GPUs using CUDA kernels, mixed-precision training, and distributed training techniques.
Implement CI/CD for ML pipelines, model versioning, and reproducibility using MLOps practices.

Integration & Platform Engineering

Build APIs, microservices, and inference endpoints to integrate LLM capabilities into enterprise applications.
Collaborate with data engineering teams to integrate AI services with data lakes, warehouses, and unstructured content repositories.
Ensure security, compliance, observability, and uptime for all AI services.

Required Skills & Qualifications

5–10 years of hands-on experience in AI/ML engineering.
Minimum 3–4 full-cycle AI/LLM projects delivered in enterprise or production environments.
Deep understanding of transformer architectures, LLM internals, fine-tuning strategies, and RAG frameworks.
Strong proficiency in Python, PyTorch, TensorFlow, and GPU-accelerated development using CUDA.
Experience with vector search technologies (FAISS, Pinecone, Weaviate, Milvus, etc.).
Expertise in building embedding pipelines, evaluation systems, and scalable ML workflows.
Strong understanding of distributed systems, containerization (Docker), Kubernetes, and API development.
Knowledge of data security, privacy, and governance considerations for enterprise AI.
Bachelor’s or Master’s degree in Computer Science, AI/ML, Data Science, or a related technical field.

Preferred Qualifications

Experience with commercial and open-weight LLM ecosystems (OpenAI, Anthropic, Meta Llama, Mistral, etc.).
Familiarity with GPU inference serving and distributed training tooling (NVIDIA Triton, DeepSpeed, Hugging Face Accelerate).
Prior work in information retrieval, NLP pipelines, and knowledge-augmented generative systems.
Contributions to open-source AI projects or research publications.

Success Criteria

Delivery of high-performing and reliable RAG/LLM systems at scale.
Demonstrated reduction in latency, improvement in retrieval quality, and model performance gains.
Strong cross-functional collaboration with engineering, product, and business stakeholders.
Robust evaluation, monitoring, and continuous learning pipelines in production.