Job Title: Senior AI Engineer – LLMs, RAG, and Vector Systems
Experience: 5–10 years in AI/ML (including 3–4 end-to-end AI/LLM project implementations)
Location: Remote
Employment Type: Full-Time
Role Summary
The Senior AI Engineer will lead the design and development of advanced generative AI systems, including embedding pipelines, vector database architectures, retrieval-augmented generation (RAG) frameworks, model evaluation pipelines, and enterprise-grade LLM integrations. The role requires deep expertise in transformer architectures, in fine-tuning and optimizing LLMs, and in implementing GPU-accelerated AI workloads using PyTorch, TensorFlow, and CUDA. The engineer will collaborate with cross-functional teams to build scalable, secure, and high-performance AI platforms.
Key Responsibilities
LLM & RAG Architecture
- Design, build, and optimize end-to-end RAG systems including retrievers, rankers, context assembly, and generative components.
- Develop and fine-tune LLMs (open-source and proprietary) for domain-specific use cases.
- Implement prompt engineering, prompt orchestration, and guardrails for enterprise applications.
- Create and optimize embedding generation workflows using transformer-based models.
Vector Database & Retrieval Systems
- Architect high-performance vector search solutions using vector search libraries and databases (e.g., FAISS, Pinecone, Weaviate, Milvus, pgvector).
- Implement indexing strategies, approximate nearest neighbor (ANN) algorithms, sharding, and scaling approaches for large embedding stores.
- Ensure latency optimization, relevance tuning, and reliability of retrieval pipelines.
Evaluation & Monitoring Pipelines
- Build automated evaluation frameworks for RAG/LLM pipelines using metrics such as faithfulness, relevance, hallucination rate, and latency.
- Operationalize model monitoring, drift detection, feedback loops, and continuous improvement workflows.
- Integrate human-in-the-loop (HITL) evaluation mechanisms for production AI systems.
ML Engineering & Orchestration
- Develop scalable embedding and model-serving pipelines using Airflow, Kubeflow, Ray, or similar orchestration frameworks.
- Optimize model performance on GPUs using CUDA kernels, mixed-precision training, and distributed training techniques.
- Implement CI/CD for ML pipelines, model versioning, and reproducibility using MLOps practices.
Integration & Platform Engineering
- Build APIs, microservices, and inference endpoints to integrate LLM capabilities into enterprise applications.
- Collaborate with data engineering teams to integrate AI services with data lakes, warehouses, and unstructured content repositories.
- Ensure security, compliance, observability, and uptime for all AI services.
Required Skills & Qualifications
- 5–10 years of hands-on experience in AI/ML engineering.
- Minimum 3–4 full-cycle AI/LLM projects delivered in enterprise or production environments.
- Deep understanding of transformer architectures, LLM internals, fine-tuning strategies, and RAG frameworks.
- Strong proficiency in Python, PyTorch, TensorFlow, and GPU-accelerated development using CUDA.
- Experience with vector search technologies (FAISS, Pinecone, Weaviate, Milvus, etc.).
- Expertise in building embedding pipelines, evaluation systems, and scalable ML workflows.
- Strong understanding of distributed systems, containerization (Docker), Kubernetes, and API development.
- Knowledge of data security, privacy, and governance considerations for enterprise AI.
- Bachelor’s or Master’s degree in Computer Science, AI/ML, Data Science, or a related technical field.
Preferred Qualifications
- Experience with commercial and open-weight LLM ecosystems (OpenAI, Anthropic, Meta Llama, Mistral, etc.).
- Familiarity with GPU inference serving and distributed training tooling (NVIDIA Triton Inference Server, DeepSpeed, Hugging Face Accelerate).
- Prior work in information retrieval, NLP pipelines, and knowledge-augmented generative systems.
- Contributions to open-source AI projects or research publications.
Success Criteria
- Delivery of high-performing and reliable RAG/LLM systems at scale.
- Demonstrated reductions in latency and improvements in retrieval quality and model performance.
- Strong cross-functional collaboration with engineering, product, and business stakeholders.
- Robust evaluation, monitoring, and continuous learning pipelines in production.