We are seeking a Principal AI/ML Architect with 12+ years of experience to lead the design and development of enterprise-scale AI platforms, including agentic AI systems and generative AI solutions.
This role requires deep expertise in LLM architecture, AI product development, and advanced analytics, combined with strong engineering foundations in cloud, MLOps, and distributed systems. You will drive end-to-end AI strategy, from research and prototyping to production deployment, while ensuring responsible, well-governed AI adoption.
Key Responsibilities
AI & GenAI Architecture
- Design and implement scalable AI platforms supporting agentic AI systems and autonomous workflows
- Architect and optimize LLM-based systems, including fine-tuning, inference, and orchestration
- Build advanced RAG (Retrieval-Augmented Generation) pipelines and multi-agent systems
- Develop multimodal AI systems (text, image, audio) for enterprise use cases
- Lead AI product development from concept to production deployment
Machine Learning & Advanced Analytics
- Develop and deploy predictive models and advanced analytics solutions
- Design scalable ML systems for real-time and batch processing
- Apply parameter-efficient fine-tuning (PEFT) methods such as LoRA and QLoRA, along with mixed-precision training
- Apply reinforcement learning approaches, including RLHF (Reinforcement Learning from Human Feedback)
- Ensure robustness using evaluation frameworks such as the RAG Triad (context relevance, groundedness, answer relevance)
Architecture & Engineering
- Design systems using microservices and hexagonal architecture patterns
- Build and maintain scalable data pipelines and API integrations
- Ensure seamless integration between AI services and enterprise platforms
- Lead system design reviews and ensure high availability, scalability, and security
MLOps & Platform Engineering
- Implement end-to-end MLOps pipelines, including CI/CD for ML systems
- Deploy and manage models using Kubernetes and Docker
- Establish model monitoring, drift detection, and performance tracking
- Automate model lifecycle management and continuous retraining workflows
Cloud & Data Platforms
- Architect and manage AI workloads across cloud platforms:
  - AWS (S3, SageMaker, Redshift, Glue)
  - GCP and Azure ecosystems
- Work with modern data platforms such as Databricks and Cosmos DB
- Optimize large-scale data processing and storage for AI workloads
Responsible AI & Governance
- Define and implement AI governance frameworks
- Ensure compliance with responsible AI principles (fairness, explainability, transparency)
- Implement guardrails and safety mechanisms for LLM systems
- Align AI systems with enterprise risk, compliance, and regulatory requirements