AI Platform Architecture
- Define and own the end-to-end architecture of the organisation's AI platform — spanning model development, training infrastructure, serving, monitoring, and governance
- Design scalable, cloud-native AI infrastructure on GCP (Vertex AI), Azure (Azure AI / Azure Machine Learning), or AWS (SageMaker), supporting both training and real-time inference workloads
- Establish reference architectures for generative AI applications including RAG pipelines, agentic systems, and multi-modal AI workflows
- Architect model serving infrastructure — including GPU cluster management, batching strategies, model caching, and latency-optimised inference endpoints
- Evaluate and select AI frameworks, tooling, and managed services — balancing build vs. buy, cost, scalability, and strategic fit
Generative AI & LLM / SLM Systems
- Lead the architecture of generative AI systems leveraging LLMs (GPT-4, Gemini, Claude, Llama, Mistral) and SLMs (Phi-3, Gemma, TinyLlama) for production use cases
- Design and implement enterprise-grade RAG architectures — including chunking strategies, embedding models, vector database selection (Pinecone, Weaviate, pgvector, Vertex AI Vector Search), re-ranking, and hybrid retrieval
- Architect agentic AI systems with tool use, memory, planning, and multi-agent orchestration using frameworks such as LangChain, LlamaIndex, AutoGen, or CrewAI
- Define prompt engineering standards and establish prompt management practices, including version-controlled prompt libraries, for production systems
- Establish guardrails, content safety, hallucination mitigation, and evaluation frameworks (RAGAS, TruLens, DeepEval) for GenAI applications
Model Training & Fine-Tuning
- Architect and oversee pre-training, continued pre-training, and supervised fine-tuning (SFT) workflows for both LLMs and domain-specific models
- Design and implement parameter-efficient fine-tuning (PEFT) pipelines with LoRA, QLoRA, DoRA, and adapter-based methods, built on the Hugging Face PEFT and TRL libraries
- Lead RLHF (Reinforcement Learning from Human Feedback) and DPO (Direct Preference Optimisation) workflows for alignment and preference tuning
- Architect distributed training infrastructure — multi-GPU and multi-node training using PyTorch FSDP, DeepSpeed ZeRO, and Megatron-LM
- Design data preparation pipelines for model training: dataset curation, deduplication, tokenisation, instruction formatting, and quality filtering
- Establish experiment tracking, hyperparameter optimisation, and model evaluation frameworks (MLflow, Weights & Biases, Optuna)
MLOps & Model Lifecycle
- Define and implement the full MLOps lifecycle — from experiment management through model registry, staging, shadow deployment, A/B testing, and production monitoring
- Design model observability frameworks: drift detection, performance degradation alerting, data quality monitoring, and automated retraining triggers
- Architect CI/CD pipelines for ML — automated training, evaluation gating, model promotion, and rollback procedures
- Lead the adoption of feature stores (Feast, Tecton, Vertex AI Feature Store) for consistent feature management across training and serving
AI Governance & Responsible AI
- Establish the organisation's responsible AI framework — covering fairness, explainability, bias detection, privacy-preserving ML, and regulatory compliance
- Define model documentation standards (model cards), dataset governance, and audit trails for model lineage and versioning
- Lead AI risk assessment processes and engage with legal, compliance, and ethics stakeholders on AI system deployment decisions
Thought Leadership & Mentorship
- Act as the organisation's internal subject matter expert on AI/ML — publishing internal research notes, architecture decision records, and technical standards
- Mentor senior ML engineers and applied scientists; conduct architectural design reviews and raise the bar on AI engineering quality
- Track and evaluate emerging AI research (arXiv, NeurIPS, ICML, ICLR) and advise on adoption of applicable breakthroughs
- Represent the organisation in external forums, vendor engagements, and technical partnerships
3. Required Skills & Experience
Foundations — Machine Learning & Deep Learning
- 10+ years in AI/ML engineering or architecture roles with a demonstrable progression into platform-level or architectural responsibility
- Expert-level understanding of supervised, unsupervised, and reinforcement learning — including classical ML (gradient boosting, SVMs, ensembles) and deep learning (CNNs, RNNs, Transformers)
- Deep knowledge of the Transformer architecture — attention mechanisms, positional encoding, tokenisation, KV caching, and scaling laws
- Proficiency in PyTorch (primary) and familiarity with JAX/Flax for research-oriented workloads
- Strong mathematical foundation: linear algebra, probability, statistics, information theory, and optimisation
Generative AI — LLMs & SLMs
- Hands-on production experience with large language models — including prompt engineering, context window management, structured output, function calling, and multi-modal inputs
- Expert knowledge of the LLM landscape: GPT-4o, Gemini 1.5 Pro, Claude 3.x, Llama 3, Mistral, Mixtral, Command R+, and open-weight SLMs (Phi-3, Gemma 2, TinyLlama)
- Deep understanding of LLM inference optimisation: quantisation (GPTQ, AWQ, and GGUF-format models for llama.cpp), speculative decoding, continuous batching, and PagedAttention (vLLM)
- Experience with model benchmarking and evaluation: MMLU, HellaSwag, HumanEval, MT-Bench, and custom domain evals
RAG — Retrieval-Augmented Generation
- Expert in designing and deploying production RAG systems — naive RAG, advanced RAG (HyDE, reranking, query decomposition), and modular RAG architectures
- Hands-on experience with embedding models (OpenAI text-embedding-ada-002 and text-embedding-3, BGE, E5, Cohere Embed) and with the selection criteria for domain-specific retrieval
- Proficiency with vector databases: Pinecone, Weaviate, Qdrant, Chroma, pgvector, and Vertex AI Vector Search
- Deep knowledge of chunking strategies, document parsing, metadata filtering, hybrid search (BM25 + dense), and retrieval evaluation (RAGAS, TruLens)
Fine-Tuning & PEFT — LoRA, QLoRA & Beyond
- Expert-level, hands-on experience with parameter-efficient fine-tuning: LoRA, QLoRA, DoRA, IA³, prefix tuning, and prompt tuning using Hugging Face PEFT
- Experience with full fine-tuning and continued pre-training workflows for domain adaptation and instruction following (Alpaca, ShareGPT, custom instruction datasets)
- Proficiency with alignment pipelines: reward model training and PPO-based RLHF using TRL, plus DPO / IPO preference optimisation
- Hands-on experience with distributed training frameworks: DeepSpeed ZeRO (Stages 1–3), PyTorch FSDP, and Megatron-LM for multi-GPU/multi-node training
- Familiarity with training infrastructure: NVIDIA A100 / H100 clusters, NVLink, NCCL, and GPU memory optimisation techniques (gradient checkpointing, mixed precision, FlashAttention)
MLOps & Infrastructure
- Production-grade MLOps experience: ML pipelines, model registries, A/B testing, shadow deployment, drift monitoring, and automated retraining
- Proficiency with MLOps platforms: MLflow, Weights & Biases, Kubeflow, Vertex AI Pipelines, or Azure ML
- Containerisation and orchestration: Docker, Kubernetes, and Helm for scalable model serving
- Experience with model serving frameworks: vLLM, TGI (Text Generation Inference), Triton Inference Server, BentoML, or Ray Serve
- Infrastructure-as-code proficiency: Terraform or Pulumi for reproducible AI infrastructure provisioning
Certifications & Credentials (Preferred)
- Google Cloud Professional Machine Learning Engineer or Vertex AI Specialist
- AWS Certified Machine Learning Specialty or Azure AI Engineer Associate
- Hugging Face certifications or demonstrated open-source contributions to ML repositories
- Published research (arXiv, conference proceedings) or recognised technical blog authorship in AI/ML
4. Technical Ecosystem
Foundation Models & GenAI Frameworks
- GPT-4o / GPT-4 Turbo
- Gemini 1.5 Pro / Flash
- Claude 3.x (Opus/Sonnet)
- Llama 3 / Mistral / Mixtral
- Phi-3 / Gemma 2 / TinyLlama
- LangChain / LlamaIndex
- AutoGen / CrewAI
- Semantic Kernel
Training, Fine-Tuning & PEFT
- PyTorch / FSDP
- DeepSpeed ZeRO
- Hugging Face PEFT
- TRL (SFT / DPO / PPO)
- LoRA / QLoRA / DoRA
- FlashAttention-2
- Megatron-LM
- Weights & Biases
RAG, Vector Databases & Serving
- Pinecone / Weaviate
- pgvector / Qdrant
- RAGAS / TruLens
- BGE / E5 / Ada Embeddings
- vLLM / TGI
- Triton Inference Server
- Ray Serve / BentoML
- MLflow / Kubeflow
5. Leadership & Behavioural Competencies
Systems Thinking
Designs AI systems that are robust end-to-end — from data pipeline to model to application — with production failure modes in mind.
Research Fluency
Reads and critically evaluates AI research papers; translates applicable findings into engineering practice without chasing hype.
Engineering Rigour
Brings software engineering discipline to AI — versioning, testing, observability, and reproducibility as first-class concerns.
Executive Communication
Distils complex AI concepts into crisp business narratives for C-suite and board-level stakeholders.
Responsible AI Mindset
Proactively identifies bias, fairness, safety, and privacy risks in AI systems and champions mitigation as a design principle.
Mentorship & Influence
Elevates the capability of the wider AI engineering team through design reviews, pairing, technical writing, and knowledge sharing.