Company Profile:
Prevalent AI (PAI) is a Security Data Science company, founded in the UK by experts globally recognized for solving the world’s toughest security problems. We apply the world’s best Security Data Science knowledge and expertise to help companies understand, deploy, and support the most advanced security solutions by developing a security architecture based on a deep understanding of Data Science, Security Tradecraft, and Big Data Technologies.
PAI’s Security Data Science (SDS) platform is a big data security analytics platform that can ingest a wide range of security telemetry data and apply advanced analytical approaches to identify and detect control weaknesses and security risks within enterprises.
The PAI team consists of Cyber Security Domain Specialists, Information Security Analysts, Data Scientists, Data Engineers, and Data Analysts focused on developing advanced security analytics solutions (Solution Development) and delivering security insights to our clients.
Prevalent AI India Pvt Ltd., a subsidiary of Prevalent AI, has offices in Infopark, Cochin, Kerala. For more information, please visit https://www.prevalent.ai
ROLE PURPOSE
As a Lead Data Scientist at Prevalent, you will lead a team in developing AI-driven solutions that power our core Security Data Science Products. You will work with diverse, large-scale data to uncover insights, build predictive and generative AI models, and solve complex business problems in the cybersecurity and third-party risk management domain.
Beyond hands-on technical work, you will help shape product strategy, drive innovation across the AI/ML stack, and mentor your team. This role offers the opportunity to experiment with cutting-edge technologies—including large language models and agentic AI—lead impactful projects, and make a real difference in Prevalent’s data-driven future.
KEY ACCOUNTABILITIES
Data Science & Machine Learning
- Collaborate with business SMEs to understand requirements and translate them into data science solutions using data preparation, visualization, statistical modeling, and machine learning techniques (supervised, unsupervised, and optimization)
- Design, build, and deploy predictive and classification models, including deep learning architectures (CNNs, Transformers, GNNs) suited to security data problems
- Analyze and validate data for consistency; develop prototypes to demonstrate key elements of models, visualizations, and data transformations
- Communicate insights and predictions through clear reports and visualizations tailored for both technical and non-technical audiences
- Work closely with engineering teams to ensure accurate, production-grade implementation of data science designs through documentation, prototype code, testing, and code reviews
Generative AI & LLM Integration
- Design and implement LLM-powered features such as intelligent document processing, automated risk assessment, threat summarization, and conversational interfaces
- Build and optimize Retrieval-Augmented Generation (RAG) pipelines using vector databases (e.g., Pinecone, Weaviate, pgvector) and embedding models for domain-specific knowledge retrieval
- Evaluate, fine-tune, and deploy foundation models (e.g., OpenAI, Anthropic, open-source LLMs such as Llama/Mistral) using techniques like LoRA, RLHF, and DPO
- Design agentic AI workflows and multi-step reasoning systems using frameworks such as LangChain, LangGraph, or CrewAI for complex security automation tasks
- Implement prompt engineering best practices, evaluation frameworks, and guardrails to ensure reliable, safe, and auditable LLM outputs in production
MLOps & Productionization
- Own the end-to-end ML lifecycle: experiment tracking (MLflow/W&B), model registry, CI/CD for ML, automated retraining, and model versioning
- Deploy and monitor models in production using cloud-native services (AWS SageMaker, GCP Vertex AI, or Azure ML) with containerized workflows (Docker, Kubernetes)
- Build model monitoring and observability pipelines to track data drift, performance degradation, and model health in real time
- Design and manage feature stores and data pipelines to ensure reproducibility and efficiency at scale
LLMOps
- Build and manage LLM serving infrastructure using tools like vLLM, TGI (Text Generation Inference), or Triton Inference Server for efficient, low-latency model deployment
- Implement prompt versioning, management, and regression testing pipelines to ensure consistency and traceability across prompt iterations
- Set up LLM observability and tracing using platforms such as LangSmith, Arize Phoenix, or Helicone to monitor latency, token usage, cost, and output quality
- Optimize inference costs through strategies like semantic caching, request batching, model routing (large vs. small model tiering), and quantization
- Design and maintain automated evaluation pipelines for LLM outputs, combining programmatic evals, LLM-as-judge patterns, and human-in-the-loop review workflows
- Orchestrate production guardrails including content filtering, output validation, PII detection, and toxicity screening as part of the serving pipeline
- Manage the LLM gateway and API layer for centralized rate limiting, usage tracking, key management, fallback routing, and multi-provider abstraction
Responsible AI & Security
- Champion responsible AI practices: bias and fairness auditing, model explainability (SHAP, LIME), and compliance with AI governance frameworks
- Ensure robustness against adversarial attacks, prompt injection, data leakage, and other LLM-specific security risks
- Maintain documentation and audit trails for model decisions in alignment with regulatory and enterprise requirements
TEAM LEADERSHIP
- Lead, mentor, and grow a team of data scientists by setting clear goals, assigning responsibilities, conducting regular 1:1s, and tracking performance
- Promote best practices in data science, solution architecture, code quality, and experimentation methodology across the team
- Communicate complex data-driven insights to non-technical stakeholders and executive leadership with clarity and impact
- Drive a culture of continuous learning, knowledge sharing, and innovation within the data science team
- Partner with Product Management and Engineering leadership to influence product roadmap, prioritize AI/ML initiatives, and conduct build-vs-buy analysis for AI capabilities
SKILLS & EXPERIENCE
Core Data Science & ML (Required)
- 8+ years of experience in Data Science or Machine Learning, with at least 2 years in a lead or senior IC role
- Strong proficiency in Python and SQL; working knowledge of R, Spark, or Scala is a plus
- Deep understanding of ML algorithms: logistic regression, tree-based models (XGBoost, LightGBM), SVMs, KNN, ensemble methods, and neural networks
- Hands-on experience with deep learning frameworks (PyTorch, TensorFlow) and architectures (CNNs, RNNs, Transformers, attention mechanisms)
- Strong foundation in NLP techniques: text classification, NER, sentiment analysis, topic modeling, and semantic search
- Experience with statistical analysis, A/B testing, causal inference, and experimental design
Generative AI & LLMs (Required)
- Practical experience building applications with LLMs (GPT-4, Claude, Llama, Mistral, or equivalent)
- Hands-on experience designing RAG architectures, working with vector databases, and implementing embedding-based retrieval systems
- Familiarity with fine-tuning techniques (LoRA, QLoRA, PEFT), RLHF/DPO, and prompt engineering methodologies
- Experience with agentic AI frameworks and multi-step LLM orchestration patterns
- Understanding of LLM evaluation, red-teaming, hallucination mitigation, and production guardrails
Infrastructure & Tools (Required)
- Experience with cloud platforms (AWS, GCP, or Azure) and managed ML services (SageMaker, Vertex AI, Azure ML)
- Proficiency with MLOps tooling: experiment tracking (MLflow, W&B), model registries, and CI/CD for ML pipelines
- Familiarity with containerization (Docker, Kubernetes) and infrastructure-as-code practices
- Experience with the modern data stack: data lakehouse architectures (Databricks, Snowflake), streaming (Kafka), and feature stores
- Proficiency with data visualization tools and frameworks (Tableau, Streamlit, Gradio, or D3.js) for prototyping and stakeholder communication
Nice to Have
- Experience in cybersecurity, third-party risk management, or GRC (Governance, Risk, and Compliance) domains.
- Contributions to open-source ML/AI projects.
- Published research in ML, NLP, or AI safety.
- Experience with graph neural networks or knowledge graphs for security applications.
EDUCATION
Master’s or Ph.D. in Computer Science, Data Science, Statistics, Mathematics, Engineering, or a related quantitative field. Equivalent practical experience with a strong portfolio of ML/AI work will also be considered.