Key Responsibilities
- Design and execute comprehensive test strategies specifically for AI/ML models, LLM-based applications, and data pipelines
- Develop automated test frameworks for model validation, regression testing, and performance benchmarking
- Evaluate model outputs for accuracy, consistency, relevance, hallucination, and bias across diverse inputs and use cases
- Test RAG (Retrieval-Augmented Generation) pipelines, chatbots, recommendation systems, and other AI-driven features
- Collaborate with data scientists and ML engineers to define acceptance criteria and quality thresholds for AI systems
- Build and maintain evaluation datasets, ground truth sets, and adversarial test cases for comprehensive model validation
- Monitor models in production for drift, degradation, and anomalous behavior; implement monitoring solutions as needed
- Validate data quality, data pipelines, and feature stores that feed AI systems to ensure data integrity
- Document defects, edge cases, and failure patterns specific to AI behavior with actionable insights
- Ensure AI systems meet ethical, fairness, and compliance standards through bias audits and explainability checks
Required Skills & Qualifications
- Bachelor's or Master's degree in Computer Science, Engineering, or a related field
- 3-6 years of professional QA experience, with at least 1-2 years in AI/ML quality assurance
- Strong proficiency in Python for test automation and data analysis
- Familiarity with LLM evaluation frameworks (e.g., RAGAS, DeepEval, Promptfoo, LangSmith)
- Hands-on experience with testing tools such as Pytest, Selenium, Postman, or similar platforms
- Solid understanding of the ML lifecycle: training, validation, deployment, and monitoring
- Knowledge of data quality tools and pipeline testing (e.g., Great Expectations, dbt tests)
- Strong analytical and inquisitive mindset with the ability to challenge model outputs critically
- Excellent documentation and communication skills with the ability to articulate complex technical concepts
- Collaborative approach and ability to work effectively with data science, engineering, and product teams
Nice to Have
- Experience with prompt engineering and red-teaming LLMs
- Familiarity with MLOps platforms such as MLflow, SageMaker, or Vertex AI
- Knowledge of vector databases and embedding quality evaluation
- Understanding of AI safety, responsible AI principles, and fairness frameworks
- Experience with A/B testing and shadow deployment strategies
- Knowledge of CI/CD pipelines and DevOps practices in ML environments
Application Question(s):
- Do you have a Bachelor's or Master's degree in Computer Science, Engineering, or a related field?
- Do you have at least 1-2 years of professional experience specifically in AI/ML quality assurance?*
- Are you proficient in Python for test automation and data analysis?
- Do you have hands-on experience with at least one LLM evaluation framework such as RAGAS, DeepEval, Promptfoo, or LangSmith?
- Have you worked with data quality tools or pipeline testing tools such as Great Expectations or dbt tests?
- How many years of professional QA experience do you have?
- Which of the following testing tools have you used? (Select all that apply)*
  - Pytest
  - Selenium
  - Postman
  - None of the above
- What is your experience level with the ML lifecycle, including training, validation, deployment, and monitoring?
- How many times have you changed firms during your professional career?*
  - 0-2 times
  - 3-5 times
  - More than 5 times
- What is your current location or relocation preference?*
  - Mumbai-based
  - Bangalore-based
  - Open to relocation
  - Other location
- The salary budget is INR 2500000 to 3000000 (maximum). What are your salary expectations?
Work Location: In person