Senior AI Testing Engineer (Generative AI)

Wadhwani Foundation
Bengaluru, Karnataka

Job details

2 days ago

Qualifications

CI/CD
Azure
Law
DevOps
Test automation
Performance testing
Master's degree
Databases
AWS
Continuous integration
APIs
Software testing
Integration testing
AI
MBA
Python

Full job description

Location: India. Remote / Hybrid / In-office [specify your actual working model here]

Experience: 5–8 years total experience in software testing, QA engineering, or SDET roles, with at least 2–3 years of meaningful, hands-on exposure to Generative AI systems, LLM applications, or AI quality engineering.

Role Overview

We are looking for a Senior AI Testing Engineer to own quality across our Generative AI products and platform.

This role is fundamentally about engineering quality into AI systems — not running test scripts. You'll design evaluation frameworks, build automated testing pipelines, and define what "good" looks like for LLM outputs, RAG systems, AI agents, and voice AI applications. You'll work directly with AI engineers and product teams to make sure our systems are reliable, safe, and measurably improving over time.

If you understand how LLMs fail, know how to catch hallucinations before users do, and want to build the quality infrastructure that underpins production AI at scale — this is the role.

Key Responsibilities

Evaluation Strategy & Frameworks

Design and own comprehensive testing strategies for Generative AI products — including LLM applications, RAG pipelines, AI agents, voice AI systems, and workflow automation
Define evaluation methodologies covering functional testing, response quality, hallucination detection, safety and guardrail testing, prompt-injection resistance, bias and toxicity checks, retrieval quality, latency benchmarking, and agent workflow validation
Build reusable AI testing frameworks and automation pipelines for continuous evaluation
Create datasets, benchmark suites, and golden test sets for GenAI evaluation

Automated Evaluation

Develop automated evaluation pipelines using LLM-as-a-Judge and hybrid evaluation methods
Implement CI/CD-integrated AI evaluation pipelines
Drive observability and monitoring strategies for production AI systems

Quality Standards & Collaboration

Define measurable quality KPIs for AI systems
Establish testing standards, best practices, and governance processes for GenAI applications
Work closely with AI engineers, product, and platform teams to embed quality throughout the development lifecycle

Required Skills & Experience

Testing & Engineering Experience

5–8 years in software testing, QA engineering, SDET, or test automation
2–3 years of hands-on experience testing or evaluating production-grade Generative AI or LLM-based systems
Strong test automation skills in Python
Experience designing scalable automated testing frameworks
Familiarity with API testing, integration testing, and performance testing

Generative AI Knowledge

Solid understanding of how LLM systems work — and how they fail
Experience with RAG architectures, prompt engineering, AI agents, embedding models, and vector databases
Understanding of LLM evaluation methodologies and AI system failure modes

GenAI Testing Frameworks

Hands-on experience with at least one GenAI evaluation framework, such as DeepEval, Ragas, LangSmith, Promptfoo, TruLens, OpenAI Evals, or LangChain evaluation tools

Quality Engineering

Expertise in test strategy, test planning, test automation architecture, defect lifecycle management, and quality metrics
Ability to define and track measurable quality KPIs for AI systems

Preferred Qualifications

Experience with cloud platforms (AWS, Azure, or GCP)
Familiarity with MLOps / LLMOps workflows
Experience with CI/CD pipelines and DevOps practices
Exposure to monitoring and observability tooling for AI systems
Understanding of security and compliance for GenAI products
Experience with conversational AI or voice AI systems

MBA
