Project Role : Large Language Model Architect
Project Role Description : Architect large language models (LLMs) that can process and generate natural language. Design neural network parameters trained on large quantities of unlabeled text data.
Must have skills : Large Language Models (LLMs), Virtual Agents, Generative AI
Good to have skills : NA
Minimum 7.5 year(s) of experience is required
Educational Qualification : 15 years full time education
Summary:
Pod: Pod 2 — Agent Runtime
Reports to: Senior LLM Systems Engineer (Pod 2 Tech Lead)
Experience: 6–10 years
Location: Bangalore / US (flexible)
Note: Explore Internal
ROLE OVERVIEW: The Evaluation Engineer owns the platform's quality gate: the pre-deployment evaluation harness that every agent must pass before it reaches production, and the regression suite that catches degradations in platform behavior.
Roles & Responsibilities:
Design and build the pre-deployment evaluation harness — test suite, scoring framework, and pass/fail criteria
Define evaluation dimensions for agent quality: task completion accuracy, context retrieval relevance, guardrail compliance, tool call correctness
Build and maintain the platform regression suite
Partner with the Senior LLM Systems Engineer on valid agent invocation criteria
Establish the evaluation data strategy
Build tooling that makes evaluation results interpretable to domain teams
IDEAL PROFILE:
Has built evaluation frameworks or test infrastructure for ML or LLM systems in production
Has strong software engineering fundamentals
Understands LLM behavior well enough to design meaningful evaluation criteria