Role Mission
Own the functional correctness, integration health, and release readiness of the end-to-end voice bot system. You are the engineer who certifies that every environment (dev, staging, UAT, pre-prod, prod) behaves correctly across services, partners, and the multi-agent runtime, and who signs off that a release is safe to roll out. You do not own model or answer quality, which sits with our AI engineers and data scientists; you own that the system, as a system, works.
Key Responsibilities
- Design and own the end-to-end functional test strategy for the voice runtime — covering call ingress, ASR/TTS partners, router and middleware, multi-agent orchestration, tool/API calls, downstream system integrations, and call termination.
- Build and maintain integration test suites that exercise the full graph of services — including partner APIs (ElevenLabs, Decagon), internal microservices, queues/streams, databases, and downstream business systems.
- Define and enforce environment readiness — dev, staging, UAT, pre-prod, prod. Build environment-health checks, configuration drift detection, and smoke suites that gate every deploy.
- Own production sanity testing — synthetic call flows, heartbeat checks, post-deploy validations — and the runbooks the on-call team uses to confirm a release is healthy.
- Build regression suites that catch functional breakage from agent definition changes, prompt updates, SOP edits, tool changes, and routing rule changes. These suites do not test answer quality itself; they test whether the system still flows correctly end-to-end.
- Triage production incidents from a functional / integration angle, convert them into automated regression tests, and feed signal back to the runtime, platform, and partner integration teams.
- Partner with the AI engineering and data science teams who own answer quality and evals — ensure their eval pipelines have stable environments to run against, and that functional regressions never silently corrupt eval results.
Must-Haves
- 5–8 years in functional, integration, and end-to-end QE on complex distributed systems — microservices, queues, third-party API integrations, and multi-environment deployments.
- Strong API-level test automation — REST, gRPC, webhooks, event-driven — using tools like REST Assured, pytest, Postman/Newman, Karate, or equivalents.
- Demonstrated ownership of environment readiness and release certification — you have been the person who decides whether a release ships.
- Experience testing systems with heavy third-party / partner integrations, including handling partner outages, version drift, and contract changes.
- Experience testing LLM / GenAI or multi-agent systems is required: you have functionally tested systems built on LangGraph, LangChain, AutoGen, CrewAI, or comparable frameworks. You understand multi-agent orchestration, tool/function calls, agent handoffs, and where they fail in integration, as distinct from where they fail in answer quality.
- Strong Python; comfortable with async, streaming, and instrumentation.
- CI/CD discipline — GitHub Actions / GitLab CI / Jenkins / similar — and a track record of test suites that stay green and fast.
- Excellent written communication: release reports and go/no-go calls must be unambiguous to engineers and leadership.