To be responsible for managing technology in complex projects, providing technical guidance, and ensuring successful delivery of solutions.

Job Title: Speech AI Engineer

Role summary

We are looking for a senior hands-on expert who can take speech systems from raw audio to reliable production features. You will build and improve core speech capabilities such as ASR, TTS, voice conversion, and speech-to-speech workflows, and you will also own the engineering work that makes them fast, scalable, and measurable in the real world. This role is a strong fit if you enjoy the full stack of speech AI: signal processing intuition, modern deep learning, decoding and streaming constraints, and practical deployment trade-offs.

What you will own

1) Speech modeling that ships
• Build, train, and iterate on ASR models for real-world conditions such as conversational speech, accents, noise, and far-field audio, with strong offline and online evaluation discipline.
• Develop and improve TTS systems that are natural, low-latency, and stable in speaker identity and prosody, within production-quality inference constraints.
• Work on voice conversion and accent conversion when needed, preserving intelligibility, naturalness, and speaker identity in streaming settings.

2) Decoder and streaming engineering
• Design and implement decoding stacks using proven libraries and patterns, including Kaldi and OpenFST, and features like custom vocabulary injection, language model rescoring, and beam search tuning.
• Build streaming inference systems with strict latency budgets and predictable behavior at scale, including monitoring and continuous improvement loops.

3) Speech analysis and speech intelligence
• Deliver speech analytics building blocks such as VAD, diarization, speaker recognition, and quality analytics that improve end-to-end product outcomes.
• Design robust evaluation harnesses and datasets for real user scenarios, including domain adaptation and behavior tuning across use cases.

4) GenAI and LLM integration for voice experiences
• Integrate speech components into LLM-based systems, including cascaded ASR plus LLM plus TTS pipelines, and drive joint optimization where it materially improves product quality.
• Build or extend speech generation capabilities including voice cloning, controllable prosody, and modern generative architectures where relevant to the roadmap.

5) Production deployment and operational excellence
• Own end-to-end delivery: prototyping, ablations, training, evaluation, optimization, deployment, and post-launch monitoring.
• Partner closely with product and platform teams to integrate models into real-time systems and maintain reliability, uptime, and quality under production traffic.

Required qualifications
• 6+ years building production-grade speech or audio ML systems, or equivalent depth through research plus shipped production impact.
• Strong programming ability in Python, plus comfort in C or C++ for performance-critical components.
• Proven expertise in deep learning for speech (PyTorch or TensorFlow) and practical model training and serving.
• Solid fundamentals in speech and audio, including signal processing concepts and real-world acoustic variability.
• Experience deploying models into real-time or high-throughput systems, including evaluation, scalability, and production reliability.

Strongly preferred
• Hands-on experience with decoding toolchains and speech customization