Job Overview
As a Senior Machine Learning Engineer, AI & ML — Data Collection, you will play a critical role in building and scaling the company’s Unified AI/ML Data Collection Platform, enabling standardized, reliable, and scalable machine learning capabilities across the organization. This role will focus on transforming existing AI/ML and LLM-driven data systems into a cohesive platform that supports data pipelines, model lifecycle management, evaluation frameworks, and production deployment.
This position requires deep hands-on expertise in machine learning engineering, LLM-based systems, ML platform development, and MLOps. You will work closely with ML engineers, product managers, researchers, and business stakeholders to deliver production-ready AI/ML systems aligned with broader business objectives and AI/ML strategy.
You will be deeply involved in the design, development, and operationalization of platform components, including data ingestion, feature management, model training and evaluation, scalable inference systems, and model observability capabilities.
You will help ensure that AI/ML systems are production-ready, observable, maintainable, and cost-efficient, with a strong emphasis on reliability, performance, governance, and developer productivity. You will draw on expertise in areas such as large language models (LLMs), retrieval-augmented generation (RAG), embeddings, vector databases, distributed systems, cloud-native architectures, and machine learning operations (MLOps).
You will contribute to the end-to-end lifecycle of ML systems, from experimentation and prototyping to deployment, monitoring, optimization, and continuous improvement, while mentoring engineers and promoting strong engineering practices across the team.
Team Overview
You will be part of a multidisciplinary team of ML engineers responsible for building and maintaining the Unified AI/ML Data Collection Platform. The team focuses on developing scalable systems that support data pipelines, model lifecycle management, LLM-based workflows, and evaluation frameworks, enabling downstream teams to build and deploy AI-driven data collection solutions.
Outline of Duties and Responsibilities
AI-Powered Data Collection Systems: Design and develop scalable AI-driven data collection and enrichment workflows across structured and unstructured data sources.
LLM & Generative AI Workflows: Build LLM-based capabilities, including RAG systems, prompt orchestration, entity extraction, summarization, classification, and automated validation workflows.
Agentic Frameworks & Model Context Integration: Design and implement agentic workflows and model-to-tool integrations that connect AI models with internal tools, APIs, knowledge stores, data sources, and workflow systems.
Model Deployment & Lifecycle Management: Deploy, maintain, and optimize ML and LLM models in production, including model versioning, CI/CD, experiment tracking, model registry, rollout strategies, and rollback mechanisms.
Data Quality & Evaluation: Build frameworks for evaluating extraction quality, model performance, hallucination risks, grounding, consistency, latency, coverage, and overall data reliability.
Observability & Operational Excellence: Implement monitoring, logging, tracing, alerting, cost tracking, model performance monitoring, drift detection, and reliability dashboards for production AI/ML systems.
Scalable Platform Engineering: Design distributed, event-driven, and cloud-native systems using asynchronous processing, message queues, containerization, and orchestration patterns to support high-volume workloads.
Innovation & Continuous Improvement: Evaluate emerging AI/ML technologies, LLM frameworks, orchestration tools, vector databases, and model deployment approaches to improve automation capabilities and developer productivity.
Company Values: Model company values and contribute to a culture of innovation, accountability, collaboration, inclusion, and continuous improvement.
Experience, Skills and Qualifications
Bachelor’s or Master’s degree in Computer Science, Data Science, Mathematics, or a related technical field.
5+ years of experience in machine learning engineering or data science, with a focus on machine learning systems, ML platforms, or distributed systems.
Strong experience building production-grade ML systems, including model deployment and lifecycle management.
Hands-on experience with MLOps tools and practices, including CI/CD, model monitoring, and experiment tracking.
Strong programming skills in Python and SQL, or similar languages.
Experience with cloud platforms and containerization (e.g., AWS/GCP/Azure, Docker, Kubernetes).
Experience with LLM-based systems in production, including RAG pipelines, embeddings, and vector databases.
Solid understanding of distributed systems, scalability, and system design trade-offs.
Proven ability to solve complex technical challenges and deliver scalable solutions.
Excellent communication and collaboration skills, with experience working across global teams.
Experience working in fast-paced, data-driven environments.
Working Conditions
This position is based in a standard office setting. Employees in this role use PCs and phones throughout the day. Limited corporate travel to remote offices or to other business meetings and events may be required.
Morningstar is an equal opportunity employer.
Morningstar's hybrid work environment gives you the opportunity to collaborate in person each week, as we've found that we're at our best when we're purposely together on a regular basis. In most of our locations, our hybrid work model is four days in-office each week. A range of other benefits is also available to enhance flexibility as needs change. No matter where you are, you'll have the tools and resources to engage meaningfully with your global colleagues.
Legal Entity: Morningstar India Private Ltd. (Delhi)