Data Scientist (Gen/Agentic AI solutions)

Holcim
Navi Mumbai, Maharashtra

Job details

Qualifications

CI/CD
Statistics
Law
Computer Science
MCP
Kubernetes
Software deployment
Spark
NumPy
Git
Master's degree
Databases
SQL
Pandas
AWS
Math
Docker
Bachelor's degree
Machine learning
Continuous integration
SDKs
REST
APIs
Data science
Manufacturing
AI
gRPC
Graph databases
Python

Full job description

Location: Navi Mumbai, MH, IN, 400708

Requisition ID: 17702

ABOUT HOLCIM

As the global leader in innovative and sustainable building materials, Holcim is reinventing the way the world builds. Supported by a 45,000-strong global team spread across 44 countries and four industry segments (Cement, Aggregates, Ready-Mix Concrete, and Solutions & Products), we are committed to shaping a greener, smarter, and healthier world. Our ambition is to lead the industry in reducing carbon emissions and accelerating the transition to low-carbon construction globally.

ABOUT THE ROLE

Qualifications:

BE / B.Tech in Computer Science, Engineering, or a relevant field
A graduate degree in Data Science or another quantitative field is preferred
Strong mathematics skills (e.g., statistics, algebra)
Certification in Gen/Agentic AI solutions
Certification in platforms (Databricks, AWS) is preferred

Experience:

8+ years of progressive experience in data science and machine learning, with a minimum of 3 years focused on Generative AI and LLM-based systems.
Demonstrated track record of delivering AI solutions at enterprise scale
Hands-on experience with full AI/ML lifecycle management: data engineering, feature stores, model training, evaluation, deployment, and monitoring.
Industry experience is preferred, especially in a manufacturing function within the building materials, manufacturing, process, or pharma industries.

Required skills:

Proficiency in Python; strong working knowledge of relevant libraries & frameworks: LangChain, LangGraph, HuggingFace Transformers, PyTorch, scikit-learn, Pandas, and NumPy.
Deep experience with LLM APIs (OpenAI, Anthropic, Google, Mistral) and open-source model deployment via Ollama, vLLM, or TGI.
Solid command of vector databases, semantic search, and knowledge graph technologies for enterprise RAG architectures.
Proficiency with MLOps tooling: MLflow, Weights & Biases, Kubeflow, or similar; experience with LLMOps tools such as LangSmith.
Strong SQL and experience with modern data platforms such as Databricks for AI-ready data preparation.
Understanding of software engineering best practices: version control (Git), containerization (Docker/Kubernetes), API design (REST/gRPC), and CI/CD pipelines.
Knowledge of how to benchmark GenAI models beyond simple accuracy (e.g., toxicity, bias, and reasoning depth).
Exposure to multi-modal AI systems incorporating vision, audio, or structured document understanding (PDFs, tables, charts).
Good understanding of GenAI standards (MCP, A2A, A2UI, etc.).

Key Responsibilities:

Platform Prototyping: Design and implement core ML components, such as feature stores, model registries, and automated evaluation pipelines.
Standardization: Establish best practices for the ML lifecycle, from data labeling and experimentation to CI/CD for ML (MLOps).
Scalability: Optimize model inference and training workflows to handle high-throughput, low-latency requirements.
Internal Consulting: Act as a subject matter expert for product-facing data science teams, helping them leverage platform tools to solve complex business problems.
Tooling & Automation: Build internal libraries and SDKs that simplify the transition from a local research environment to a distributed production environment.
RAG Infrastructure: Design and optimize high-performance retrieval systems using vector databases (e.g., Pinecone, Weaviate) and advanced semantic search techniques.
LLM Evaluation Frameworks: Build automated "vibe-check" replacements. Develop rigorous evaluation pipelines using LLM-as-a-judge, G-Eval, or custom scoring rubrics to measure hallucination, faithfulness, and relevancy.
Agentic Orchestration: Develop and standardize the use of agentic frameworks (e.g., LangGraph, CrewAI) to allow product teams to build complex, multi-step AI workflows.
Model Lifecycle Management: Manage the transition between model providers (OpenAI, Anthropic, Google) and open-source alternatives (Llama 3+, Mistral) through unified abstraction layers.
Cost & Latency Optimization: Implement caching strategies (e.g., GPTCache), prompt compression, and token-usage monitoring to ensure the platform remains economically viable.
Guardrails & Safety: Integrate real-time content filtering and PII masking to ensure all LLM outputs comply with corporate security and ethical standards.

Results-oriented, with a work ethic of delivering on time and within scope.

Did we spark your interest? Build your future with us and apply.