Lead Data Scientist

IHX PRIVATE LIMITED
Bengaluru, Karnataka

Qualifications

TensorFlow
CI/CD
Image processing
Azure
Law
Engineering
Data structures
Master's degree
Microservices
SQL
AWS
Docker
Machine learning
Distributed systems
Continuous integration
Deep learning
Natural language processing
CPT coding
APIs
Predictive analytics
Data science
Metadata
Clustering
AI
Leadership
Python

Full job description

About IHX

IHX is building India’s most trusted health tech infrastructure platform, powering real-time, consent-led claims and data exchange between insurers and 30,000+ hospitals across 1,200+ cities. With $1B+ in claims processed yearly, IHX sits at the center of the next-gen health insurance stack.

About the Role

We are building the next-generation automated health-insurance claims processing platform, leveraging AI/ML, deep learning, NLP, OCR, and LLM-powered intelligence. As a Lead Data Scientist, you will drive the design, development, deployment, and optimization of AI models that power large-scale claims decisioning across multiple regions. This is a high-impact leadership role where you will work independently, set technical direction, mentor a diverse team, and ensure reliable production performance of mission-critical models.

Key Responsibilities

Model Development, Deployment & Production Support

Design, develop, train, and validate models used in automated health-claims processing.
Own end-to-end machine learning pipelines, including data ingestion, feature engineering, modeling, validation, deployment, and monitoring.
Monitor model performance, drift, SLAs, and stability in real-time production environments.
Lead root-cause analysis, bug resolution, and continuous improvement of production models.
Build state-of-the-art models including classical ML, deep learning, NLP, OCR, LLMs, transformers, and generative AI.
Implement scalable model serving and continuous training strategies using modern MLOps tools.

Build Efficient, Scalable & High-Accuracy Models

Optimize models for accuracy, latency, memory footprint, and infrastructure cost.
Implement model compression, distillation, and quantization when required to meet SLAs.
Ensure solutions perform reliably across heterogeneous real-world datasets and regions.

Implement End-to-End ML Pipelines

Architect and implement automated ML pipelines covering structured, unstructured, and document-based data.
Build feature engineering, model training, validation, and retraining workflows.
Implement CI/CD for ML, model versioning, and automated retraining strategies.
Work closely with engineering teams to operationalize ML using MLOps best practices.

Expertise in a Wide Range of AI Techniques

Hands-on experience with classical ML models including tree-based models, linear models, clustering, and anomaly detection.
Experience with deep learning architectures such as CNNs, RNNs, and Transformers.
Strong background in NLP and LLM-based solutions for extraction, summarization, classification, and claim interpretation.
Experience building OCR pipelines for document parsing, form extraction, and image understanding.
Experience applying generative AI for reasoning, rule extraction, and claim scenario understanding.
Ability to evaluate and select the most appropriate technique for each problem.

Work Independently on High-Scale Business Use Cases

Own ML modules deployed across multiple geographies, regulations, and insurance ecosystems.
Ensure scalability and robustness for high-volume claims processing workloads.
Collaborate with product, engineering, and operations teams to translate business requirements into ML solutions.

Strong Technical Acumen

Deep understanding of data structures, machine learning algorithms, and modern AI architectures.
Proficiency in Python, ML frameworks such as PyTorch and TensorFlow, and cloud platforms such as AWS, GCP, or Azure.
Familiarity with distributed systems, microservices, APIs, and containerized deployments.
Ability to conduct architecture reviews and guide engineering teams on ML integration.
Experience building scalable data pipelines and feature stores.
Define data quality standards, metadata tracking, and experiment management practices.
Lead by example with strong individual contributions on critical projects.
Write high-quality, production-ready Python code using frameworks and tools such as PyTorch, Hugging Face, LangChain, or Ollama.
Conduct rigorous model validation, interpretability analysis, and bias detection.

Team Leadership, Mentoring & Collaboration

Lead, mentor, and inspire data scientists, ML engineers, and analysts.
Foster a culture of ownership, experimentation, innovation, and continuous learning.
Collaborate cross-functionally with product, engineering, quality, and operations teams.
Demonstrate empathy, flexibility, and leadership in fast-paced environments.

Required Qualifications

An engineering degree is mandatory.
8+ years of experience in data science or machine learning, with 3–5 years in a leadership role.
Proven experience building and deploying ML models in production at scale.
Strong foundation in statistics, machine learning fundamentals, optimization, and deep learning.
Expertise in NLP, transformers, LLM fine-tuning, embeddings, computer vision, OCR, time-series modeling, and predictive modeling.
Advanced proficiency in Python, SQL, ML frameworks, and cloud platforms.
Demonstrated success leading teams and delivering enterprise-scale AI systems.

Preferred Qualifications

Experience in health-insurance or health-claims processing ecosystems.
Understanding of regulatory and compliance constraints in healthcare data.
Knowledge of healthcare data standards such as HL7, FHIR, ICD, CPT, and SNOMED.
Experience with MLOps tools including MLflow, Kubeflow, Airflow, Docker, and CI/CD pipelines.
