We are looking for a skilled Data Engineer who can design and maintain robust data pipelines,
work with large-scale structured and semi-structured pharma datasets, and collaborate closely
with Data Scientists and Python Developers to deliver clean, validated, production-grade data
infrastructure.
Key Responsibilities
Pipeline Design & Development
- Design and build scalable ETL/ELT pipelines for ingesting, transforming, and loading
pharmaceutical data from multiple sources
- Process indication-level market forecast data from Excel workbooks into structured JSON
and DB-ready formats for frontend visualisation
- Build and maintain data extraction pipelines from clinical trial registries, conference
abstracts, and drug databases
- Implement multi-step validation workflows to ensure pipeline output accuracy and
completeness before downstream handoff
Data Modelling & Architecture
- Design normalised relational schemas in SQL Server to support pharmaceutical analytics
use cases
- Collaborate with Data Scientists on data preparation and feature engineering pipelines
- Maintain data dictionaries, schema documentation, and lineage records for all active
pipelines
- Contribute to architecture decisions on data storage, processing strategies, and API
integrations
Quality, Validation & Reliability
- Own data quality across assigned pipelines — implement automated checks, alerting,
and audit trails
- Investigate and resolve silent data errors including missing fields, incorrect labels,
zero-value nodes, and schema mismatches
- Write thorough unit and integration tests for all pipeline components
Collaboration & Standards
- Work within an agile delivery framework — participate in sprint planning, estimation,
reviews, and retrospectives
- Follow and contribute to team-wide GitHub standards: branching strategy, PR reviews,
naming conventions, and documentation
- Actively use AI coding tools (GitHub Copilot, Claude, etc.) as part of everyday
development
- Produce clear technical documentation and handoff notes for every delivery
Required Skills & Experience
Technical Skills
- Python (3+ years, production pipelines)
- SQL: complex queries, stored procedures, optimisation
- ETL/ELT design and implementation
- Data modelling and schema design
- REST API integration and data extraction
- Data validation and testing practices
- JSON / XML data processing
Tools & Platforms
- SQL Server (primary)
- Git / GitHub (branching, PRs, code review)
- Pandas, SQLAlchemy, Pydantic
- Excel / openpyxl for source data handling
- ... environments (desirable)
- Selenium / web scraping tools (desirable)
- Airflow or equivalent orchestration (desirable)
Nice to Have
- Experience working with pharmaceutical or life sciences data — clinical trials, market
forecasting, drug pipelines, or therapy area analytics
- Familiarity with NLP pipelines or LLM-assisted data processing workflows
- Knowledge of multi-agent AI architectures or experience building pipelines that feed
AI/ML models
- Experience building Streamlit or lightweight data exploration tools
What We Expect From You
- You think before you code — research, design, and plan before diving into
implementation
- You take end-to-end ownership of your pipelines from requirement to optimisation
- You flag issues early rather than letting them surface post-handoff
- You write code that your teammates can read, understand, and maintain
- You embrace AI tools as a productivity multiplier, not a threat
- You contribute to a team culture of quality, transparency, and continuous improvement
Pay: ₹800,000.00 - ₹2,500,000.00 per year
Work Location: In person