Senior Data Engineer

hirezy.ai
Hyderabad, Telangana


Job details

Full-time
₹8,00,000 - ₹25,00,000 a year
4 hours ago

Qualifications

Law
XML
Microsoft SQL Server
Selenium
Git
Master's degree
SQL
Pandas
Docker
REST
Natural language processing
GitHub
APIs
Agile
JSON
AI
Python
Analytics

Full job description

We are looking for a skilled Data Engineer who can design and maintain robust data pipelines, work with large-scale structured and semi-structured pharma datasets, and collaborate closely with Data Scientists and Python Developers to deliver clean, validated, production-grade data infrastructure.

Key Responsibilities

Pipeline Design & Development

Design and build scalable ETL/ELT pipelines for ingesting, transforming, and loading pharmaceutical data from multiple sources

Process indication-level market forecast data from Excel workbooks into structured JSON and DB-ready formats for frontend visualisation

Build and maintain data extraction pipelines from clinical trial registries, conference abstracts, and drug databases

Implement multi-step validation workflows to ensure pipeline output accuracy and completeness before downstream handoff

Data Modelling & Architecture

Design normalised relational schemas in SQL Server to support pharmaceutical analytics use cases

Collaborate with Data Scientists on data preparation and feature engineering pipelines

Maintain data dictionaries, schema documentation, and lineage records for all active pipelines

Contribute to architecture decisions on data storage, processing strategies, and API integrations

Quality, Validation & Reliability

Own data quality across assigned pipelines: implement automated checks, alerting, and audit trails

Investigate and resolve silent data errors, including missing fields, incorrect labels, zero-value nodes, and schema mismatches

Write thorough unit and integration tests for all pipeline components

Collaboration & Standards

Work within an agile delivery framework: participate in sprint planning, estimation, reviews, and retrospectives

Follow and contribute to team-wide GitHub standards: branching strategy, PR reviews, naming conventions, and documentation

Actively use AI coding tools (GitHub Copilot, Claude, etc.) as part of everyday development

Produce clear technical documentation and handoff notes for every delivery

Required Skills & Experience

Technical Skills

Python (3+ years, production pipelines)

SQL: complex queries, stored procedures, optimisation

ETL/ELT design and implementation

Data modelling and schema design

REST API integration and data extraction

Data validation and testing practices

JSON / XML data processing

Tools & Platforms

SQL Server (primary)

Git / GitHub (branching, PRs, code review)

Pandas, SQLAlchemy, Pydantic

Excel / openpyxl for source data handling

Docker or containerised environments (desirable)

Selenium / web scraping tools (desirable)

Airflow or equivalent orchestration (desirable)

Nice to Have

Experience working with pharmaceutical or life sciences data, such as clinical trials, market forecasting, drug pipelines, or therapy area analytics

Familiarity with NLP pipelines or LLM-assisted data processing workflows

Knowledge of multi-agent AI architectures or experience building pipelines that feed AI/ML models

Experience building Streamlit or lightweight data exploration tools

What We Expect From You

You think before you code: you research, design, and plan before diving into implementation

You take end-to-end ownership of your pipelines from requirement to optimisation
You are comfortable flagging issues early rather than discovering them post-handoff
You write code that your teammates can read, understand, and maintain
You embrace AI tools as a productivity multiplier, not a threat
You contribute to a team culture of quality, transparency, and continuous improvement

Pay: ₹800,000.00 - ₹2,500,000.00 per year

Work Location: In person
