Sr. Data Engineer

Enfycon India Pvt Ltd
Bangalore City, Bengaluru, Karnataka

Job details

Contractual / Temporary
6 hours ago

Qualifications

CI/CD
Performance tuning
Data modeling
Cloud infrastructure
Azure
Clinical research
Spark
AWS
Continuous integration
Natural language processing
Unity
ETL
AI
Communication skills

Full job description

Role: Data Engineer

Work Mode: Onsite

Experience: 8+ Years

Job Type: Contract

➡ Role summary: You will combine clinical data expertise with strong data engineering skills to build well-documented pipelines from source data to curated data sets in common data models such as CDISC SDTM. You will collaborate closely with clinical SMEs, data scientists, infrastructure teams, and other data engineers. We are expanding this capability to include Real-World Data (RWD) from a broad range of registries and other sources: you will help extend our medallion Databricks pipelines (CDISC SDTM) to incorporate RWD, working with clinical experts and AI teams to combine rule-based and automated mapping approaches (including OMOP interoperability).

➡ Key responsibilities  

Design, build and maintain production ETL pipelines in Databricks/Delta Lake to ingest RWD (registries, claims, EHR extracts) and transform it into standard models.
Implement harmonization workflows to map incoming RWD to OMOP and to the internal CDISC SDTM canonical model; handle vocabulary mapping, unit normalization and provenance.
Extend the medallion architecture (bronze/silver/gold) patterns with robust validation, lineage, partitioning and performance tuning.
Develop configurable, input-driven transformation frameworks so clinical experts can drive mapping rules via config files and catalogs.
Integrate AI/automation components (e.g., model-assisted mapping, NLP for free text) with human-in-the-loop review and confidence scoring.
Establish testing, CI/CD, monitoring and alerting for ETL jobs and automations; ensure reproducibility, versioning and governance.

➡ Required skills and qualifications

Proven experience designing and implementing ETL pipelines in Databricks / Spark and Delta Lake.
Strong knowledge of OMOP CDM and experience mapping datasets to OMOP; familiarity with CDISC SDTM is a plus.
Expertise in data modelling, partitioning, performance tuning, and best practices for large clinical/RWD datasets.
Experience with vocabulary services and terminology mapping (OHDSI/Athena, UMLS, or similar).
Experience integrating AI/NLP components into data pipelines (entity extraction, mapping suggestions) is desirable.
Familiarity with testing frameworks for data (Great Expectations, Deequ), CI/CD, infrastructure as code, and orchestration tools (Databricks Jobs, Airflow).
Good communication skills and experience working with domain experts to capture requirements.

Preferred

Prior experience in pharma or clinical research environments.
Knowledge of data governance, privacy regulations and secure handling of patient data.
Experience with Unity Catalog, Databricks Delta Sharing, and cloud infrastructure (Azure/AWS).

Work Location: Hybrid remote in Bangalore City, Bengaluru, Karnataka
