Data Engineer — Ingestion & Pipelines

Programming.com - Navi Mumbai, Maharashtra

Job details

Permanent | Full-time
₹2,95,092.76 - ₹20,00,000.00 a year
2 days ago

Benefits

Provident Fund
Health insurance
Paid sick time

Qualifications

Law
Computer Science
SAP
Spark
ERP systems
Master's degree
SQL
Bachelor's degree
REST
Unity Catalog
APIs
B.E.
Construction
Apache
Kafka
Python
Information Technology

Full job description

Job Title: Data Engineer (Ingestion & Pipelines)

Function: Engineering – AI & Data
Business Unit: REIL
Location: Mumbai
Experience: 5+ Years
Employment Type: Full-Time

About REIL

Reliance Enterprise Intelligence Ltd (REIL) is an elite joint venture between Reliance Industries and Meta. By combining Reliance’s unparalleled scale and deep enterprise domain expertise with Meta’s world-class AI and technology capabilities, REIL is uniquely positioned to shape the future of enterprise intelligence.

About the Programme

REIL is developing a cutting-edge enterprise AI platform focused on financial compliance and intelligence.

Role Overview

We are seeking a high-caliber Data Engineer (Ingestion & Pipelines) to own the data foundation of our platform. Every model, AI output, and compliance decision our system makes depends entirely on data arriving reliably, completely, and on time.

In this role, you will design and build robust ingestion pipelines from complex enterprise source systems into the data platform, architect pipeline monitoring infrastructure, and work closely with internal IT and operations teams to establish an institutional-grade data ecosystem.

Key Responsibilities

Data Discovery & Audit

Map & Audit: Partner with internal IT and operations teams to conduct thorough data audits, mapping every required source system, assessing existing platform data, and identifying gaps.
Documentation: Maintain comprehensive documentation of data sources, schemas, update frequencies, and known data quality constraints.
Risk Management: Proactively identify and escalate data gaps and quality risks to the Solutions Architect.

Pipeline Design & Build

End-to-End Ingestion: Design and build high-throughput ingestion pipelines from varied sources—including ERPs, government portals, supplier networks, and banking feeds—into the data platform.
Architect Patterns: Implement both robust batch processing and low-latency real-time/streaming ingestion patterns.
Fault Tolerance: Ensure all pipelines are idempotent, resumable, and built to handle source system failures gracefully without data loss or duplication.

Data Quality & Reliability

Observability: Build proactive pipeline monitoring and alerting frameworks to catch and flag failures before they impact model training or live inference.
Ingestion Guardrails: Implement inline data quality checks at the point of ingestion, including schema validation, completeness verification, and volume anomaly detection.
Lineage: Maintain clear, end-to-end data lineage to ensure transparency on data provenance and freshness across the environment.

Collaboration & Handoff

Cross-Functional Synergy: Interface closely with internal IT and automation teams to leverage institutional knowledge of legacy source systems.
Downstream Delivery: Hand off clean, optimized, and well-documented datasets to ML and LLM Engineers for model training and knowledge base construction.
Production Support: Assist the MLOps Engineer in ensuring production pipelines remain highly stable, observable, and performant post-deployment.

Qualifications & Experience

Education

B.E. / B.Tech / M.Tech in Computer Science, Information Technology, or a closely related technical field.

Required Experience

5+ years of total data engineering experience, with at least 2 years dedicated to building and managing enterprise-scale production pipelines.
Proven track record of extracting data from complex enterprise source systems (ERPs or equivalent).
Hands-on experience architecting both batch and real-time/streaming ingestion pipelines.

Core Technical Skills

Languages: Highly proficient in Python, PySpark, and advanced SQL.
Data Platform: Hands-on production experience with Databricks and Delta Lake is mandatory.
Pipeline Orchestration: Apache Airflow, Databricks Workflows, or equivalent tools.
Streaming Ecosystems: Kafka, Spark Structured Streaming, or equivalent.
ERP Integration: Experience extracting data from SAP or equivalent large-scale ERP systems is strongly preferred.
API Management: Proficient in consuming REST APIs for government portals or third-party data feeds.
Data Quality Frameworks: Experience with frameworks like Great Expectations or equivalent.
Observability: Strong grasp of pipeline monitoring, alerting, and data lineage tooling.

Preferred Qualifications

Deep familiarity with SAP data models.
Prior experience navigating government API ecosystems (e.g., GSTN, ICEGATE).
Experience building pipelines optimized for feeding ML model training workflows.
Exposure to Unity Catalog or similar data cataloging and governance tooling.
Prior background in fintech, compliance, or tax technology environments.

Work Location: In person
