Job Title: Data Engineer (Ingestion & Pipelines)
- Function: Engineering – AI & Data
- Business Unit: REIL
- Location: Mumbai
- Experience: 5+ Years
- Employment Type: Full-Time
About REIL
Reliance Enterprise Intelligence Ltd (REIL) is a joint venture between Reliance Industries and Meta. By combining Reliance’s unparalleled scale and deep enterprise domain expertise with Meta’s world-class AI and technology capabilities, REIL is uniquely positioned to shape the future of enterprise intelligence.
About the Programme
REIL is developing a cutting-edge enterprise AI platform focused on financial compliance and intelligence.
Role Overview
We are seeking a high-caliber Data Engineer (Ingestion & Pipelines) to own the data foundation of our platform. Every model, AI output, and compliance decision our system makes depends entirely on data arriving reliably, completely, and on time.
In this role, you will design and build robust ingestion pipelines from complex enterprise source systems into the data platform, architect pipeline monitoring infrastructure, and work closely with internal IT and operations teams to establish an institutional-grade data ecosystem.
Key Responsibilities
Data Discovery & Audit
- Map & Audit: Partner with internal IT and operations teams to conduct thorough data audits, mapping every required source system, assessing existing platform data, and identifying gaps.
- Documentation: Maintain comprehensive documentation of data sources, schemas, update frequencies, and known data quality constraints.
- Risk Management: Proactively identify and escalate data gaps and quality risks to the Solutions Architect.
Pipeline Design & Build
- End-to-End Ingestion: Design and build high-throughput ingestion pipelines from varied sources, including ERPs, government portals, supplier networks, and banking feeds, into the data platform.
- Ingestion Patterns: Implement both robust batch processing and low-latency real-time/streaming ingestion patterns.
- Fault Tolerance: Ensure all pipelines are idempotent, resumable, and built to handle source system failures gracefully without data loss or duplication.
Data Quality & Reliability
- Observability: Build proactive pipeline monitoring and alerting frameworks to catch and flag failures before they impact model training or live inference.
- Ingestion Guardrails: Implement inline data quality checks at the point of ingestion, including schema validation, completeness verification, and volume anomaly detection.
- Lineage: Maintain end-to-end data lineage so that data provenance and freshness are transparent across the environment.
Collaboration & Handoff
- Cross-Functional Collaboration: Work closely with internal IT and automation teams to draw on their institutional knowledge of legacy source systems.
- Downstream Delivery: Hand off clean, optimized, and well-documented datasets to ML and LLM Engineers for model training and knowledge base construction.
- Production Support: Assist the MLOps Engineer in ensuring production pipelines remain highly stable, observable, and performant post-deployment.
Qualifications & Experience
Education
- B.E. / B.Tech / M.Tech in Computer Science, Information Technology, or a closely related technical field.
Required Experience
- 5+ years of total data engineering experience, with at least 2 years dedicated to building and managing enterprise-scale production pipelines.
- Proven track record of extracting data from complex enterprise source systems (ERPs or equivalent).
- Hands-on experience architecting both batch and real-time/streaming ingestion pipelines.
Core Technical Skills
- Languages: Highly proficient in Python, PySpark, and advanced SQL.
- Data Platform: Hands-on production experience with Databricks and Delta Lake is mandatory.
- Pipeline Orchestration: Apache Airflow, Databricks Workflows, or equivalent tools.
- Streaming Ecosystems: Kafka, Spark Structured Streaming, or equivalent.
- ERP Integration: Experience extracting data from SAP or equivalent large-scale ERP systems is strongly preferred.
- API Integration: Proficient in consuming REST APIs from government portals and third-party data feeds.
- Data Quality Frameworks: Experience with frameworks like Great Expectations or equivalent.
- Observability: Strong grasp of pipeline monitoring, alerting, and data lineage tooling.
Preferred Qualifications
- Deep familiarity with SAP data models.
- Prior experience navigating government API ecosystems (e.g., GSTN, ICEGATE).
- Experience building pipelines optimized for feeding ML model training workflows.
- Exposure to Unity Catalog or similar data cataloging and governance tooling.
- Prior background in fintech, compliance, or tax technology environments.
Pay: ₹295,092.76 - ₹2,000,000.00 per year
Benefits:
- Health insurance
- Paid sick time
- Provident Fund
Work Location: In person