Key Responsibilities
- Design, develop, and maintain robust ETL workflows and data pipelines using PySpark.
- Integrate, transform, and load structured and unstructured data from multiple sources into target data stores (data lake/warehouse).
- Implement data quality checks, validation, and reconciliation to ensure consistency and integrity of delivered datasets.
- Implement automation, error handling, and performance tuning across ETL processes (partitioning, caching, optimized joins).
- Write and tune SQL queries for data transformation, profiling, and performance optimization.
- Collaborate with BI/analytics teams to deliver data models and curated datasets optimized for reporting and analytics.
- Document ETL logic, data flows, and technical specifications for maintainability and auditability.
- Support production pipelines by troubleshooting failures and implementing fixes/enhancements as required.
Desired Skills
Azure | Big Data | Python | MySQL
Desired Candidate Profile
Qualifications: Bachelor of Engineering