We are hiring a Data Engineer (5+ years of experience) for our data-intensive analytics project.
Location: Pune
Experience: 5–10 Years
Key Skills: Spark, PySpark, Scala, Python, SQL, Databricks, Data Lake, DWH, Snowflake, Azure ADF/Synapse/ADLS
We are looking for an experienced Data Engineer with strong hands‑on expertise in building high‑performance data pipelines for a data-intensive analytics environment. The ideal candidate excels in Spark (PySpark/Scala), complex SQL, data lake architectures, and data warehouse modeling. Experience with cloud data platforms such as Databricks, Azure, or Snowflake is a strong advantage.
- Develop and optimize Spark (PySpark/Scala) pipelines
- Ingest, transform, cleanse, and aggregate large datasets
- Build scalable batch and near real‑time pipelines
- Apply Delta Lake optimization, partitioning, caching, and performance tuning
- Write complex aggregation logic (window functions, grouping sets, analytical functions)
- Understand KPIs, metrics, and analytical use cases
- Translate business logic into technical transformations
- Validate outputs against business requirements
- Build a multi‑layer data lake (Bronze/Silver/Gold)
- Work with Parquet, ORC, Delta Lake, and other columnar formats
- Implement schema evolution, metadata management, and auditing
- Design dimensional models: Star Schema, Snowflake Schema
- Build fact and dimension tables
- Optimize table structures and partition strategies
- Develop notebooks and workflows in PySpark/Scala
- Manage clusters, jobs, and Delta Live Tables
- Apply best practices for cost and performance
- Write and optimize complex SQL queries
- Perform data profiling, validation, and analytical computations
- Support dashboards and reporting layers
- Azure: ADF, Synapse, ADLS Gen2 (preferred)
- Snowflake: Warehouses, Snowpipe, Streams/Tasks
- Optional: Azure Functions
- Validate data transformations and business rules
- Document data flows, transformation logic, and metadata
- Collaborate with QA and analysts for data accuracy
- 5+ years of hands-on experience in Data Engineering
- Strong programming skills in Spark, Scala, and Python
- Strong SQL (complex joins, analytical functions, aggregations)
- Experience with data lake and data warehouse concepts
- Spark performance tuning (Delta, shuffle tuning, partitioning)
- Experience with at least one cloud ecosystem (Azure/AWS/GCP)
- Experience with Databricks
- Snowflake or modern cloud DWH
- ADF/Synapse/Airflow/dbt
- CI/CD for data pipelines
- Large-scale analytics environment exposure
- Strong understanding of business logic behind analytics
- Ability to convert metrics into technical transformations
- Excellent debugging & problem‑solving capabilities
- Good communication and cross‑functional collaboration
Spark, PySpark, Scala, Python, SQL (advanced), Data Lake, Data Warehouse, ETL/ELT pipelines, Delta Lake optimization, Azure/AWS/GCP cloud experience, Data modeling, Complex aggregations, Performance tuning
Databricks, Snowflake, ADF, Synapse, Airflow, dbt, CI/CD for data pipelines, Large‑scale analytics e