Data Engineers Expert

KPIT Technologies GmbH
Pune, Maharashtra


Job details

Full-time

Qualifications

CI/CD
Data modelling
Azure
Spark
SQL
AWS
Scala
Continuous integration
Data analytics
Communication skills
Data warehouse
Python
Analytics

Full job description

We are hiring a Data Engineer (5+ years of experience) for our data‑intensive analytics project.

Location: Pune
Experience: 5–10 Years
Key Skills: Spark, PySpark, Scala, Python, SQL, Databricks, Data Lake, DWH, Snowflake, Azure ADF/Synapse/ADLS

Job Description

We are looking for an experienced Data Engineer (5+ years) with strong hands‑on expertise in building high‑performance data pipelines for a data‑intensive analytics environment. The ideal candidate excels in Spark (PySpark/Scala), complex SQL, data lake architectures, and data warehouse modeling. Experience with cloud data platforms such as Databricks, Azure, or Snowflake is a strong advantage.

Key Responsibilities

1. Data Pipeline & ETL/ELT Development

Develop and optimize Spark (PySpark/Scala) pipelines
Ingest, transform, cleanse, and aggregate large datasets
Build scalable batch and near real‑time pipelines
Apply Delta Lake optimization, partitioning, caching, and performance tuning

2. Heavy Data Analytics

Write complex aggregation logic (window functions, grouping sets, analytical functions)
Understand KPIs, metrics, and analytical use cases
Translate business logic into technical transformations
Validate outputs against business requirements

3. Data Lake Engineering

Build a multi‑layer data lake (Bronze/Silver/Gold)
Work with Delta Lake and columnar file formats such as Parquet and ORC
Implement schema evolution, metadata management, and auditing

4. Data Warehouse Engineering

Design dimensional models: Star Schema, Snowflake Schema
Build fact and dimension tables
Optimize table structures and partition strategies

5. Databricks (Preferred)

Develop notebooks and workflows in PySpark/Scala
Manage clusters, jobs, and Delta Live Tables
Apply best practices for cost and performance

6. SQL Engineering

Write and optimize complex SQL queries
Perform data profiling, validation, and analytical computations
Support dashboards and reporting layers

7. Cloud Data Platforms

Azure: ADF, Synapse, ADLS Gen2 (preferred)
Snowflake: Warehouses, Snowpipe, Streams/Tasks
Optional: Azure Functions

8. Data Quality & Documentation

Validate data transformations and business rules
Document data flows, transformation logic, and metadata
Collaborate with QA and analysts for data accuracy

Required Qualifications

5+ years of hands-on experience in Data Engineering
Strong programming skills in Spark, Scala, and Python
Strong SQL (complex joins, analytical functions, aggregations)
Experience with Data Lake and Data Warehouse concepts
Spark performance tuning (Delta, shuffle tuning, partitioning)
Experience with at least one cloud ecosystem (Azure/AWS/GCP)

Preferred Skills

Experience with Databricks
Snowflake or modern cloud DWH
ADF/Synapse/Airflow/dbt
CI/CD for data pipelines
Large-scale analytics environment exposure

Soft Skills

Strong understanding of business logic behind analytics
Ability to convert metrics into technical transformations
Excellent debugging & problem‑solving capabilities
Good communication and cross‑functional collaboration

