Chennai, Tamil Nadu
Job Summary
This requirement is for an AWS-native, modern Data Engineering/Lakehouse implementation rather than a traditional ETL/MSBI-only profile.
The customer is looking for candidates who can help build and scale a native AWS Lakehouse platform that handles very large datasets, IoT/device feeds, streaming pipelines, unified data models, and analytical workloads. The expected core skills center on the AWS-native data engineering stack, including:
- AWS Glue (ETL + Catalog)
- EMR / Spark-based distributed processing. Spark on AWS EMR/Glue is commonly written in PySpark, the Python API for Apache Spark, so Python knowledge is expected for data engineering/Spark pipelines rather than for application development.
- S3-based Lakehouse architecture
- Step Functions (pipeline orchestration)
- Redshift / Athena for analytics
- CDC/data ingestion patterns
- Iceberg/open table format concepts
- Kinesis / Firehose for streaming ingestion
The major pain points the customer is trying to solve are:
- Large-scale data migration and transformation
- Unifying multiple application datasets/entities into a common analytical model
- Handling billions of rows of IoT/device/solar/wind farm data efficiently
- Building scalable analytical and reporting architecture on AWS
- Incremental/Agile modernization using CDC and a phased migration strategy
Traditional ETL/Data Warehouse experience is helpful as a foundation, but profiles should also demonstrate strong AWS-native cloud data engineering capabilities and exposure to distributed processing to match the customer's actual expectations.
Key Responsibilities