Role Overview
We are seeking a highly skilled Senior AWS Glue Developer with deep expertise in PySpark, distributed data processing, and cloud-native ETL pipeline development. The ideal candidate will design, build, optimize, and maintain large-scale data ingestion and transformation pipelines on AWS, contributing to our enterprise data platform modernization and analytics initiatives.
Key Responsibilities
1. ETL Development & Data Engineering
- Design, develop, and optimize AWS Glue ETL jobs using PySpark, Glue Studio, and Glue Workflows.
- Build scalable batch and near-real-time ingestion pipelines using Glue, Lambda, and Step Functions.
- Transform data for analytical, reporting, machine learning, and Lakehouse use cases.
2. Data Lake / Lakehouse Architecture
- Develop pipelines targeting an Amazon S3 data lake and Apache Iceberg tables.
- Implement robust data quality, metadata, and governance layers (Glue Data Catalog, Lake Formation).
- Optimize storage using columnar formats such as Parquet and appropriate compression.
3. Performance Optimization
- Tune PySpark jobs for high performance (memory management, partition pruning, shuffle optimization).
- Optimize Glue job parameters (worker type, DPUs, job bookmarks, concurrency).
4. CI/CD & DevOps Integration
- Build automated deployments using GitHub Actions.
5. Cross-functional Collaboration
- Partner with architects, leads, developers, and business teams to refine requirements.
- Translate functional specifications into technical ETL and orchestration solutions.
Required Skills & Experience
Core Technical Skills
- 8–12+ years of experience as a Senior Data Engineer.
- Strong expertise in PySpark and distributed computing.
- Hands-on experience with:
  - AWS Glue (Jobs, Workflows, Triggers, Crawlers)
  - AWS Lambda
  - AWS Step Functions
  - Amazon S3
  - Amazon Athena
  - AWS Glue Data Catalog / AWS Lake Formation
  - Amazon Redshift
  - Amazon DynamoDB
- Advanced SQL and optimization for big data workloads.
Big Data & Cloud
- Experience with Kafka and Flink (nice to have).
- Strong knowledge of ETL patterns, CDC frameworks, and event-driven pipelines.
- Understanding of medallion architecture and lakehouse principles.
Soft Skills
- Strong communication and documentation abilities.
- Ability to lead development tasks and mentor junior engineers.
- Ability to work in an agile, fast-paced environment.
Preferred Qualifications
- Experience with Iceberg, Redshift, SQL Server, or DynamoDB.
- Experience with Informatica Cloud (nice to have).
- Interest in exploring AI tools for data integration.