Key Responsibilities
Design and implement data pipelines using AWS (Glue, EMR, Lambda) or GCP (Dataflow, BigQuery, Cloud Functions)
Develop ETL/ELT workflows to ingest, transform, and load structured and unstructured data
Build and maintain data lakes and data warehouses using Redshift, Snowflake, or BigQuery
Optimize data storage and retrieval for performance and cost-efficiency
Ensure data quality, governance, and security across all stages of the pipeline
Collaborate with cross-functional teams to support analytics, machine learning, and reporting needs
Required Skills
Strong programming skills in Python, SQL, or Scala
Hands-on experience with cloud-native data services (e.g., AWS Glue, S3, Athena, Redshift; or GCP BigQuery, Dataflow, Pub/Sub)
Familiarity with data modeling, partitioning, and schema design
Experience with workflow orchestration tools such as Apache Airflow or Cloud Composer
Proficiency in CI/CD practices, Git, and infrastructure-as-code (e.g., Terraform, CloudFormation)
Preferred Qualifications
Certification such as AWS Certified Data Analytics or Google Professional Data Engineer
Exposure to streaming platforms such as Kafka or Pub/Sub
Understanding of data privacy regulations (e.g., GDPR, HIPAA)