What you will be doing

Use S3 buckets to store large volumes of raw and processed data.
Use Apache Iceberg (or a similar technology) to manage and organize data in the data lake.
Design and develop data pipelines with PySpark on EMR (or Glue).
Implement and manage complex data workflows, orchestrating tasks with Apache Airflow (MWAA).
Create and maintain data catalogs in the AWS Glue Data Catalog to organize metadata.
Use AWS Athena for interactive querying.
Become familiar with data modelling techniques (star schema, snowflake schema, etc.) that support analytics and reporting requirements.
Understand the stages of the data journey within a data lake (raw, processed, etc.).
Apply professional software engineering best practices across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.
Support and work with cross-functional teams in a dynamic environment.

What we are looking for

University degree in Computer Science or a related field.
At least 3 years of experience in Data Engineering.
Experience with PySpark.
Experience with a major cloud provider (preferably AWS).
Solid experience in data modelling techniques.
Knowledge of professional software engineering best practices across the full software development life cycle, including coding standards, code reviews, source control management, build processes, testing, and operations.
Experience supporting and working with cross-functional teams in a dynamic environment.
Experience working in a Scrum / Agile environment.

Desirable:
Experience automating resource provisioning and deployments on AWS with Terraform and GitLab CI/CD.
AWS certifications related to Data Engineering.