Responsibilities:
• Design, build, and maintain scalable data pipelines and ETL/ELT workflows to ingest, transform, and process large volumes of structured and semi-structured data.
• Develop and optimize data models, tables, and transformations to support analytics, reporting, and downstream data consumption.
• Work with large datasets using SQL, PySpark, and modern data platforms such as Snowflake and Databricks to ensure efficient data processing.
• Build and manage data workflows using orchestration tools such as Apache Airflow, ensuring reliable and timely data delivery.
• Develop automation scripts in shell and Python to support data pipeline execution, monitoring, and operational efficiency.
• Monitor, troubleshoot, and optimize data pipelines to improve performance, scalability, and reliability across the data ecosystem.
• Collaborate with data scientists, analysts, and business stakeholders to understand data requirements and enable data-driven insights.
• Ensure adherence to data engineering best practices, including data quality checks, documentation, and pipeline governance.
Qualifications:
• Experience working with cloud platforms such as AWS, GCP, or Azure.
• Familiarity with data lakehouse architectures, data governance, and modern data platform practices.
• Proven ability to work with global stakeholders in cross-functional, matrixed environments.