Key Responsibilities:
Data Pipeline Development:
Design, implement, and maintain efficient ETL/ELT pipelines to process large-scale data.
Automate data workflows and optimize pipelines using Python and Dataiku.
Data Integration and Collaboration:
Work closely with data analysts, data scientists, and business stakeholders to understand requirements and deliver accurate, timely data solutions.
Ensure seamless integration of third-party data sources and APIs into the data ecosystem.
Performance Optimization:
Optimize infrastructure, pipeline performance, and resource utilization for cost efficiency and scalability.
Develop robust monitoring and error-handling solutions to ensure data pipeline reliability.
Governance and Security:
Implement data quality frameworks, governance policies, and security best practices.
Ensure compliance with organizational and regulatory data standards.
Continuous Improvement:
Keep up with industry trends, explore emerging technologies, and recommend tools/processes to enhance the data engineering function.
Participate in code reviews, share best practices, and mentor junior team members when needed.
Required Skills & Qualifications:
Programming: Strong proficiency in Python, with experience writing clean, efficient code for data manipulation and pipelines.
Data Platforms: Hands-on expertise in Dataiku for building workflows and automating tasks.
Big Data Frameworks: Experience using Databricks to process and analyze large-scale datasets.
Cloud Data Warehousing: Proficiency in Snowflake database design, query optimization, and data warehouse management.
ETL/ELT: Expert knowledge of building ETL/ELT pipelines and designing scalable solutions.
SQL: Advanced SQL skills for querying, transforming, and optimizing relational databases.
Problem-Solving: Strong analytical and troubleshooting abilities, with experience solving complex data challenges.
Communication: Excellent written and verbal communication skills for working effectively with a range of stakeholders.