Engineer - Data Engineering
- Data Pipeline Development: Design, develop, and maintain scalable data pipelines using PySpark to process large volumes of data from various sources.
- Data Integration: Integrate data from multiple data sources and formats, ensuring high data quality and reliability.
- Optimization: Optimize and tune data processing jobs for performance and cost-efficiency.
- Collaboration: Work closely with data scientists, analysts, and other stakeholders to understand data requirements and deliver high-quality data solutions.
- ETL Processes: Develop and maintain ETL processes to extract, transform, and load data into data warehouses and data lakes.
- Data Quality: Implement data validation and monitoring processes to ensure data accuracy and consistency.
- Documentation: Document data engineering processes, workflows, and best practices.
- Troubleshooting: Identify, troubleshoot, and resolve data-related issues promptly.
- Experience: 3+ years of experience in data engineering or a related field.
- Education: Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.
- Proficiency in PySpark and Python.
- Strong knowledge of big data technologies such as Hadoop, Hive, and Spark.
- Experience with cloud platforms (e.g., AWS, Azure, GCP) and their data services.
- Familiarity with data warehousing solutions (e.g., Amazon Redshift, Google BigQuery, Snowflake).
- Knowledge of relational and NoSQL databases (e.g., MySQL, MongoDB, Cassandra).
- Data Processing: Experience with ETL/ELT processes and data pipeline orchestration tools (e.g., Apache Airflow, Apache NiFi).
- Problem-Solving: Strong analytical and problem-solving skills.
- Communication: Excellent verbal and written communication skills, with the ability to explain complex technical concepts to non-technical stakeholders.
PySpark, SQL, Python, Data Engineering