Job Title: Databricks Data Engineer
Job Overview:
As a Databricks Data Engineer, you will play a pivotal role in designing, implementing, and optimizing data solutions using Azure Databricks. You will build robust data pipelines, ensure data quality, improve overall performance, and document the technical aspects of your work. You'll collaborate with cross-functional teams to deliver high-quality solutions aligned with business requirements.
Responsibilities:
1. Develop Scalable Data Pipelines:
o Utilize Databricks and PySpark to design, develop, and maintain data processing pipelines.
o Implement efficient ETL (Extract, Transform, Load) processes.
o Apply deep, hands-on knowledge of Databricks features and services, including Feature Engineering, Lakeflow, Lakebase, and Spark Declarative Pipelines.
2. Data Quality Implementation:
o Establish data quality checks within Azure Databricks.
o Ensure data accuracy, consistency, and adherence to business rules.
o Monitor data quality metrics and address anomalies promptly.
3. Source System Integration:
o Integrate Azure Databricks with various source systems (databases, data lakes, APIs).
o Efficiently ingest data from diverse sources into Azure Databricks.
o Handle schema evolution and changes in source data.
4. PySpark Coding and Optimization:
o Write efficient PySpark code for data transformations, aggregations, and analytics.
o Optimize Spark jobs for performance and resource utilization.
o Troubleshoot and debug PySpark scripts as needed.
5. Collaboration and Communication:
o Work closely with data scientists, engineers, and other stakeholders.
o Gather requirements, understand business needs, and deliver effective solutions.
o Participate in cross-functional teams to ensure successful project outcomes.
6. Delta Lake Expertise:
o Understand and utilize Delta Lake, which provides ACID transactions and time travel capabilities on top of data lakes.
o Implement Delta Lake tables for reliable data storage and versioning.
7. Stored Procedure Conversion in Databricks:
o Convert existing stored procedures (e.g., from SQL Server) into Databricks-compatible code.
o Optimize and enhance stored procedures for better performance within Databricks.
Qualifications and Skills: