Role Summary
We are seeking a skilled Databricks Data Engineer to design, build, and optimize scalable data pipelines and data platforms using the Databricks Lakehouse architecture. The role involves working closely with data scientists, analysts, and business stakeholders to enable data-driven decision-making through robust, high-quality data solutions.
The role focuses primarily on building and maintaining pipelines that turn raw data into analysis-ready datasets using Apache Spark and Databricks.
Key Responsibilities
1. Data Pipeline Development
- Design, develop, and maintain scalable batch and streaming data pipelines on Databricks
- Implement end-to-end ingestion, transformation, and modeling (Bronze, Silver, Gold layers)
- Build ETL/ELT workflows for structured and unstructured data
2. Data Processing & Transformation
- Process large-scale data using Apache Spark (PySpark/Scala)
- Perform data cleansing, transformation, and enrichment
- Enable efficient data modeling for analytics and reporting
3. Performance Optimization
- Optimize Spark jobs for performance (partitioning, joins, caching)
- Troubleshoot and resolve pipeline performance bottlenecks
- Tune clusters, workloads, and resource utilization
4. Data Quality & Governance
- Ensure data quality, consistency, and reliability across pipelines
- Implement validation checks, monitoring, and alerting mechanisms
- Apply governance practices such as schema enforcement and versioning
5. Platform & Workflow Management
- Develop and manage Databricks notebooks, jobs, and workflows
- Orchestrate pipelines using tools such as Databricks Workflows or Airflow
- Monitor pipeline execution and ensure SLAs are met
6. Collaboration & Stakeholder Engagement
- Work closely with data scientists, analysts, and architects to gather requirements
- Translate business requirements into scalable data solutions
- Support downstream reporting, analytics, and ML use cases
Required Skills & Competencies
Technical Skills
- Strong experience with Databricks and Apache Spark
- Proficiency in Python / PySpark / Scala / SQL
- Experience with ETL/ELT and data warehousing concepts
- Familiarity with Delta Lake and the Lakehouse architecture
- Experience with cloud platforms (Azure / AWS / GCP)
- Knowledge of big data technologies (Kafka, Hadoop, etc.)
Engineering Practices
- Experience with CI/CD, Git, and SDLC practices
- Exposure to pipeline orchestration tools (Airflow, ADF, etc.)
- Understanding of data governance and security
Soft Skills
- Strong problem-solving and analytical capabilities
- Effective communication and stakeholder management
- Ability to work in cross-functional teams
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field
- Typically 3+ years of experience in data engineering or big data environments
- Certifications in Databricks or cloud platforms are a plus
Nice-to-Have
- Experience with ML pipelines or feature engineering
- Exposure to real-time streaming frameworks
- Knowledge of Power BI or other analytics tools (for enterprise setups)
Tools & Technologies
- Databricks (Lakehouse Platform)
- Apache Spark (PySpark / Scala)
- SQL
- Delta Lake
- Azure Data Factory / Airflow
- Cloud Storage (ADLS, S3, GCS)