Data Engineer – Databricks & PySpark
Location: Remote
Experience: 6–8 Years
Employment Type: Contract / Full-Time
Job Summary
We are seeking an experienced Data Engineer with strong expertise in Databricks and PySpark to design, develop, and optimize scalable data pipelines and cloud-based data platforms. The ideal candidate should have hands-on experience with big data technologies, cloud services, ETL/ELT processes, and modern data engineering practices.
Required Skills & Experience
- 6–8 years of experience in Data Engineering, Data Warehousing, and Big Data solutions.
- Strong hands-on experience with Databricks, PySpark, and Apache Spark.
- Expertise in building and maintaining large-scale ETL/ELT pipelines.
- Strong proficiency in Python and SQL.
- Experience with Delta Lake, Unity Catalog, and Databricks Workflows.
- Hands-on experience with cloud platforms such as Azure, AWS, or Google Cloud Platform (GCP).
- Experience with cloud storage solutions:
  - Azure Data Lake Storage (ADLS)
  - Amazon S3
  - Google Cloud Storage
- Knowledge of data ingestion tools and frameworks.
- Experience with Azure Data Factory (ADF), AWS Glue, or similar ETL orchestration tools.
- Strong understanding of Data Lake, Data Warehouse, and Lakehouse architectures.
- Experience with Apache Kafka, Azure Event Hubs, or other streaming technologies.
- Knowledge of CI/CD pipelines, Git, and DevOps practices.
- Familiarity with workflow orchestration tools such as Apache Airflow.
- Experience working with structured, semi-structured, and unstructured data.
- Understanding of data modeling, partitioning, performance tuning, and optimization techniques.
- Experience implementing data quality, governance, and security best practices.
Key Responsibilities
- Design, develop, and optimize scalable data pipelines using Databricks and PySpark.
- Build and maintain data ingestion, transformation, and processing frameworks.
- Develop batch and real-time data processing solutions.
- Implement Delta Lake and Lakehouse architecture best practices.
- Collaborate with Data Scientists, Analysts, and Business stakeholders to deliver data solutions.
- Optimize Spark jobs for performance, scalability, and cost efficiency.
- Create and maintain data models, data marts, and enterprise data warehouses.
- Implement monitoring, logging, and troubleshooting processes for data platforms.
- Maintain data quality, governance, security, and compliance standards.
- Participate in code reviews, architecture discussions, and technical design sessions.
Preferred Qualifications
- Experience with Azure Databricks is highly preferred.
- Knowledge of Snowflake, Redshift, BigQuery, or Synapse Analytics.
- Experience with Infrastructure as Code tools such as Terraform.
- Databricks, Azure, AWS, or GCP certifications are a plus.
- Experience in Agile/Scrum environments.
Mandatory Technologies
Databricks, PySpark, Apache Spark, Python, SQL, Delta Lake, Data Lake, ETL/ELT, Cloud Platform (Azure/AWS/GCP), Airflow, Git, CI/CD, Kafka, Data Warehousing.
Pay: ₹80,000.00–₹110,000.00 per month
Experience:
- Big data: 5 years (Preferred)
- Data warehouse: 4 years (Preferred)