As a Databricks Data Engineer, you will be responsible for designing, developing, and maintaining data solutions for data generation, collection, and processing in
a Big Data environment, predominantly using PySpark/Python. Your typical day will involve creating data pipelines, ensuring data quality, and implementing ETL
processes to migrate and deploy data across systems using PySpark.
Roles & Responsibilities:
- Collaborate closely with data scientists, data engineers, and business stakeholders to gather requirements and understand the business objectives driving
data pipeline development.
- Design, develop, and maintain robust, scalable, high-performance data pipelines using Databricks.
- Leverage Databricks features such as Lakehouse and Delta Lake for efficient data storage, and Spark for distributed processing.
- Develop ETL/ELT pipelines using Databricks.
- Monitor pipeline health and troubleshoot data issues.
- Migrate on-prem PySpark and SAS data pipelines and ML models to Databricks.
- Define and implement best practices in Databricks.
- Evaluate new Databricks features and tools, helping the organization stay at the forefront of innovation in AI and Big Data.

- Proven expertise in implementing Lakehouse and Delta Lake using Databricks.
- Strong PySpark and Python experience.
- Databricks Certified Data Engineer Professional certification.
- Familiarity with MLOps/LLMOps and distributed systems.
- Experience with Big Data platforms such as Cloudera Hadoop and cloud platforms such as AWS and GCP.
- Solid understanding of system design patterns, scalability, observability, and performance tuning.
- Strong analytical and problem-solving skills.
- Passion for exploring and building with emerging technologies.
Good to Have Skills:
- AWS EKS experience, Docker, and containers.