Experience: 5+ years
Location: Remote (candidates based in Bangalore preferred)
Notice period: Looking for candidates who can join within 30 days
Key Responsibilities:
- Design, implement, and optimize scalable data pipelines using Databricks and Apache Spark.
- Architect data lakes using Delta Lake, ensuring reliable and efficient data storage.
- Manage metadata, security, and lineage through Unity Catalog for governance and compliance.
- Ingest and process streaming data using Apache Kafka and real-time frameworks.
- Collaborate with ML engineers and data scientists on LLM-based AI/GenAI project pipelines.
- Apply CI/CD and DevOps practices to automate data workflows and deployments (e.g., with GitHub Actions, Jenkins, Terraform).
- Optimize query performance and data transformations using advanced SQL.
- Implement and uphold data governance, quality, and access control policies.
- Support production data pipelines and respond to issues and performance bottlenecks.
- Contribute to architectural decisions around data strategy and platform scalability.
Required Skills & Experience:
- 5+ years of experience in data engineering roles.
- Proven expertise in Databricks, Delta Lake, and Apache Spark (PySpark preferred).
- Deep understanding of Unity Catalog for fine-grained data governance and lineage tracking.
- Proficiency in SQL for large-scale data manipulation and analysis.
- Hands-on experience with Kafka for real-time data streaming.
- Solid understanding of CI/CD, infrastructure automation, and DevOps principles.
- Experience contributing to or supporting Generative AI / LLM projects with structured or unstructured data.
- Familiarity with cloud platforms (AWS, Azure, or GCP) and data services.
- Strong problem-solving, debugging, and system design skills.
- Excellent communication and collaboration abilities in cross-functional teams.