Core Technologies
• Apache Spark (Core, Structured Streaming)
• PySpark
• Databricks (AWS/Azure)
• Advanced SQL

DevOps & CI/CD
• Jenkins
• Git/GitHub/Bitbucket

Programming
• Python
• SQL

Cloud & Infrastructure
• AWS (S3, EMR, EC2, IAM, CloudWatch)
• Databricks Runtime & Cluster Management (preferred)

Streaming & Integration
• Apache Kafka
• Snowflake integration (preferred)
• Airflow (preferred)

Preferred Qualifications
• Experience in financial services, payouts, or enterprise data platforms.
• Hands-on experience with Delta Lake optimization and incremental processing strategies.
• Experience with Snowflake data warehousing.
• Strong understanding of distributed computing principles.

Key Responsibilities
• Design and develop scalable batch and near-real-time ETL/ELT pipelines using Snowflake (AWS) and Apache Spark (PySpark, Spark SQL, Structured Streaming).
• Build streaming pipelines using Kafka and Spark Structured Streaming.
• Design dimensional data models (Fact/Dimension, SCD Type 2).
• Orchestrate pipelines using Databricks Workflows or Apache Airflow.
• Integrate CI/CD pipelines using Jenkins, Git, and Bitbucket/GitHub for automated deployment across DEV/UAT/PROD.
PySpark, Jenkins