Lead Generation Manager

Emergys - Pune, Maharashtra

Job details

Qualifications

Spark
Apache Hive
Java
Distributed systems
Scala
Apache
Communication skills
Python

Full job description

Experience: 4-7 years

Location: Pune (India)

Primary Skills: Apache Spark (Java / Python / Scala), Apache Flink, Hive, Impala

What You’ll Do

Design, build, and optimize distributed data processing systems on CDP.
Architect batch and stream data pipelines using Apache Spark.
Build streaming pipelines leveraging Flink, Hive, and modern table formats like Iceberg.
Develop high-performance data pipelines using Spark (Java/Python/Scala) on YARN-based clusters.
Ensure data quality, reliability, and performance tuning across large-scale distributed systems.
Develop and maintain ETL/ELT workflows orchestrated via Airflow.

Data Quality & Reliability

Define and enforce data quality checks, lineage tracking, and SLA monitoring across pipelines.
Implement unit, integration, and end-to-end testing strategies for data pipelines.
Troubleshoot performance bottlenecks in Spark jobs, Flink topologies, and Hive queries – applying techniques such as partition pruning, broadcast joins, and predicate pushdown.

Collaboration & Governance

Partner with data architects, data scientists, and platform engineers to translate business requirements into robust data solutions.
Participate in design reviews, technical documentation, and knowledge sharing within the team.
Contribute to establishing engineering standards, coding guidelines, and best practices for the data engineering discipline.
Provide technical leadership across teams, unblock complex projects, and mentor junior engineers.
Translate product intent into technical plans, influence roadmaps with data-driven insights, and communicate trade-offs to executives and stakeholders.

Tech Stack

Framework: Apache Spark (Java / Python / Scala), Apache Flink
Query Engines: Hive, Impala
Storage & Formats: Apache Iceberg
Orchestration: Apache Airflow
Infrastructure: YARN-based clusters, CDP

What We’re Looking For

4-6 years of proven experience building distributed data systems with Apache Spark at scale.
Strong proficiency in Python / Java / Scala for data engineering.
Hands-on experience with streaming frameworks (Flink) and batch orchestration (Airflow).
Deep understanding of data quality practices, SLA monitoring, and pipeline observability.
Experience with modern table formats (Apache Iceberg preferred).
Strong communication skills – ability to present trade-offs clearly to technical and non-technical stakeholders.
