We are looking for a Senior Data Engineer with deep expertise in real-time data streaming and distributed data processing to design, build, and scale next-generation data platforms. This role is central to enabling event-driven architecture and real-time analytics for mission-critical banking systems, particularly across risk and compliance functions.
You will collaborate closely with data architects, platform engineers, and business stakeholders to deliver low-latency, high-throughput data pipelines that power advanced analytics and decision-making.
Key Responsibilities
- Design, develop, and maintain real-time streaming pipelines using Apache Kafka, PySpark, and Flink
- Build scalable and fault-tolerant event-driven data architectures
- Process high-volume streaming data with low latency and high reliability
- Integrate data from multiple sources into centralized data platforms (Data Lake / Lakehouse)
- Optimize data pipelines for performance, scalability, and cost efficiency
- Ensure data quality, governance, and compliance aligned with banking standards
- Work with cross-functional teams to translate business requirements into technical solutions
- Monitor and troubleshoot streaming jobs and production pipelines
Required Skills & Experience
- 5+ years of experience in Data Engineering
- Strong hands-on experience with:
  - PySpark / Spark Streaming
  - Apache Kafka (Producers, Consumers, Kafka Streams)
  - Apache Flink or other real-time processing frameworks
- Experience building real-time / near real-time data pipelines
- Strong understanding of distributed systems and event-driven architecture
- Proficiency in Python / Java / Scala
- Experience with data lakes, ETL/ELT pipelines, and big data ecosystems
- Knowledge of cloud platforms (AWS / Azure / GCP) is a plus
- Familiarity with banking, risk, or compliance data systems is highly preferred
Preferred Qualifications
- Experience working in the financial services or banking domain
- Exposure to data governance, regulatory reporting, or compliance systems
- Knowledge of CI/CD pipelines and DevOps practices for data platforms