Core Responsibilities
- Design and optimize batch/streaming data pipelines using Scala, Spark, and Kafka
- Implement real-time tokenization/cleansing microservices in Java
- Manage production workflows via Apache Airflow (batch scheduling)
- Conduct root-cause analysis of data incidents using Spark/Dynatrace logs
- Monitor EMR clusters and optimize performance via YARN/Dynatrace metrics
- Ensure data security through HashiCorp Vault (Transform Secrets Engine)
- Validate data integrity and configure alerting systems
Technical Skills
- Programming: Scala (Spark batch/streaming), Java (real-time microservices)
- Big Data Systems: Apache Spark, EMR, HDFS, YARN resource management
- Cloud & Storage: Amazon S3, EKS
- Security: HashiCorp Vault, tokenization vs. encryption (FPE)
- Orchestration: Apache Airflow (batch scheduling)
- Operational Excellence: Spark log analysis, Dynatrace monitoring, incident handling, data validation
Mandatory Competencies
- Expertise in distributed data processing (Spark on EMR/Hadoop)
- Proficiency in shell scripting and YARN job management
- Ability to implement tokenization and format-preserving encryption (FPE) solutions
- Experience with production troubleshooting (executor logs, metrics, RCA)
Benefits
Insurance: family coverage
Term insurance
Provident Fund (PF)
Paid time off: 20 days
Holidays: 10 days
Flexible working hours
Competitive salary
Diverse and inclusive workplace