Your Responsibilities:
- Design, develop, and maintain scalable data processing pipelines using Hadoop and Spark.
- Implement data integration and ETL processes to ingest and transform large datasets.
- Collaborate with data scientists, analysts, business partners, and other stakeholders to understand data requirements and deliver solutions.
- Optimize and tune Hadoop and Spark jobs for performance and efficiency.
- Manage and maintain data storage solutions, ensuring data integrity and security.
- Use GitHub for version control and collaborative code development.
- Work with CDP (Cloudera Data Platform) to manage and deploy data applications.
- Integrate and manage data solutions on Azure and Snowflake, ensuring seamless data flow and accessibility.
- Monitor and troubleshoot data processing workflows, resolving issues promptly.
- Stay updated with the latest industry trends and technologies in big data and cloud computing.
- Explore and develop skills in newer technologies in line with the big data technology roadmap.
- Integrate and manage data solutions across Azure and on-premises infrastructure, ensuring seamless data flow and accessibility.
Qualifications:
- Strong experience with Apache Spark, Hadoop ecosystem, Hive, Iceberg, HBase, Apache Kafka, Spark Streaming or Flink, and Oozie (Airflow/NiFi good to have).
- Proficient in Scala, Java, and/or Python with solid coding and debugging skills.
- Applied experience with Solr for indexing and search workloads.
- Familiarity with Git/GitHub, GitHub Actions, and Azure DevOps; cloud experience (Azure/AWS) is good to have.
- Knowledge of SQL, data modeling (nice to have), and experience working in UNIX or UNIX‑like environments.
- Experience with SCRUM or similar Agile frameworks; strong problem‑solving skills; able to produce clear technical documentation.
- Self‑starter and effective individual contributor; Oil & Gas domain knowledge is good to have.
Key Required Skills:
- Extensive experience with the Hadoop ecosystem, including Apache Hadoop, Apache Spark, Apache Hive, Apache Kafka, HDFS, YARN, and Oozie, for building scalable Big Data processing and ETL pipelines.
- Hands-on experience working with Cloudera Data Platform (CDP) for managing enterprise-scale Big Data workloads, distributed data processing, cloud-integrated analytics, data governance, and lakehouse architectures.
- Strong experience with the Cloudera Distribution Hadoop (CDH) environment for deploying and managing Hadoop ecosystem components, including Spark, Hive, HDFS, Impala, Kafka, and distributed ETL/data ingestion workflows.
- Experience designing and maintaining scalable data pipelines on CDP/CDH clusters, with expertise in distributed computing, performance tuning, workflow orchestration, and large-scale data processing.
Pay: Up to ₹2,800,000.00 per year
Benefits:
- Health insurance
- Provident Fund
Experience:
- Hadoop Ecosystem: 6 years (Required)
- Cloudera Data Platform: 1 year (Required)
- Cloudera Distribution Hadoop: 1 year (Required)
Work Location: In person