Big Data Engineer

Job details

Your Responsibilities:

Design, develop, and maintain scalable data processing pipelines using Hadoop and Spark.
Implement data integration and ETL processes to ingest and transform large datasets.
Collaborate with data scientists, analysts, business partners and other stakeholders to understand data requirements and deliver solutions.
Optimize and tune Hadoop and Spark jobs for performance and efficiency.
Manage and maintain data storage solutions, ensuring data integrity and security.
Utilize GitHub for version control and collaboration on code development.
Work with CDP (Cloudera Data Platform) to manage and deploy data applications.
Integrate and manage data solutions on Azure and Snowflake, ensuring seamless data flow and accessibility.
Monitor and troubleshoot data processing workflows, resolving issues promptly.
Stay updated with the latest industry trends and technologies in big data and cloud computing.
Explore and develop skills in newer technologies in line with the big data technology roadmap.
Integrate and manage data solutions across Azure cloud and on-premises infrastructure, ensuring seamless data flow and accessibility.

Qualifications:

Strong experience with Apache Spark, Hadoop ecosystem, Hive, Iceberg, HBase, Apache Kafka, Spark Streaming or Flink, and Oozie (Airflow/NiFi good to have).
Proficient in Scala, Java, and/or Python with solid coding and debugging skills.
Applied experience with Solr for indexing and search workloads.
Familiarity with Git/GitHub, GitHub Actions, and Azure DevOps; cloud experience (Azure/AWS) is good to have.
Knowledge of SQL, data modeling (nice to have), and experience working in UNIX or UNIX‑like environments.
Experience with SCRUM or similar Agile frameworks; strong problem‑solving skills; able to produce clear technical documentation.
Self‑starter and effective individual contributor; Oil & Gas domain knowledge is good to have.

Most Required Skillset for the job:

Extensive experience with the Hadoop ecosystem, including Apache Hadoop, Apache Spark, Apache Hive, Apache Kafka, HDFS, YARN, and Oozie, for building scalable Big Data processing and ETL pipelines.
Hands-on experience working with Cloudera Data Platform (CDP) for managing enterprise-scale Big Data workloads, distributed data processing, cloud-integrated analytics, data governance, and lakehouse architectures.
Strong experience with Cloudera's Distribution including Apache Hadoop (CDH) environments for deploying and managing Hadoop ecosystem components, including Spark, Hive, HDFS, Impala, Kafka, and distributed ETL/data ingestion workflows.
Experience designing and maintaining scalable data pipelines on CDP/CDH clusters with expertise in distributed computing, performance tuning, workflow orchestration, and large-scale data processing.

Pay: Up to ₹2,800,000.00 per year

Work Location: In person