Overview:
- Responsibilities:
  - Execute data analysis tasks guided by the architect and lead (cluster/process log analysis, code profiling, metrics gathering).
  - Analyze Hive, Spark, Kafka, HBase, and Spark Streaming processes to identify performance improvement opportunities.
  - Recommend optimizations for data storage, HBase, and HDFS.
  - Assist with migration impact analysis and data transfer tasks (if applicable).
- Qualifications:
  - 5 to 8 years' experience with the Cloudera stack (Spark, NiFi, Hive, HBase, Kafka, Spark Streaming).
  - Proven experience improving the performance of Hive and Spark jobs.
  - Ability to evaluate existing jobs with a critical mindset to find areas for improvement.
  - Strong Scala/Python programming skills for data engineering.
  - Team-oriented; able to work effectively under the guidance of senior team members.