Location: This position will be based in Bangalore/Mumbai, India
Role and Responsibilities:
- Design, develop, and maintain scalable data pipelines using PySpark and Apache Spark.
- Process and analyze large-scale structured and unstructured datasets in distributed environments.
- Build real-time analytics on cloud and edge devices.
- Solve challenging data and architectural problems using cutting-edge technology.
- Collaborate cross-functionally with data science, data engineering, and firmware controls teams.
Skills and Experience:
- Strong Java/Scala programming and debugging ability and a clear understanding of design patterns; Python is a bonus
- Understanding of the internals of Kafka, Spark, Flink, Hadoop, HBase, etc. (hands-on experience with one or more preferred)
- Demonstrated experience implementing data wrangling, transformation, and processing solutions on large datasets
- Experience in performance tuning and debugging Spark jobs
- Good understanding of distributed computing principles
- Knowledge of cloud computing platforms such as AWS, GCP, or Azure is beneficial
- Exposure to data lake and data warehousing concepts, and to SQL and NoSQL databases
- Experience with REST APIs and gRPC is good to have
- Ability to adapt quickly to new technologies, concepts, approaches, and environments
- Problem-solving and analytical skills
- A learning attitude and a continuous-improvement mindset are a must
Qualifications:
- M.Tech/M.S. with an emphasis in computational or decision sciences preferred
- 3+ years of relevant experience