Candidate Skill:
PySpark, Python, Spark SQL, Hadoop, ETL, Data Processing, Big Data, DataFrames, RDDs, Performance Tuning
Job Description:
We are looking for a skilled PySpark Developer to join our data engineering team. The ideal candidate should have strong experience in big data processing, distributed computing, and building scalable data pipelines using PySpark.

Key Responsibilities:
- Develop and optimize data pipelines using PySpark
- Work with large datasets in distributed environments
- Design, build, and maintain ETL processes
- Collaborate with data analysts and data scientists for data requirements
- Ensure data quality, integrity, and performance tuning
- Work on Hadoop ecosystem tools and cloud platforms if required

Required Skills:
- Strong experience in PySpark and Python
- Hands-on with Spark SQL, DataFrames, RDDs
- Good knowledge of Hadoop ecosystem (HDFS, Hive, etc.)
- Experience in ETL development and data processing
- Understanding of big data architecture and distributed systems

Good to Have:
- Experience with AWS / Azure / GCP
- Knowledge of Kafka or streaming tools
- Familiarity with data warehousing concepts