TCS Walk-in at Chennai - Sholinganallur
Date: 16-May-26
Job Description
Key Responsibilities:
- Design, develop, and maintain data processing pipelines using PySpark
- Implement ETL workflows to ingest, transform, and load large datasets
- Optimize Spark jobs for performance, scalability, and reliability
- Work with structured and unstructured data from multiple sources (HDFS, S3, RDBMS, APIs, streaming sources)
- Collaborate with data engineers, analysts, and data scientists
- Ensure data quality, validation, and consistency
- Perform debugging, profiling, and tuning of Spark applications
- Write clean, reusable, and well-documented code
- Support production deployments and troubleshooting
- Follow best practices in data engineering, security, and governance
Required Skills & Qualifications:
- Strong experience with PySpark and Apache Spark
- Proficiency in Python
- Experience with SQL and relational databases
- Solid understanding of Spark architecture (RDDs, DataFrames, Spark SQL, DAGs)
- Hands-on experience with big data ecosystems (Hadoop, Hive, HDFS)
- Experience with cloud platforms (AWS / Azure / GCP) and object storage
- Knowledge of data modeling, ETL concepts, and pipeline design
- Familiarity with version control tools (Git)
- Understanding of Linux/Unix environments