Job Description
Data Pipeline Development
Build and maintain ETL/ELT pipelines using Python
Ingest data from multiple sources (APIs, databases, files, streaming systems)
Optimize pipelines for performance and scalability
Clean, transform, and validate raw datasets
Handle structured and unstructured data
Use frameworks such as:
Pandas
PySpark
Dask
Database & Data Warehousing
SQL (PostgreSQL, MySQL, SQL Server)
NoSQL (MongoDB, Cassandra)
Design schemas and optimize queries
Build data warehouses using:
Snowflake
Redshift
BigQuery
Big Data Technologies
Apache Spark
Hadoop
Process large-scale datasets efficiently
Workflow Orchestration
Apache Airflow
Cloud Platforms
Work in cloud environments:
AWS (S3, Glue, Lambda, EMR)
Azure (Data Factory, Synapse)
GCP (Dataflow, BigQuery)
Data Quality & Monitoring
Implement data validation checks
Monitor pipeline failures and fix bugs
Ensure data reliability and integrity
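As a rough illustration of what "data validation checks" can mean in practice, the sketch below checks rows for missing columns, wrong types, negative amounts, and duplicate keys before they are loaded. It is only an example under stated assumptions: the schema (order_id, customer, amount), the REQUIRED_COLUMNS mapping, and the validate_rows and ValidationReport names are hypothetical and not tied to any specific system used in this role.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationReport:
    """Collects the rows that passed and the reasons others failed."""
    valid_rows: list = field(default_factory=list)
    errors: list = field(default_factory=list)  # (row_index, message) pairs


# Hypothetical schema for illustration: column name -> expected Python type.
REQUIRED_COLUMNS = {"order_id": int, "customer": str, "amount": float}


def validate_rows(rows):
    """Check each row for missing columns, wrong types, negative amounts,
    and duplicate order_id values."""
    report = ValidationReport()
    seen_ids = set()
    for index, row in enumerate(rows):
        problems = []
        # Presence and type checks against the assumed schema.
        for column, expected_type in REQUIRED_COLUMNS.items():
            value = row.get(column)
            if value is None:
                problems.append(f"missing {column}")
            elif not isinstance(value, expected_type):
                problems.append(f"{column} is not {expected_type.__name__}")
        # Range check: amounts should not be negative.
        amount = row.get("amount")
        if isinstance(amount, float) and amount < 0:
            problems.append("amount is negative")
        # Uniqueness check on the key column.
        order_id = row.get("order_id")
        if isinstance(order_id, int):
            if order_id in seen_ids:
                problems.append("duplicate order_id")
            seen_ids.add(order_id)
        if problems:
            report.errors.append((index, "; ".join(problems)))
        else:
            report.valid_rows.append(row)
    return report
```

In a real pipeline, rejected rows would typically be logged or routed to a quarantine table rather than silently dropped, so that failures can be monitored and traced back to their source.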