Key Responsibilities
- Design, build, and maintain scalable batch and real-time data pipelines using tools such as Apache Spark, Kafka, Flink, or Airflow
- Develop and optimize ETL/ELT workflows to ingest data from diverse sources including APIs, databases, event streams, and flat files
- Architect and manage cloud-based data infrastructure on AWS, GCP, or Azure (e.g., S3, BigQuery, Redshift, Databricks, Snowflake)
- Implement data quality monitoring, alerting, and observability frameworks to ensure pipeline reliability and SLA compliance
- Collaborate with data scientists and ML engineers to support model training, feature engineering, and inference pipelines
- Partner with analytics engineers to maintain and evolve data warehouse models (dbt, dimensional modeling)
- Define and enforce data governance standards including cataloging, lineage tracking, and access control policies
- Optimize pipeline performance through profiling, query tuning, partitioning strategies, and cost management
- Document pipeline architecture, data contracts, and runbooks for operational clarity
- Participate in on-call rotation and incident response for critical data infrastructure
Required Qualifications
- 2-4 years of experience in data engineering, ETL development, or a related software engineering discipline
- Proficiency in Python and/or Scala for pipeline development; strong SQL skills across multiple dialects
- Hands-on experience with distributed processing frameworks such as Apache Spark, Beam, or Flink
- Experience with workflow orchestration tools such as Apache Airflow, Prefect, or Dagster
- Deep familiarity with cloud data platforms (AWS, GCP, or Azure) and managed services such as BigQuery, Redshift, or Synapse
- Experience designing and maintaining data warehouses or lakehouses (Snowflake, Databricks, Delta Lake, Iceberg)
- Strong understanding of data modeling concepts: normalization, star/snowflake schema, slowly changing dimensions
- Experience with streaming and event-driven architectures using Kafka, Kinesis, or Pub/Sub
- Familiarity with CI/CD practices and infrastructure-as-code tools (Terraform, Pulumi) for data platform deployments
- Excellent communication skills with the ability to translate business requirements into technical solutions
Pay: ₹800,000.00 - ₹1,200,000.00 per year
Benefits:
- Flexible schedule
- Provident Fund
Work Location: Hybrid remote in Bengaluru, Karnataka (Bengaluru Urban District)