The Data Engineer is responsible for designing, building, and maintaining scalable data pipelines, data platforms, and integration solutions across cloud environments. This role focuses on transforming raw data into reliable, high-quality datasets to support analytics, reporting, and AI/ML use cases.
- Design, develop, and maintain ETL/ELT pipelines for data ingestion, transformation, and loading
- Build scalable data workflows using batch and real-time processing frameworks
- Develop and optimize data pipelines for performance, reliability, and scalability
- Handle structured and unstructured data across multiple sources
- Work with cloud platforms (Azure / AWS / GCP) to build and manage data solutions
- Utilize cloud-native services such as:
  - Data lakes, warehouses, and lakehouse platforms
  - Distributed compute (e.g., Spark, Databricks, Synapse)
- Support deployment and management of data infrastructure and storage systems
- Integrate data from multiple systems including APIs, databases, applications, and streaming sources
- Implement transformation logic using SQL, PySpark, or other data processing tools
- Ensure consistency and accuracy across data pipelines
- Implement data validation, cleansing, and quality checks
- Ensure compliance with data governance, privacy, and security policies (PII/PHI handling)
- Maintain data lineage, metadata, and documentation
- Monitor data pipelines and workflows for failures and performance issues
- Implement logging, alerting, and troubleshooting mechanisms
- Optimize pipelines for cost, speed, and resource utilization
- Work closely with data architects, analysts, and business stakeholders to understand requirements
- Support analytics, BI, and AI teams with clean and reliable datasets
- Participate in code reviews, testing, and deployment processes
- Document data flows, pipeline logic, and technical designs
- Follow best practices for data modeling, schema design, and version control
- Maintain reusable components and frameworks
- 3–8 years of experience in Data Engineering or related roles
- Strong experience with:
  - ETL/ELT tools and frameworks
  - SQL and data modeling concepts
  - Python / PySpark / Scala (at least one)
- Hands-on experience with:
  - Cloud platforms (Azure / AWS / GCP)
  - Big data tools (Spark, Databricks, Synapse, etc.)
- Experience with data streaming tools (Kafka/Event Hubs) is a plus
- Understanding of CI/CD and DevOps practices