-
Design, build, and maintain scalable ETL/ELT pipelines across batch and streaming workloads.
Implement and operate pipelines following a tiered data model (e.g., Bronze/Silver/Gold) to ensure clear data contracts, quality boundaries, and reusability.
Build pipelines that are observable by default, with comprehensive metrics, logging, tracing, and alerting.
Implement data quality checks, validations, and automated tests at each data tier to ensure correctness, freshness, and reliability.
Apply strong system design principles to build fault-tolerant, scalable, and maintainable data systems.
Optimise pipeline performance, cost, and reliability through profiling, monitoring, and tuning.
Collaborate with platform, analytics, and ML teams to design well-modelled datasets for downstream consumers.
Participate in architecture and design reviews, contributing to data modelling, ingestion, and observability standards.
Troubleshoot production issues across pipelines and storage layers using logs, metrics, and traces.
Ensure data pipelines meet security, governance, and regulatory compliance requirements.
Other duties as needed.
-
Strong experience building ETL/ELT pipelines for large-scale data platforms.
Good understanding of tiered data architectures (e.g., the Bronze/Silver/Gold medallion model) and how to apply them in production.
Hands-on experience with pipeline observability (metrics, logs, alerts, SLAs/SLOs).
Solid understanding of distributed systems and system design fundamentals.
Experience testing data pipelines, including data quality checks, regression testing, and failure scenarios.
Proficiency in one or more programming languages (e.g., Python, Java, Scala).
Experience with cloud platforms (AWS, Azure, or GCP).
Strong problem-solving and production debugging skills.