Principal Data Architect
GCP × Databricks | Platform & Data Architecture | Permanent / Senior IC
Function: Data & Platform Engineering
Level: Principal / Staff Architect
Employment: Permanent, Full-Time
Location: Remote-Friendly / Hybrid
Experience: 8+ Years (3+ in a Dedicated Architecture Role)
GCP Expertise: Professional / Expert Level
Databricks: Expert Level; Certification Preferred
Reports To: VP / Head of Data Platform
1. Position Summary
We are looking for a Principal Data Architect with mastery across both Google Cloud Platform (GCP) and Databricks to lead the design and evolution of our enterprise data platform. This is a senior individual contributor role with broad influence — you will set the architectural direction for how data is ingested, stored, transformed, governed, and consumed across the organisation.
The ideal candidate brings deep, hands-on expertise in both ecosystems — not surface-level familiarity — and has a proven track record of designing production-grade, scalable data platforms that serve analytics, machine learning, and operational workloads. You will work at the intersection of strategy and engineering, translating business requirements into robust technical blueprints while mentoring engineering teams on their implementation.
2. Key Responsibilities
Platform Architecture & Design
- Define the end-to-end architecture of the enterprise data platform spanning GCP (BigQuery, Dataproc, Cloud Composer, Pub/Sub) and Databricks (Unity Catalog, Delta Live Tables, MLflow)
- Design and govern the lakehouse architecture — including bronze/silver/gold medallion layers, Delta Lake table design, and data lifecycle policies
- Architect data ingestion patterns for batch, micro-batch, and real-time streaming workloads across both platforms
- Evaluate and select tooling, frameworks, and services — balancing cost, performance, operational overhead, and strategic fit
- Produce authoritative architecture artefacts: HLDs, LLDs, data flow diagrams, architecture decision records (ADRs), and reference architectures
Data Modelling & Governance
- Design logical and physical data models — dimensional, normalised, and domain-oriented — appropriate to use case and access pattern
- Establish and enforce data governance standards: cataloguing (Dataplex, Unity Catalog), lineage tracking, access control, and data classification
- Define and implement data contracts between producing and consuming teams
- Lead the adoption of data mesh or domain-oriented data ownership principles where appropriate
Engineering Enablement & Standards
- Set engineering standards for PySpark / SQL development, pipeline design patterns, and testing practices across the data engineering function
- Define CI/CD practices for data pipelines — including environment promotion, schema change management, and automated testing gates
- Champion infrastructure-as-code (Terraform) for reproducible GCP and Databricks environment provisioning
- Establish observability standards — data quality monitoring, pipeline SLAs, alerting, and incident response runbooks
Stakeholder & Cross-Functional Leadership
- Partner with data engineering, data science, analytics, and product teams to translate requirements into actionable architectural decisions
- Engage with GCP and Databricks account and technical teams to leverage roadmap features and managed support
- Provide technical oversight and architectural review on major initiatives, ensuring alignment with the target-state platform
- Mentor senior engineers, conduct design reviews, and elevate architectural thinking across the data organisation
3. Required Skills & Experience
Google Cloud Platform — Expert Level
- Deep hands-on experience designing production workloads on GCP data services: BigQuery (partitioning, clustering, BI Engine, materialized views), Dataproc, Cloud Composer (Airflow), Pub/Sub, Dataflow, and GCS
- Expert understanding of GCP IAM, VPC Service Controls, and security architecture for data platforms
- Experience designing multi-region, high-availability (HA) data architectures on GCP, including disaster recovery (DR) planning
- Proficiency with GCP cost optimisation strategies — slot reservations, storage tiers, autoscaling, and committed use discounts
- Familiarity with Vertex AI and its integration with the GCP data ecosystem for ML workloads
Databricks — Expert Level
- Mastery of the Databricks Lakehouse Platform: Unity Catalog, Delta Lake internals, Delta Live Tables (DLT), and Photon engine
- Deep experience designing Databricks workspace architecture — cluster policies, job compute, SQL warehouses, and access tiers
- Expertise in Delta Lake optimisation: Z-ordering, OPTIMIZE, VACUUM, liquid clustering, and change data feed
- Hands-on experience with Databricks MLflow for experiment tracking and model registry in production
- Proficiency with Databricks Asset Bundles (DABs) or the Databricks Terraform provider for IaC-based workspace management
Core Data Architecture
- 8+ years in data engineering or data architecture roles, with at least 3 years in a dedicated architecture capacity
- Strong data modelling skills — dimensional modelling (Kimball), Data Vault 2.0, and domain-driven design applied to data
- Expert-level SQL and PySpark — ability to review, advise on, and benchmark complex transformations
- Proven experience designing real-time and streaming architectures using Kafka, Pub/Sub, or Kinesis alongside Spark Structured Streaming
- Deep understanding of data quality frameworks: Great Expectations, dbt tests, Soda, or equivalent
- Experience with metadata management and data cataloguing tools: Google Dataplex, Unity Catalog, Apache Atlas, or Collibra
Certifications (Preferred)
- Google Cloud Professional Data Engineer or Professional Cloud Architect
- Databricks Certified Data Engineer Professional or Databricks Certified Associate Developer for Apache Spark
- Additional: dbt Certified Developer; AWS Certified Solutions Architect (for multi-cloud exposure)
4. Technical Environment
Google Cloud Platform Stack
- BigQuery
- Dataproc
- Cloud Composer
- Pub/Sub
- Dataflow
- GCS
- Dataplex
- Vertex AI
Databricks Stack
- Unity Catalog
- Delta Lake
- Delta Live Tables
- MLflow
- Photon Engine
- SQL Warehouses
- Databricks Asset Bundles
- Workflows
Cross-Platform Toolchain
- Apache Spark 3.x
- PySpark / SQL
- dbt Core / Cloud
- Great Expectations
- Terraform
- Apache Kafka
- Apache Airflow
- GitHub Actions
5. Leadership & Behavioural Competencies
Beyond technical mastery, the successful candidate will demonstrate the following:
Architectural Thinking
Approaches problems from first principles; balances pragmatism with long-term platform health
Communication
Translates complex technical concepts clearly for engineering peers and non-technical executives alike
Decisiveness
Makes well-reasoned architectural decisions under ambiguity and documents them transparently via ADRs
Mentorship
Actively grows the architectural capability of the broader data engineering team through pairing and review
Vendor Acumen
Navigates GCP and Databricks roadmaps, partnerships, and commercial levers to the organisation's advantage
Bias for Quality
Champions data quality, observability, and operational excellence as non-negotiable engineering standards