Sr Data Architect

HoonarTek
Pune, Maharashtra


Job details

Full-time

Qualifications

CI/CD
Data modelling
Business intelligence
Spark
Git
Google Cloud Platform
Research
Ab Initio
SQL
AWS
Machine learning
Terraform
Continuous integration
Unity Catalog
GitHub
Agile
Apache Kafka
Metadata
AI
High availability
Identity & access management
Design patterns

Full job description

Pune

About Us

We empower enterprises globally through intelligent, creative, and insightful services for data integration, data analytics and data visualization.

Hoonartek is a leader in enterprise transformation and data engineering, and an acknowledged world-class Ab Initio delivery partner.

Using centuries of cumulative experience, research and leadership, we help our clients eliminate the complexities & risk of legacy modernization and safely deliver big data hubs, operational data integration, business intelligence, risk & compliance solutions and traditional data warehouses & marts.

At Hoonartek, we work to ensure that our customers, partners and employees all benefit from our unstinting commitment to delivery, quality and value. Hoonartek is increasingly the choice for customers seeking a trusted partner of vision, value and integrity.

How We Work

Define, Design and Deliver (D3) is our in-house delivery philosophy. It draws on agile and rapid-delivery methodologies and focuses on ‘just enough design’. We embrace this philosophy in everything we do, and it has led to numerous client success stories as well as to our own success.

We embrace change, empowering and trusting our people and building long and valuable relationships with our employees, our customers and our partners. We work flexibly, even adopting traditional/waterfall methods where circumstances demand it. At Hoonartek, the focus is always on delivery and value.

Job Description

Data Architect
GCP × Databricks | Platform & Data Architecture | Permanent / Senior IC

Function: Data & Platform Engineering
Level: Principal / Staff Architect
Employment: Permanent, Full-Time
Location: Remote-Friendly / Hybrid
Experience: 8+ Years (Architecture Focus)
GCP Expertise: Professional / Expert Level
Databricks: Certified Preferred
Reports To: VP / Head of Data Platform

1. Position Summary

We are looking for a Principal Data Architect with mastery across both Google Cloud Platform (GCP) and Databricks to lead the design and evolution of our enterprise data platform. This is a senior individual contributor role with broad influence — you will set the architectural direction for how data is ingested, stored, transformed, governed, and consumed across the organisation.

The ideal candidate brings deep, hands-on expertise in both ecosystems — not surface-level familiarity — and has a proven track record of designing production-grade, scalable data platforms that serve analytics, machine learning, and operational workloads. You will work at the intersection of strategy and engineering, translating business requirements into robust technical blueprints while mentoring engineering teams on their implementation.

2. Key Responsibilities

Platform Architecture & Design

Define the end-to-end architecture of the enterprise data platform spanning GCP (BigQuery, Dataproc, Cloud Composer, Pub/Sub) and Databricks (Unity Catalog, Delta Live Tables, MLflow)
Design and govern the lakehouse architecture — including bronze/silver/gold medallion layers, Delta Lake table design, and data lifecycle policies
Architect data ingestion patterns for batch, micro-batch, and real-time streaming workloads across both platforms
Evaluate and select tooling, frameworks, and services — balancing cost, performance, operational overhead, and strategic fit
Produce authoritative architecture artefacts: HLDs, LLDs, data flow diagrams, decision logs (ADRs), and reference architectures
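To make the bronze/silver/gold medallion layering above concrete, here is a minimal sketch in plain Python. It assumes in-memory lists of dicts stand in for Delta Lake tables, and every name in it (`to_bronze`, `to_silver`, `to_gold`, the `orders_feed` source, the order fields) is hypothetical; a production design would use Spark DataFrames and Delta tables instead.

```python
# Hypothetical medallion sketch: in-memory lists of dicts stand in for
# bronze/silver/gold Delta Lake tables. Names and fields are illustrative.
from collections import defaultdict

REQUIRED = ("order_id", "customer", "amount")


def to_bronze(raw_records, source="orders_feed"):
    """Bronze: land raw records unchanged, adding only ingestion metadata."""
    return [dict(r, _source=source) for r in raw_records]


def to_silver(bronze):
    """Silver: drop incomplete rows, conform types, de-duplicate on order_id."""
    latest = {}
    for r in bronze:
        if any(r.get(k) is None for k in REQUIRED):
            continue  # a real pipeline would quarantine these rows
        latest[r["order_id"]] = {  # last record per key wins
            "order_id": r["order_id"],
            "customer": r["customer"].strip().lower(),
            "amount": float(r["amount"]),
        }
    return list(latest.values())


def to_gold(silver):
    """Gold: business-level aggregate (total spend per customer) for BI."""
    totals = defaultdict(float)
    for r in silver:
        totals[r["customer"]] += r["amount"]
    return dict(totals)


raw = [
    {"order_id": 1, "customer": " Alice ", "amount": "10.5"},
    {"order_id": 1, "customer": "alice", "amount": "10.5"},  # duplicate
    {"order_id": 2, "customer": "Bob", "amount": "4"},
    {"order_id": None, "customer": "Eve", "amount": "99"},  # incomplete
]
print(to_gold(to_silver(to_bronze(raw))))  # {'alice': 10.5, 'bob': 4.0}
```

The design choice the sketch illustrates is that each layer has one job: bronze preserves the raw feed for replay, silver enforces cleanliness and uniqueness, and gold is shaped for consumers.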

Data Modelling & Governance

Design logical and physical data models — dimensional, normalised, and domain-oriented — appropriate to use case and access pattern
Establish and enforce data governance standards: cataloguing (Dataplex, Unity Catalog), lineage tracking, access control, and data classification
Define and implement data contracts between producing and consuming teams
Lead the adoption of data mesh or domain-oriented data ownership principles where appropriate
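A data contract between producing and consuming teams can be as simple as an agreed schema that every batch is checked against before publication. The sketch below assumes a plain column-to-(type, nullable) mapping; `Column`, `ORDERS_CONTRACT` and `validate` are hypothetical names, and real contracts are often expressed in YAML/JSON Schema or enforced through tooling such as dbt model contracts.

```python
# Hypothetical data contract: an agreed column -> (type, nullable) mapping.
# Real contracts are often written in YAML/JSON Schema or enforced by tools
# such as dbt model contracts; this is a plain-Python illustration only.
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    dtype: type
    nullable: bool = False


ORDERS_CONTRACT = {  # illustrative contract for an "orders" dataset
    "order_id": Column(int),
    "customer_id": Column(str),
    "amount": Column(float),
    "coupon_code": Column(str, nullable=True),
}


def validate(rows, contract):
    """Return human-readable contract violations; an empty list means compliant."""
    violations = []
    for i, row in enumerate(rows):
        for col in sorted(contract.keys() - row.keys()):
            violations.append(f"row {i}: missing column '{col}'")
        for col in sorted(row.keys() - contract.keys()):
            violations.append(f"row {i}: unexpected column '{col}'")
        for col, spec in contract.items():
            if col not in row:
                continue  # already reported as missing
            value = row[col]
            if value is None:
                if not spec.nullable:
                    violations.append(f"row {i}: '{col}' must not be null")
            elif not isinstance(value, spec.dtype):
                violations.append(
                    f"row {i}: '{col}' expected {spec.dtype.__name__}, "
                    f"got {type(value).__name__}"
                )
    return violations


good = {"order_id": 1, "customer_id": "c-1", "amount": 9.99, "coupon_code": None}
bad = {"order_id": "2", "customer_id": "c-2", "amount": None}
for problem in validate([good, bad], ORDERS_CONTRACT):
    print(problem)
```

Because the check returns every violation rather than stopping at the first, a producing team can fix a breaking change in one pass before consumers are affected.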

Engineering Enablement & Standards

Set engineering standards for PySpark / SQL development, pipeline design patterns, and testing practices across the data engineering function
Define CI/CD practices for data pipelines — including environment promotion, schema change management, and automated testing gates
Champion infrastructure-as-code (Terraform) for reproducible GCP and Databricks environment provisioning
Establish observability standards — data quality monitoring, pipeline SLAs, alerting, and incident response runbooks

Stakeholder & Cross-Functional Leadership

Partner with data engineering, data science, analytics, and product teams to translate requirements into actionable architectural decisions
Engage with GCP and Databricks account and technical teams to leverage roadmap features and managed support
Provide technical oversight and architectural review on major initiatives, ensuring alignment to the target state platform
Mentor senior engineers, conduct design reviews, and elevate architectural thinking across the data organisation

3. Required Skills & Experience

Google Cloud Platform — Expert Level

Deep hands-on experience designing production workloads on GCP data services: BigQuery (partitioning, clustering, BI Engine, materialized views), Dataproc, Cloud Composer (Airflow), Pub/Sub, Dataflow, and GCS
Expert understanding of GCP IAM, VPC Service Controls, and security architecture for data platforms
Experience designing highly available (HA), multi-region data architectures on GCP, including disaster recovery (DR) planning
Proficiency with GCP cost optimisation strategies — slot reservations, storage tiers, autoscaling, and committed use discounts
Familiarity with Vertex AI and its integration with the GCP data ecosystem for ML workloads

Databricks — Expert Level

Mastery of the Databricks Lakehouse Platform: Unity Catalog, Delta Lake internals, Delta Live Tables (DLT), and Photon engine
Deep experience designing Databricks workspace architecture — cluster policies, job compute, SQL warehouses, and access tiers
Expertise in Delta Lake optimisation: Z-ordering, OPTIMIZE, VACUUM, liquid clustering, and change data feed
Hands-on experience with Databricks MLflow for experiment tracking and model registry in production
Proficiency with Databricks Asset Bundles (DABs) or the Databricks Terraform provider for IaC-based workspace management

Core Data Architecture

8+ years in data engineering or data architecture roles, with at least 3 years in a dedicated architecture capacity
Strong data modelling skills — dimensional modelling (Kimball), Data Vault 2.0, and domain-driven design applied to data
Expert-level SQL and PySpark — ability to review, advise on, and benchmark complex transformations
Proven experience designing real-time and streaming architectures using Kafka, Pub/Sub, or Kinesis alongside Spark Structured Streaming
Deep understanding of data quality frameworks: Great Expectations, dbt tests, Soda, or equivalent
Experience with metadata management and data cataloguing tools: Google Dataplex, Unity Catalog, Apache Atlas, or Collibra
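Data quality frameworks such as Great Expectations, dbt tests and Soda share one idea: declarative checks evaluated against a dataset, with the results used as a pass/fail gate. The sketch below illustrates that idea in plain Python only; it does not use or reproduce any of those libraries' APIs, and the check names (`expect_not_null`, `expect_unique`, `expect_between`, `run_suite`) are hypothetical.

```python
# Minimal expectation-style quality checks in plain Python. This mirrors the
# general idea behind Great Expectations, dbt tests and Soda (declarative
# checks run against a dataset) but does not use any of their APIs.
from functools import partial


def expect_not_null(rows, col):
    failed = sum(1 for r in rows if r.get(col) is None)
    return {"check": f"not_null({col})", "passed": failed == 0, "failed": failed}


def expect_unique(rows, col):
    values = [r.get(col) for r in rows]
    failed = len(values) - len(set(values))  # number of duplicate occurrences
    return {"check": f"unique({col})", "passed": failed == 0, "failed": failed}


def expect_between(rows, col, low, high):
    # Nulls are skipped here; pair with expect_not_null to catch them.
    failed = sum(
        1 for r in rows if r.get(col) is not None and not low <= r[col] <= high
    )
    return {
        "check": f"between({col}, {low}, {high})",
        "passed": failed == 0,
        "failed": failed,
    }


def run_suite(rows, checks):
    """Run every check; the suite passes only if all checks pass (a CI/CD gate)."""
    results = [check(rows) for check in checks]
    return all(r["passed"] for r in results), results


rows = [
    {"order_id": 1, "amount": 10.0},
    {"order_id": 2, "amount": -5.0},
    {"order_id": 2, "amount": None},
]
passed, results = run_suite(rows, [
    partial(expect_not_null, col="amount"),
    partial(expect_unique, col="order_id"),
    partial(expect_between, col="amount", low=0, high=10_000),
])
print("suite passed:", passed)
for result in results:
    print(result)
```

In a pipeline, a failed suite would typically block promotion to the next environment, which is how quality checks become the automated testing gates described under CI/CD practices.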

Certifications (Preferred)

Google Cloud Professional Data Engineer or Professional Cloud Architect
Databricks Certified Data Engineer Professional or Databricks Certified Associate Developer for Apache Spark
Additional: dbt Certified Developer, AWS Solutions Architect (for polycloud exposure)

4. Technical Environment

Google Cloud Platform Stack

BigQuery
Dataproc
Cloud Composer
Pub/Sub
Dataflow
GCS
Dataplex
Vertex AI

Databricks Stack

Unity Catalog
Delta Lake
Delta Live Tables
MLflow
Photon Engine
SQL Warehouses
Databricks Asset Bundles
Workflows

Cross-Platform Toolchain

Apache Spark 3.x
PySpark / SQL
dbt Core / Cloud
Great Expectations
Terraform
Apache Kafka
Apache Airflow
GitHub Actions

5. Leadership & Behavioural Competencies

Beyond technical mastery, the successful candidate will demonstrate the following:

Architectural Thinking

Approaches problems from first principles; balances pragmatism with long-term platform health

Communication

Translates complex technical concepts clearly for engineering peers and non-technical executives alike

Decisiveness

Makes well-reasoned architectural decisions under ambiguity and documents them transparently via ADRs

Mentorship

Actively grows the architectural capability of the broader data engineering team through pairing and review

Vendor Acumen

Navigates GCP and Databricks roadmaps, partnerships, and commercial levers to the organisation's advantage

Bias for Quality

Champions data quality, observability, and operational excellence as non-negotiable engineering standards
