Mission: Take full ownership of the core Lakehouse infrastructure, encompassing storage, compute, and developer platform layers that support all other operations.
- Design and maintain the Delta Lake storage layer, Photon compute engine, and Unity Catalog abstraction, serving over 1,000 developers across various retail sectors.
- Implement advanced optimization techniques including query plan tuning, cluster auto-scaling policies, Z-ordering strategies, and partitioning schemes for datasets with trillions of rows.
- Manage the internal developer platform by developing SDKs, CLI tools, and templates and by enabling self-service onboarding to shorten new teams' time-to-first-query.
- Lead the technical cleanup of Phase-1 migration issues, including schema standardization, pipeline consolidation, and deduplication of source-of-record (SOR) systems across hundreds of sources.
- Oversee the Data Engineer transition cohort within this pillar, establishing engineering standards, enforcing code review processes, and defining career progression paths.
Mission: Industrialize machine learning by building infrastructure that efficiently moves models from experimentation notebooks to production at retail scale.
- Develop and maintain the end-to-end ML lifecycle leveraging MLflow, including experiment tracking, model registry, automated retraining, A/B testing, and canary deployments.
- Design the real-time inference architecture to deliver model serving with sub-100ms latency across recommendation, pricing, and demand forecasting applications.
- Construct the Agentic AI infrastructure comprising RAG pipelines, vector stores, fine-tuning workflows for Foundation Models (utilizing Mosaic AI), and agent orchestration frameworks.
- Establish governance for the Feature Store by standardizing feature definitions, enforcing freshness SLAs, tracking lineage, and promoting feature reuse across retail divisions.
- Ensure reliability of the ML platform through GPU/TPU cluster management, training job scheduling, cost attribution per model, and incident response for production model degradations.
Mission: Maintain platform stability, performance, and cost-efficiency—especially during critical periods.
- Ensure 99.99% platform uptime, providing leadership during critical events such as festive sales, store openings, and retail peak periods.
- Establish and run the FinOps practice: allocate DBU costs by team and workload, implement chargeback models, automate resource right-sizing, and deliver executive cost dashboards.
- Design and manage monitoring and observability systems covering pipeline health, query performance, cluster utilization, and data freshness SLAs across all six value streams.
- Lead capacity planning by forecasting compute and storage demand in line with retail seasonality (festive cycles, new store launches, category introductions) and provisioning resources ahead of that demand.
- Oversee incident management, develop runbooks, and conduct post-mortem evaluations for the Databricks platform, ensuring mean time to recovery (MTTR) targets are met and continually improved.
Mission: Serve as the technical steward for India’s largest consumer dataset, ensuring its trustworthiness, compliance, and discoverability.
- Develop “Governance-as-Code” frameworks on Unity Catalog, incorporating automated access controls, data classification, PII masking, and audit trails to comply with DPDP Act requirements.
- Design and implement a data quality framework that includes automated profiling, anomaly detection, schema enforcement, and freshness monitoring across thousands of datasets.
- Manage the data catalog and discovery platform, providing metadata management, lineage visualization, business glossary, and search tools to support over 1,000 users.
- Build consent management infrastructure to monitor, enforce, and audit user consent signals across the entire “Phygital” (online and offline) retail ecosystem.
- Drive enterprise-wide data standards by defining naming conventions, rules for SOR deduplication, master data alignment, and data contract enforcement between producing and consuming teams.
- 14 to 20 years of professional experience in software engineering, data engineering, or ML infrastructure, including a minimum of 3 years leading a platform team of 5 or more engineers.
- 8 to 12 years of hands-on experience in building and scaling data or ML platforms such as Lakehouse architectures, Feature Stores, Streaming Engines, or MLOps pipelines.
- Deep technical expertise within the Databricks ecosystem or similar distributed data platforms (e.g., Spark, Presto/Trino, Flink, or Kafka at scale); Databricks experience is strongly preferred.
- Proven “builder-leader” approach: actively involved in code review, production debugging, and architectural decision-making without fully delegating technical responsibilities.
- Experience operating within large and complex technology organizations featuring inherited teams, cross-functional dependencies, and enterprise-grade compliance requirements.
- Bachelor’s or Master’s degree in Computer Science, Data Science, or a related discipline, or equivalent expertise acquired through industry experience and open-source contributions.
- Previous experience managing India-scale data platforms handling billions of events per day, petabyte-scale data warehouses, or real-time serving at over 10,000 queries per second.
- Hands-on experience with MLflow, Mosaic AI, or similar ML infrastructure platforms at production level—not limited to experimentation phases.
- Familiarity with retail or e-commerce data domains such as product catalogs, inventory management, order processing, customer behavior signals, or supply chain datasets.
- Demonstrated success in building internal tooling or developer platforms that have gained widespread organic adoption within large engineering organizations.
- Experience with FinOps practices including DBU/compute cost attribution, chargeback modeling, and enterprise-scale cloud cost optimization.
- Knowledge of Indian data privacy regulations (DPDP Act) or global frameworks (GDPR, CCPA) in the context of data platform governance.
This position reports directly to the VP & Head of Data & ML Platforms, who in turn reports to the Head of Enterprise IT and, ultimately, to the CEO. You will collaborate as a peer with three other AVPs within the Data & ML Platforms group and work closely with more than 10 AI-ready Platform Engineers at Architect and Principal levels, alongside the transitioning cohort of Data & Platforms Engineers.
The broader Enterprise IT division comprises five additional L2 groups: CISO/Cybersecurity, HR/Finance/Legal Platforms, SAP-Core, Systems & AI Architects, and CIO + Cloud & Infrastructure.
Data & ML Platform, Databricks, Platform Architecture
MLOps, System Architecture, Retail