At Lilly, we unite caring with discovery to make life better for people around the world. We are a global healthcare leader headquartered in Indianapolis, Indiana. Our employees around the world work to discover and bring life-changing medicines to those who need them, improve the understanding and management of disease, and give back to our communities through philanthropy and volunteerism. We give our best effort to our work, and we put people first. We’re looking for people who are determined to make life better for people around the world.
Principal Architect-R4
Databricks AI Design & Architecture | Agentic AI Systems
As a Principal Architect for Databricks AI & Agentic Systems, you will own the end-to-end technical strategy and architecture of Lilly's AI data platform. You will design the systems that enable autonomous, multi-agent AI workflows—spanning data ingestion, semantic modeling, model orchestration, and real-time decision intelligence—while setting the architectural standards the engineering organization builds against.
This is a senior individual contributor and technical leadership role. You will be the definitive technical authority on Databricks architecture, agentic AI design patterns, and AI/ML platform engineering—directly influencing how Lilly uses AI to transform pharmaceutical R&D, commercial operations, and patient care.
KEY RESPONSIBILITIES
AI Architecture & Agentic System Design
- Architect multi-agent AI systems on Databricks, defining agent orchestration frameworks, tool-use patterns, memory/state management, and inter-agent communication protocols.
- Design agentic workflows that combine LLMs, retrieval-augmented generation (RAG), and structured data reasoning to support autonomous decision-making pipelines.
- Establish architecture patterns for agent-to-agent orchestration using frameworks such as LangGraph, AutoGen, CrewAI, or equivalent, integrated within the Databricks ecosystem.
- Define how AI agents interact with Unity Catalog assets—tables, models, volumes, and functions—as tool surfaces for autonomous data retrieval and action execution.
- Architect real-time and near-real-time agentic pipelines leveraging Databricks Structured Streaming, Delta Live Tables, and event-driven triggers.
Databricks Platform Architecture
- Own the Databricks Lakehouse architecture end-to-end: catalog/schema design, medallion architecture (raw → refined → curated → AI-ready), storage optimization, and multi-workspace governance.
- Define and enforce Unity Catalog standards for AI assets—model registries, feature tables, vector search indexes, and function-as-a-tool registrations—enabling governed AI consumption across the enterprise.
- Architect the semantic and metrics layer so business logic is expressed once and consumed consistently by BI tools, AI agents, and downstream APIs.
- Set Delta Lake standards: table properties, Z-ordering, partitioning strategies, liquid clustering, and cost-based optimization for petabyte-scale workloads.
- Design Databricks Model Serving architectures, including real-time inference endpoints, batch inference pipelines, and A/B model deployment patterns.
AI/ML Platform & LLM Integration
- Define the enterprise architecture for LLM integration on Databricks—covering prompt management, context assembly, vector stores (Databricks Vector Search), embedding pipelines, and LLM fine-tuning workflows.
- Architect MLflow-based model lifecycle management: experiment tracking, model versioning, champion/challenger patterns, and governance-compliant model promotion.
- Design feature engineering and feature serving architectures using Databricks Feature Store, ensuring AI agents have low-latency access to curated, consistent features.
- Evaluate and adopt emerging Databricks AI capabilities—Mosaic AI, AI/BI Genie, DBRX, and Databricks Apps—as first-class architectural components.
- Define patterns for responsible AI: model monitoring, drift detection, explainability, audit trails, and bias mitigation within regulated pharmaceutical contexts.
Governance, Security & Data Quality
- Define and enforce data governance, access control, lineage, and auditing across all Databricks workspaces using Unity Catalog and integration with enterprise identity providers.
- Architect data quality frameworks (Great Expectations, Databricks DQ rules, or equivalent) embedded into AI pipelines to ensure model inputs meet reliability standards.
- Define security architecture for AI workloads: network isolation, secret management, encryption-at-rest/in-transit, IP restriction, and compliance with GxP and GDPR requirements.
- Own data sharing architecture—Delta Sharing, external connectors, API gateway patterns—ensuring downstream AI consumers receive governed, versioned data surfaces.
Technical Leadership & Standards
- Own architectural decision records (ADRs), reference architectures, and platform blueprints that engineering teams build against independently.
- Lead architecture reviews, design critiques, and proof-of-concept evaluations for platform investments and AI product initiatives.
- Mentor senior data engineers and ML engineers, elevating architectural thinking, system design rigor, and AI engineering best practices across the organization.
- Influence and guide vendor partners (Databricks, cloud providers, ISVs) on solution direction, ensuring alignment with Lilly's technical and data governance strategy.
- Represent Tech@Lilly in external architecture forums, Databricks advisory councils, and industry working groups.
HOW YOU WILL SUCCEED
- A production-grade agentic AI platform on Databricks that powers autonomous workflows across commercial, R&D, and patient services domains.
- Architecture standards clear and comprehensive enough that distributed engineering teams execute with confidence within well-defined boundaries.
- Measurable platform outcomes: query performance benchmarks met, AI agent latency SLOs achieved, data quality scores at or above targets.
- Governance posture that passes internal audit and supports regulatory submissions in validated GxP environments.
- Sustained adoption of AI best practices—evidenced by engineers citing and following the architectural standards you define.
- Recognition as the go-to technical authority on Databricks and AI architecture decisions among both engineering and senior business stakeholders.
BASIC QUALIFICATIONS
- Bachelor's degree in Computer Science, Data Engineering, Artificial Intelligence, or equivalent; advanced degree preferred.
- 10+ years of experience in data engineering, data platform architecture, or AI/ML engineering.
- 4+ years of hands-on, production-grade Databricks experience: Unity Catalog, Delta Lake, Model Serving, Workflows, Databricks SQL, cluster lifecycle management, and workspace administration.
- 5+ years designing and delivering AI/ML systems at scale—covering model training, feature engineering, model serving, and monitoring.
- Demonstrated experience architecting agentic AI systems: multi-agent orchestration, tool-use patterns, LLM integration, RAG pipelines, and autonomous workflow design.
- Expert-level SQL and PySpark; strong Python for AI/ML pipeline development.
- Deep knowledge of data modeling, medallion architecture, Delta Lake internals, and data lakehouse design patterns.
- Proven track record of translating complex business requirements into scalable, governed, production-ready architectures.
- Experience leading architecture at the principal or distinguished level—setting standards, mentoring engineers, and influencing technical direction organization-wide.
PREFERRED QUALIFICATIONS
- Databricks Certified Data Engineer (Professional) and/or Databricks Certified Machine Learning Professional.
- Experience with agentic orchestration frameworks.
- Hands-on experience with Mosaic AI, Databricks Apps, AI/BI Genie, and DBRX or similar open-weight LLMs.
- Domain expertise in pharmaceutical, healthcare, or regulated industries (GxP, GDPR, HIPAA)—including Customer Master, Product Master, HCP/HCO data, and commercial alignment data.
- Experience with MLOps and LLMOps practices: CI/CD for ML, model monitoring, prompt versioning, evaluation pipelines.
- Familiarity with infrastructure-as-code (Terraform, Databricks Asset Bundles) and GitOps for platform configuration.
- Experience with vector databases (Databricks Vector Search, Pinecone, Weaviate) and embedding model management.
- Understanding of responsible AI principles, model risk management, and AI governance frameworks in enterprise settings.
Lilly is dedicated to helping individuals with disabilities to actively engage in the workforce, ensuring equal opportunities when vying for positions. If you require accommodation to submit a resume for a position at Lilly, please complete the accommodation request form (https://careers.lilly.com/us/en/workplace-accommodation) for further assistance. Please note this is for individuals to request an accommodation as part of the application process and any other correspondence will not receive a response.
Lilly does not discriminate on the basis of age, race, color, religion, gender, sexual orientation, gender identity, gender expression, national origin, protected veteran status, disability or any other legally protected status.
#WeAreLilly