Fulfillment by Amazon (FBA) enables sellers to scale their businesses globally by leveraging Amazon's world-class fulfillment network. The WW FBA Central Analytics team builds and operates scalable, enterprise-grade data infrastructure, tools, and analytics solutions that power the WW FBA business. We partner across global product, program, and operations teams to unify diverse datasets, deliver self-service analytics, and develop next-generation capabilities using LLMs to unlock insights.
We are building a GenAI-powered insights assistant and data infrastructure that enables leaders to query complex FBA data using natural language and receive accurate, contextual answers in seconds. This initiative spans multiple business domains and requires a robust, scalable data platform that delivers fresh, validated, and well-documented data at enterprise scale.
We are seeking a Data Engineer II to own and scale the data platform powering this project. You will design, build, and operate high-reliability ETL pipelines across multiple FBA business domains, drive the dbt migration strategy, and establish monitoring and data quality frameworks. You will partner with Data Engineers, Business Analysts, and SMEs to ensure the data foundation meets the strict accuracy, freshness, and documentation standards required for AI-driven insights.
Key job responsibilities
- Design and build scalable ETL pipelines in Spark/PySpark to ingest, transform, and load FBA metrics across multiple business domains into the Data Lakehouse.
- Own the dbt migration strategy. Architect the dbt project structure, define semantic models, and migrate existing pipelines from legacy orchestration to dbt + MWAA/Airflow.
- Build aggregate tables at daily, weekly, monthly, quarterly, and yearly grains from source tables using Maestro and dbt. Ensure the business logic aligns with WBR/MBR/QBR metrics.
- Implement data validation frameworks, including automated, pre-built queries that cross-validate data across multiple source systems (US 3P, EU, CNGS).
- Design and deploy monitoring and alerting systems for all data pipelines. Automate ticketing on job failures, SLA breach notifications, and data freshness checks.
- Define and enforce data quality contracts: schema evolution policies, null-rate thresholds, row-count variance alerts, and backfill integrity checks.
- Develop and maintain documentation for all table schemas, column descriptions, business definitions, and data lineage.
- Optimize table structures and query patterns for fast, cost-efficient access by AI systems generating SQL from natural language.
- Orchestrate pipeline dependencies across domains and support regional expansion (EU, IN, JP) with minimal code duplication.
- Mentor junior Data Engineers on pipeline design patterns, code review standards, and operational best practices.
Our inclusive culture empowers Amazonians to deliver the best results for our customers. If you have a disability and need a workplace accommodation or adjustment during the application and hiring process, including support for the interview or onboarding process, please visit https://amazon.jobs/content/en/how-we-hire/accommodations for more information. If the country/region you’re applying in isn’t listed, please contact your Recruiting Partner.