Data Engineer_Expert
Job Description
- Design, build, and maintain Databricks data pipelines (ETL/ELT) for ingestion, transformation, and orchestration using Spark/Delta Lake/Databricks Workflows.
- Operationalize machine learning models by building inference pipelines that invoke models authored by data scientists (batch or real-time), ensuring consistency between training and inference environments.
- Ensure data reliability, quality, and observability through robust validation, monitoring, alerting, and automated recovery mechanisms.
- Collaborate closely with data scientists to productionize models, manage model deployment lifecycles, and optimize inference performance and cost.
- Implement best-practice DevOps/MLOps processes such as CI/CD for pipelines, model versioning, environment promotion, and infrastructure-as-code.
- Optimize performance and cost across compute clusters, jobs, and storage layers.
- Implement and manage the enterprise data catalog in Unity Catalog, including schema design, table ownership, lineage, governance, and documentation.
- Some experience with Databricks infrastructure.
- Experience building BI dashboards and visualizations.
- Experience with coding agents and related best practices, such as spec-driven development.
Must Have / Nice to Have Skills Required:
Must Have:
- Databricks platform experience
- Python development for data processing and ETL pipelines
- Unity Catalog knowledge
- AWS data services (S3, IAM, VPC, potentially Glue/Lambda)
- Data lake/lakehouse architecture patterns
- Dashboard building experience
Nice to Have:
- RESTful API design and development (Flask, FastAPI, or similar)
- Authentication/authorization patterns (OAuth, API keys, IAM roles)
- Query optimization and performance tuning
- PySpark optimization experience
- ML/AI pipeline experience
- Databricks AI/BI