About Us
We turn customer challenges into growth opportunities. Material is a global strategy partner to the world’s most recognizable brands and innovative companies. Our people around the globe thrive by helping organizations design and deliver rewarding customer experiences.
We use deep human insights, design innovation and data to create experiences powered by modern technology. Our approaches speed engagement and growth for the companies we work with and transform relationships between businesses and the people they serve.
Srijan, a Material company, is a renowned global digital engineering firm with a reputation for solving complex technology problems using its deep technology expertise and leveraging strategic partnerships with top-tier technology partners.
Be a part of an Awesome Tribe
Role: Senior/Lead Data Engineer – Databricks, PySpark & Applied AI/LLM Engineering
We are seeking a highly skilled Senior/Lead Data Engineer with deep expertise in Databricks, Apache Spark, PySpark, distributed data processing, and modern AI/LLM-enabled data platforms. This role is specifically targeted toward engineers who have built production-grade, scalable data and AI platforms, not generic ETL developers.
The ideal candidate must demonstrate strong hands-on coding capability, advanced Spark optimization expertise, enterprise-grade CI/CD engineering experience, and practical exposure to LLM evaluation, fine-tuning workflows, vector-space architectures, and ML model evaluation pipelines. Retail domain experience is highly preferred.
Key Responsibilities
- Design and build scalable data platforms using Databricks, Spark, Delta Lake, and PySpark.
- Develop and optimize large-scale data pipelines and distributed processing applications.
- Perform Spark performance tuning, query optimization, and troubleshooting.
- Build and manage CI/CD pipelines using GitLab and automation tools.
- Work on REST APIs, Databricks automation, and deployment workflows.
- Support AI/ML workflows including LLMOps, RAG pipelines, vector search, and model evaluation.
- Implement data governance, security, metadata management, and data quality standards.
- Conduct code reviews, mentor junior engineers, and drive engineering best practices.
- Collaborate with DevOps, ML, and Business teams to deliver scalable AI and analytics platforms.
Mandatory Requirements
- 5+ years of hands-on experience in Data Engineering with strong enterprise delivery experience.
- Proven experience as a Senior Data Engineer / Lead Data Engineer in large-scale environments.
- Expert-level programming experience in Python and PySpark.
- Strong expertise in Apache Spark internals and Spark performance optimization.
- Hands-on experience with:
  - Databricks
  - Delta Lake
  - Spark SQL
  - Advanced SQL optimization
- Extensive experience building and maintaining enterprise CI/CD pipelines.
- Strong expertise in GitLab pipelines and automated deployment workflows.
- Hands-on experience with Databricks REST APIs.
- Strong understanding of distributed data processing and scalable architecture design.
- Experience with LLM evaluation frameworks and ML model evaluation pipelines.
- Experience supporting or implementing LLM fine-tuning workflows.
- Experience with vector-space architectures, embeddings, semantic search, or vector databases.
- Strong experience in scalable data integration and distributed processing frameworks.
- Experience with data governance, metadata management, lineage, and data quality frameworks.
- Strong analytical, debugging, and problem-solving capabilities.
- Excellent stakeholder communication and technical leadership skills.
Highly Preferred
Candidates with the following experience will be strongly preferred:
- Experience working in Python IDE-based enterprise development environments.
- Experience in ML model evaluation and AI platform engineering.
- Experience supporting GenAI, RAG, or LLMOps initiatives.
- Experience with vector databases such as Pinecone, FAISS, ChromaDB, or similar platforms.
- Experience designing reusable framework-based engineering solutions rather than project-specific implementations.