Principal / Lead Data Engineer
Location: Gurgaon / Chennai, India
Job Type: Full-Time
Experience Required: 12+ Years
Joining Preference: Immediate Joiners or candidates with a notice period of 15 days or less
Role Overview
We are looking for an experienced and highly skilled Principal / Lead Data Engineer to join our growing Data Engineering team. In this leadership role, you will be responsible for designing, developing, and optimizing scalable data platforms and enterprise-grade ETL/ELT pipelines that support analytics, reporting, and Machine Learning initiatives.
The ideal candidate should possess deep expertise in PySpark, Advanced SQL, and modern cloud-based data engineering practices, along with strong experience in distributed systems and large-scale data processing. An active Azure Databricks Certification is mandatory for this role.
This position requires a proactive technical leader who can collaborate with cross-functional teams, mentor engineers, and drive best practices across the data ecosystem.
Key Responsibilities
Data Engineering & Pipeline Development
- Design, develop, and maintain scalable, reliable, and high-performance ETL/ELT pipelines.
- Build robust data ingestion frameworks for structured and unstructured data from multiple sources including databases, APIs, and streaming platforms.
- Develop and optimize large-scale data transformation workflows using PySpark and Spark DataFrame APIs.
- Implement data quality checks, validation frameworks, and monitoring solutions to ensure data integrity and reliability.
- Optimize data processing performance for large datasets and distributed environments.
SQL & Data Modeling
- Utilize Advanced SQL for complex transformations, query optimization, performance tuning, and analytical workloads.
- Design scalable data models using Dimensional Modeling, Snowflake Schema, and Data Vault approaches.
- Support enterprise reporting and analytics requirements through efficient schema and query design.
Cloud & Big Data Architecture
- Build cloud-native data solutions using Azure Databricks, Azure Data Lake, and related Azure services.
- Drive architecture decisions for scalable data lake and modern data warehouse implementations.
- Implement orchestration and workflow automation using tools such as Apache Airflow, Azure Data Factory, or similar platforms.
- Contribute to CI/CD practices, automation, and deployment optimization.
Leadership & Collaboration
- Collaborate with Data Scientists, Analysts, Product Teams, and Business Stakeholders to understand business requirements and translate them into technical solutions.
- Lead technical discussions, code reviews, and architecture decisions.
- Mentor junior and mid-level engineers and promote engineering best practices.
- Troubleshoot and resolve production issues while ensuring high availability and operational excellence.
Mandatory Skills & Qualifications
- 12+ years of overall IT experience in Data Engineering and Big Data technologies.
- Active Azure Databricks Certification (Associate or Professional level) is mandatory.
- Strong hands-on expertise in PySpark for large-scale distributed data processing.
- Advanced proficiency in SQL, including:
  - Complex joins
  - Window functions
  - Query optimization
  - Performance tuning
  - Stored procedures
- Strong programming skills in Python.
- Deep understanding of:
  - Big Data architecture
  - Distributed systems
  - Data lakes
  - Modern data warehousing concepts
- Proven experience in designing and implementing enterprise-scale ETL/ELT pipelines.
- Hands-on experience with Azure cloud data services.
Urgent Requirement
Pay: ₹1,500,000.00 - ₹2,000,000.00 per year
Work Location: In person