Role: Principal / Lead Data Engineer
Mandatory Skills: PySpark, Advanced SQL; Azure Databricks Certification is a must
Locations: Gurugram, Haryana / Chennai, Tamil Nadu
Joiner Status: Immediate Joiner (or candidates with a notice period of 15 days or less)
Job Type: Full-Time
Role Overview: We are seeking a highly experienced and motivated Principal / Lead Data Engineer to join our data platform team in either our Gurugram or Chennai office. The successful candidate will play a key role in designing, building, and optimizing our next-generation data architecture and pipelines. This role requires expert-level proficiency in PySpark and SQL, alongside a proven track record of architecting scalable, high-performance ETL/ELT processes. You will transform vast amounts of raw data into high-quality, actionable insights for analytics, reporting, and Machine Learning. Given the seniority of this role, we are looking for a seasoned leader and immediate joiner who can hit the ground running and contribute significantly from day one.
Key Responsibilities
Data Pipeline Development & Optimization
- Design and Build: Architect, develop, and maintain robust, scalable, and fault-tolerant ETL/ELT pipelines for ingesting data from diverse sources (e.g., databases, APIs, streaming sources) into our data lake and data warehouse.
- PySpark Expertise: Write and optimize complex data transformation jobs using PySpark and the Spark DataFrame API to process petabytes of structured and unstructured data efficiently.
- SQL Mastery: Use Advanced SQL for complex querying, data manipulation, stored procedures, performance tuning, and schema design optimization in relational and analytical databases.
- Data Quality & Governance: Implement data validation, cleansing, and monitoring routines to ensure high data quality, integrity, and adherence to security and governance standards.
Architecture and Infrastructure
- Data Modeling: Design and implement optimal data models (e.g., Dimensional Modeling, Data Vault, Snowflake Schema) for our data warehouse to support business intelligence and analytical needs.
- Cloud Integration: Build and deploy cloud-native data solutions, primarily on Azure Databricks, Azure Data Lake, and Azure Synapse (or comparable services such as AWS S3/Redshift and Google BigQuery).
- Automation: Use orchestration tools such as Apache Airflow, Azure Data Factory, or AWS Step Functions to automate data workflows and manage pipeline dependencies.
Collaboration and Operational Excellence
- Cross-Functional Leadership: Collaborate closely with Data Scientists, Data Analysts, Product Managers, and Business Stakeholders to understand data requirements and translate them into high-level technical specifications.
- Monitoring & Support: Monitor, troubleshoot, and resolve critical issues in production data pipelines, ensuring maximum uptime and timely data delivery.
- Best Practices & Mentorship: Lead code reviews, enforce strict coding standards, mentor junior engineers, and contribute to the continuous improvement of development and deployment practices (CI/CD, Git).
Required Technical Skills (Mandatory)
- Certification: Active Azure Databricks Certification (e.g., Databricks Certified Data Engineer Associate/Professional).
- PySpark: Expert-level, hands-on experience in developing, tuning, and optimizing large-scale data processing applications using PySpark (the Python API for Apache Spark).
- SQL: Mastery of Advanced SQL (including window functions, complex joins, stored procedures, and query performance tuning) across various database systems (e.g., Snowflake, Redshift, PostgreSQL).
- Programming: Strong proficiency in Python for scripting and automation, and with general data manipulation libraries (e.g., Pandas).
- Big Data Architecture: Deep understanding of Big Data concepts, distributed systems architecture, data lakes, and modern data warehousing principles.
- ETL/ELT: Proven experience in designing and implementing enterprise-grade ETL/ELT pipelines.
Preferred Qualifications (Good to Have)
- Hands-on experience with the wider Azure ecosystem (e.g., Azure Data Factory, Azure Synapse, Azure Key Vault).
- Familiarity with workflow orchestration tools like Apache Airflow.
- Experience with real-time/streaming data processing (e.g., Spark Structured Streaming, Kafka, or Event Hubs).
- Advanced knowledge of Data Governance, Data Cataloging, and Data Security best practices.
Candidate Profile
- Educational Background: Bachelor’s or Master’s degree in Computer Science, Engineering, or a related quantitative field.
- Mindset: Proactive, self-motivated, strategic thinker with a strong sense of ownership and urgency.
- Communication: Excellent verbal and written communication skills to articulate complex technical concepts to non-technical stakeholders and executive leadership.
Pay: ₹1,500,000.00 - ₹1,800,000.00 per year
Work Location: In person