Job Summary
We are seeking a skilled Python Developer with strong expertise in Data Modelling to join our growing data engineering team. The ideal candidate will have hands-on experience in Python development, large-scale data processing, and designing scalable data models to support analytics and business intelligence initiatives. The candidate will work with complex datasets, real-time pipelines, and cloud platforms to support data-driven decision-making.
Key Responsibilities
- Design and implement scalable data models using Dimensional Modelling, Data Vault 2.0, Star/Snowflake schemas, and slowly changing dimension (SCD) handling
- Develop and maintain Python-based data pipelines and ETL/ELT workflows for large-scale data processing
- Build and optimize PySpark jobs for distributed data processing and performance tuning
- Perform data extraction, transformation, cleaning, and validation across structured and unstructured datasets
- Analyze large datasets to identify trends, patterns, and actionable business insights
- Collaborate with Product, BI, and DevOps stakeholders to convert business requirements into scalable data solutions
- Optimize SQL queries and improve database performance across relational and cloud data warehouse platforms
- Ensure data quality, consistency, accuracy, and reliability across production pipelines
- Develop and maintain orchestration workflows using Apache Airflow
- Document technical processes, data models, workflows, and architecture decisions
Required Skills & Qualifications
- Strong proficiency in Python and SQL programming
- Hands-on experience with PySpark, Pandas, NumPy, and other data processing libraries
- Strong expertise in Data Modelling: ER Modelling, Dimensional Modelling, Data Vault 2.0, Star/Snowflake schemas, and SCD handling
- Experience designing and managing ETL/ELT pipelines and data warehousing solutions
- Good knowledge of relational databases and cloud data warehouses (Redshift, Snowflake, Athena)
- Experience with data cleaning, transformation, validation, and data quality frameworks
- Familiarity with Apache Airflow for pipeline orchestration
- Understanding of APIs, event-driven architecture, and data integration processes
- Knowledge of Shell Scripting and CI/CD practices
Preferred Qualifications
- Hands-on experience with AWS (EMR, S3, Glue, Athena, Redshift, Lambda, CloudWatch)
- Exposure to Apache Spark Streaming, Kafka, or RabbitMQ for real-time data processing
- Experience with Data Lake Architecture and Apache Iceberg
- Knowledge of Kubernetes, Jenkins, and DevOps integration
- Exposure to Generative AI or Agentic AI concepts
- Bachelor's degree in Computer Science, Data Science, Information Technology, or related field
Key Requirements
- Immediate joiners preferred
- Strong analytical and problem-solving skills
- Excellent communication and cross-functional collaboration abilities
- Ability to work independently in a fast-paced, production-driven environment
- Strong attention to detail and commitment to data accuracy and reliability
Pay: ₹55,000.00 - ₹65,000.00 per month
Benefits:
- Health insurance
- Paid sick time
- Provident Fund
Application Question(s):
- What is your current CTC in your organization?
- What is your expected CTC?
- What is your notice period?
Experience:
- Data Modelling: 2 years (Required)
- Python: 3 years (Required)
Work Location: In person