Data Engineer

Innovecture - Pune, Maharashtra

Job details

Qualifications

CI/CD
RDBMS
Power BI
Data modelling
Azure
Oracle
Business intelligence
Big data
DevOps
Spark
Microsoft SQL Server
Git
SQL
AWS
Analysis skills
PostgreSQL
Distributed systems
Scala
Continuous integration
Unity
GitHub
APIs
Agile
S3
Kafka
Redshift
AI
RDS database
Jenkins
Communication skills
Data warehouse
Python

Full job description

We at Innovecture are hiring a Data Engineer to expand our team.

About Innovecture:
Founded in 2007 under the leadership of CEO Shreyas Kamat, Innovecture LLC began as a U.S.-based Information Technology and Management Consulting Company focusing on technology consulting and services. With international development centers located in Salt Lake City, USA, and Pune, India, Innovecture leverages its Global Agile Delivery Model to deliver client projects within budget, scope, and project deadline. The primary focus of Innovecture is to provide a unique wealth of expertise and experience to the IT and Management Consulting realm by utilizing various technologies across multiple industry domains. Innovecture uses best-in-class design processes and top-quality talent to ensure the highest quality deliverables. With innovation embedded in its consulting and services approach, Innovecture will continue to deliver outstanding results for its Fortune 500 clients and employees.

Key Responsibilities:

Role and Responsibilities
We are looking for a highly skilled Data Engineer with deep expertise in Databricks and modern data engineering practices to join our Data COE.

Your Experience
As a Data Engineer, you will be responsible for developing data solutions to address business problems. You will design, build, and optimize scalable data pipelines and data platforms that support advanced analytics and business intelligence initiatives. The ideal candidate is a problem-solver with hands-on expertise in Databricks, Spark, Delta Lake, cloud platforms (AWS), and data pipeline orchestration, with a strong focus on performance, reliability, and scalability. This is a hands-on role that requires the candidate to work collaboratively in a squad following a Scaled Agile development methodology. You must be a self-starter, delivery-focused, and possess a broad set of technology skills.

Things you will do:

Ensure that solution requirements are gathered accurately and understood, and that all stakeholders have visibility into their impacts.
Design, develop, and maintain scalable ETL/ELT pipelines using Databricks (PySpark/Scala/Spark SQL)
Build and optimize batch and streaming pipelines for large-scale data processing.
Implement data ingestion frameworks for structured and unstructured data.
Develop solutions leveraging Databricks Lakehouse architecture (Delta Lake)
Implement data models, schema design, partitioning, and performance tuning.
Ensure ACID compliance, data versioning, and time travel capabilities in Delta.
Work with cloud platforms (preferably AWS – S3, Glue)
Integrate data pipelines with data warehouses, APIs, and downstream applications.
Implement CI/CD pipelines for data workflows.
Enforce data quality, validation, and monitoring frameworks.
Build reports using Power BI.
Implement data lineage, cataloguing (Unity Catalog preferred), and governance standards.
Ensure compliance with security, privacy, and regulatory requirements.
Optimize Spark jobs for cost and performance.
Monitor cluster utilization and improve efficiency in Databricks.
Implement caching, partition pruning, and query optimization techniques.
Troubleshoot and resolve development issues.
Provide technical support to clients regarding existing problems.
Recommend and execute code improvements based on current solutions.

What you will bring

For the Data Engineer role, we are looking for a candidate with at least 5 years of extensive experience in Data Engineering.
Proven record of successfully delivering software with a broad mix of languages, technologies, and platforms.
Proven experience in Big Data Engineering.

Technical Skills Required:
Must-have skills:

Strong hands-on experience with Databricks (must-have)
Expertise in Apache Spark (PySpark/Scala)
Strong proficiency in SQL and data modelling
Strong hands-on experience in Power BI
Experience with Delta Lake and Lakehouse architecture
Hands-on experience with cloud platforms (AWS preferred)
Experience with orchestration tools
Knowledge of streaming frameworks (Kafka)
Proficiency in Python (mandatory)
Familiarity with Scala
Experience with Git, CI/CD pipelines
Exposure to data cataloguing tools (Unity Catalog)
Proficient understanding of distributed computing principles
Good exposure to the AWS big data platform and services such as S3, Glue, EMR, DMS, Athena, RDS, Redshift, Kinesis, etc.
Management of EMR clusters, including all bundled services
Strong working knowledge of RDBMSs such as PostgreSQL, Oracle, and SQL Server
Experience with writing complex SQL queries in PostgreSQL
Experience working with a source control system such as Azure DevOps or GitHub

Nice-to-have skills:

Data warehousing concepts
Experience supporting ML pipelines in Databricks
Familiarity with MLflow (experiment tracking, model registry)
Understanding of Generative AI concepts (LLMs, embeddings, RAG)
Experience in building workflow orchestrators using Jenkins

Soft skills:

Strong analytical and problem-solving abilities
Excellent communication skills
Ability to work in a fast-paced, agile environment
Ownership mindset and attention to detail
