We at Innovecture are hiring a Data Engineer to expand our team.
About Innovecture:
Founded in 2007 under the leadership of CEO Shreyas Kamat, Innovecture LLC began as a U.S.-based Information Technology and Management Consulting company focusing on technology consulting and services. With international development centers located in Salt Lake City, USA, and Pune, India, Innovecture leverages its Global Agile Delivery Model to deliver client projects within budget, scope, and project deadline. The primary focus of Innovecture is to bring a unique wealth of expertise and experience to the IT and Management Consulting realm by applying various technologies across multiple industry domains. Innovecture uses best-in-class design processes and top-quality talent to ensure the highest-quality deliverables. With innovation embedded in its consulting and services approach, Innovecture will continue to deliver outstanding results for its Fortune 500 clients and employees.
Role and Responsibilities
We are looking for a highly skilled Data Engineer with deep expertise in Databricks and modern data engineering practices to join our Data COE.
Your Experience
As a Data Engineer, you will develop data solutions that address business problems. You will design, build, and optimize scalable data pipelines and data platforms that support advanced analytics and business intelligence initiatives. The ideal candidate is a problem-solver with hands-on expertise in Databricks, Spark, Delta Lake, cloud platforms (AWS), and data pipeline orchestration, with a strong focus on performance, reliability, and scalability. This is a hands-on role that requires working collaboratively in a squad following a Scaled Agile development methodology. You must be a delivery-focused self-starter with a broad set of technology skills.
Things you will do:
- Ensure that solution requirements are gathered accurately and understood, and that all stakeholders have visibility into their impacts.
- Design, develop, and maintain scalable ETL/ELT pipelines using Databricks (PySpark/Scala/Spark SQL).
- Build and optimize batch and streaming pipelines for large-scale data processing.
- Implement data ingestion frameworks for structured and unstructured data.
- Develop solutions leveraging Databricks Lakehouse architecture (Delta Lake).
- Implement data models, schema design, partitioning, and performance tuning.
- Ensure ACID compliance, data versioning, and time travel capabilities in Delta.
- Work with cloud platforms (preferably AWS: S3, Glue).
- Integrate data pipelines with data warehouses, APIs, and downstream applications.
- Implement CI/CD pipelines for data workflows.
- Enforce data quality, validation, and monitoring frameworks.
- Build reports using Power BI.
- Implement data lineage, cataloguing (Unity Catalog preferred), and governance standards.
- Ensure compliance with security, privacy, and regulatory requirements.
- Optimize Spark jobs for cost and performance.
- Monitor cluster utilization and improve efficiency in Databricks.
- Implement caching, partition pruning, and query optimization techniques.
- Troubleshoot and resolve development issues.
- Provide technical support to clients regarding existing problems.
- Recommend and execute code improvements based on current solutions.
What you will bring
- For the Data Engineer role, we are looking for a candidate with at least 5 years of extensive experience in Data Engineering.
- Proven record of successfully delivering software with a broad mix of languages, technologies, and platforms.
- Proven experience in Big Data Engineering.
Technical Skills Required:
Must-have skills:
- Strong hands-on experience with Databricks (must-have)
- Expertise in Apache Spark (PySpark/Scala)
- Strong proficiency in SQL and data modelling
- Strong hands-on experience in Power BI
- Experience with Delta Lake and Lakehouse architecture
- Hands-on experience with cloud platforms (AWS preferred)
- Experience with orchestration tools
- Knowledge of streaming frameworks (Kafka)
- Proficiency in Python (mandatory)
- Familiarity with Scala
- Experience with Git, CI/CD pipelines
- Exposure to data cataloguing tools (Unity Catalog)
- Proficient understanding of distributed computing principles
- Good exposure to the AWS big data platform and services such as S3, Glue, EMR, DMS, Athena, RDS, Redshift, Kinesis, etc.
- Experience managing EMR clusters, including all bundled services
- Strong working knowledge of RDBMSs such as PostgreSQL, Oracle, and SQL Server
- Experience with writing complex SQL queries in PostgreSQL
- Experience working with a source control system such as Azure DevOps or GitHub
Nice-to-have skills:
- Data warehousing concepts
- Experience supporting ML pipelines in Databricks
- Familiarity with MLflow (experiment tracking, model registry)
- Understanding of Generative AI concepts (LLMs, embeddings, RAG)
- Experience in building Workflow orchestrators using Jenkins
Soft skills:
- Strong analytical and problem-solving abilities
- Excellent communication skills
- Ability to work in a fast-paced, agile environment
- Ownership mindset and attention to detail