Apache Spark Technical Specialist - Scala, Python

HCLTech -
Amangal, Telangana

Job details

6 days ago

Qualifications

CI/CD
Data modelling
Spark
SQL
Machine learning
Scala
Continuous integration
AI
Python

Full job description

Job Summary

Role Summary
The Data Engineer is responsible for designing, building, and operating high-quality, scalable, and reusable data services that support analytics, AI, and GenAI use cases across business domains. In this role, you will design and work hands-on with data pipelines, data models, orchestration frameworks, storage layers, and observability tooling. You will collaborate closely with AI Engineers, Data Scientists, Product Owners, and Platform teams to deliver reliable, well-governed, and self-service data products.

Key Responsibilities
Data Platform & Services Engineering

Build and maintain scalable data pipelines and ingestion frameworks for batch, streaming, and event-driven data.

Develop and maintain modular data models and semantic layers optimized for analytics, BI self-service, and AI use cases.

Implement and operate orchestration workflows (e.g., Databricks Workflows) and compute engines (Spark, SQL, Python).

Work with storage technologies such as Delta Lake, ADLS, and feature and vector stores.

Data Quality, Governance & Observability

Implement data quality checks, validations, and monitoring to ensure the reliability of, and trust in, data products.

Contribute to data lineage, metadata management, and documentation.

Apply observability practices using tools such as Great Expectations or Monte Carlo.

Ensure compliance with data governance standards and regulations (e.g., GDPR) in collaboration with data governance teams.

Enablement for AI & Analytics Use Cases

Deliver curated datasets and reusable data assets for analytics, machine learning, and GenAI applications.

Build pipelines that process structured, graph, and unstructured data (e.g., text, documents, images).

Support AI Engineering teams with data preparation for embeddings, vector stores, and retrieval-augmented generation (RAG) pipelines.

Tooling & Self-Service

Contribute to data engineering tooling and frameworks that enable efficient development and deployment of pipelines.

Develop data pipelines using tools such as dbt and Databricks Lakeflow.

Support reuse of data services through clear documentation, data contracts, templates, and examples.

Collaboration & Ways of Working

Collaborate with Data Scientists, AI Engineers, Product Owners, Business SMEs, and Platform teams.

Participate in technical design discussions, code reviews, and architecture forums.

Follow engineering best practices for version control, testing, CI/CD, and operational excellence.

Skill Requirements

Preferred Qualifications

5+ years of experience in data engineering and building production-grade data pipelines.

Strong hands-on experience with data platforms such as Databricks.

Solid knowledge of data modeling, SQL, Spark, and Python.

Experience with orchestration frameworks, data quality tooling, and observability practices.

Exposure to unstructured data processing and AI/GenAI data pipelines is a strong plus.

Experience working in a global, multi-team environment is beneficial.

Success in This Role Means

Reliable, well-documented data products are available for analytics and AI use cases.

Data pipelines are scalable, cost-efficient, observable, and easy to operate.

Data engineers and AI teams can move faster using reusable patterns and self-service data services.

Structured and unstructured data are effectively integrated to support advanced analytics and GenAI innovation.
