Project Role : Data Engineer
Project Role Description : Design, develop and maintain data solutions for data generation, collection, and processing. Create data pipelines, ensure data quality, and implement ETL (extract, transform and load) processes to migrate and deploy data across systems.
Must have skills : Python Frameworks, Databricks Unified Data Analytics Platform, Generative AI
Good to have skills : NA
Minimum 7.5 year(s) of experience is required
Educational Qualification : 15 years of full-time education
Summary:
We are looking for a strong Backend Engineer (L8) with expertise in FastAPI and microservices-based architecture, complemented by data engineering capabilities on Databricks and exposure to GenAI application development.
The role will focus on building API-driven backend services for both traditional data platforms and GenAI-powered applications, enabling scalable data access, orchestration, and AI-driven solutions.
1. Backend Engineering (Primary – FastAPI)
Design and develop scalable, high-performance FastAPI-based microservices
Build RESTful APIs for:
Data access services
Metadata exposure
Workflow orchestration
Implement asynchronous APIs for long-running and high-concurrency workloads
Ensure API reliability, security, and observability (logging, monitoring, tracing)
Develop reusable frameworks, middleware, and service templates
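For illustration only, the asynchronous pattern for long-running workloads mentioned above can be sketched with the standard library: accept a request, schedule the work in the background, and let the caller poll for status. This is a minimal sketch, not a prescribed design; the names `jobs`, `run_job`, `submit`, and `get_status` are hypothetical, and a real FastAPI service would expose these as endpoints backed by a durable job store.

```python
import asyncio
import uuid

# Hypothetical in-memory job store; a real service would use a durable store.
jobs: dict[str, dict] = {}


async def run_job(job_id: str, payload: int) -> None:
    """Simulate a long-running task and record its result."""
    jobs[job_id]["status"] = "running"
    await asyncio.sleep(0.01)  # stand-in for slow I/O or computation
    jobs[job_id].update(status="done", result=payload * 2)


async def submit(payload: int) -> str:
    """Schedule the work in the background and return a job id immediately."""
    job_id = str(uuid.uuid4())
    jobs[job_id] = {"status": "queued", "result": None}
    # Keep a reference to the task so it is not garbage collected mid-run.
    jobs[job_id]["task"] = asyncio.create_task(run_job(job_id, payload))
    return job_id


async def get_status(job_id: str) -> dict:
    """Return the current status and result (if any) for a job."""
    job = jobs[job_id]
    return {"status": job["status"], "result": job["result"]}


async def demo() -> dict:
    job_id = await submit(21)
    first = await get_status(job_id)   # polled before the job has run
    await jobs[job_id]["task"]         # wait for completion (demo only)
    final = await get_status(job_id)
    return {"first": first, "final": final}
```

In an HTTP setting, `submit` would typically map to a POST endpoint that returns the job id, and `get_status` to a GET endpoint the client polls.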
2. GenAI Application Development (New – Critical Skill)
Build backend APIs using FastAPI to support GenAI use cases, including:
Prompt orchestration and request routing
Integration with LLM services (Azure OpenAI / OpenAI / similar platforms)
Develop APIs that:
Handle context-aware queries (RAG patterns)
Interface with vector stores / embeddings (via Databricks or external systems where applicable)
Implement multi-step workflows for GenAI applications (e.g., chaining, reasoning pipelines)
Ensure secure, scalable, and performant serving of GenAI models via APIs
Incorporate handling of:
Latency optimization
Request throttling / token usage considerations (at API design level)
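As one possible illustration of request throttling with token usage in mind, the sketch below uses a token bucket: each caller has a budget of LLM tokens that refills at a fixed rate, and a request is rejected when its estimated token cost exceeds what is available. The class name `TokenBudget` and its parameters are hypothetical, and the token bucket is just one common technique, not a requirement of the role.

```python
import time
from typing import Callable


class TokenBudget:
    """Token bucket that refills at a fixed rate up to a capacity (hypothetical helper)."""

    def __init__(
        self,
        capacity: int,
        refill_per_sec: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock  # injectable so the behavior can be tested deterministically
        self.available = float(capacity)
        self.last = clock()

    def try_consume(self, tokens: int) -> bool:
        """Refill based on elapsed time, then spend `tokens` if the budget allows."""
        now = self.clock()
        refill = (now - self.last) * self.refill_per_sec
        self.available = min(self.capacity, self.available + refill)
        self.last = now
        if tokens <= self.available:
            self.available -= tokens
            return True
        return False
```

An API handler could call `try_consume` with the estimated token count of a prompt and respond with HTTP 429 (Too Many Requests) when it returns `False`.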
3. Data Engineering (Secondary – Databricks)
Develop and optimize data pipelines using Databricks / PySpark
Work with:
Delta Lake / Lakehouse architecture
Batch and near real-time pipelines
Support data ingestion, transformation, and provisioning layers
Optimize performance (partitioning, caching, tuning Spark jobs)
Enable integration between data pipelines and GenAI APIs (for RAG-based solutions)
Professional & Technical Skills:
- Must Have Skills: Proficiency in Python Frameworks, Databricks Unified Data Analytics Platform, Generative AI.
- Strong experience in building scalable and reliable data pipelines using Python-based frameworks.
- In-depth knowledge of data processing techniques and best practices for data quality assurance.
- Familiarity with cloud-based data platforms and distributed computing environments.
- Ability to troubleshoot and optimize ETL workflows to improve performance and reliability.
- Experience working with large datasets and implementing data integration solutions.
Additional Information:
- The candidate should have a minimum of 7.5 years of experience in Python Frameworks.
- This position is based at our Pune office.
- 15 years of full-time education is required.