SENIOR DATA ENGINEER
Enterprise Systems | Data & Intelligence Platform
ABOUT THE ROLE
You design and build the data platform end-to-end. Given a business problem, you own the
solution architecture, build the pipelines, model the data, and deliver the serving layer —
independently, to production quality. You are equally comfortable at a whiteboard and in a
notebook. You don’t wait for a detailed brief; you write one.
This is a hands-on engineering role working within a small data team. You will own the data platform from ingestion through to the intelligence layer — the structured, governed, AI-ready data products that power reporting, self-serve analytics, and operational decision-making.
You operate across a hybrid cloud environment: migrating the existing Microsoft SQL Server data warehouse to Microsoft Fabric, integrating new data sources from REST APIs, MySQL, and PostgreSQL (AWS), and ensuring data is set up for semantic model builds and intelligence outputs in Power BI. You are a technical authority on data solution design in this space.
CORE RESPONSIBILITIES
Solution Design & Architecture
- Translate ambiguous business problems into clearly scoped data engineering solutions — source mappings, transformation logic, medallion layer design, serving layer specification, and data lineage documentation.
- Produce solution design artefacts independently: architecture diagrams, data flow documentation, modelling approach, feasibility and security assessment. No requirement for a detailed brief.
- Assess build-vs-buy and platform capability trade-offs across the Microsoft Fabric, Azure, and AWS stack.
- Challenge requirements when the data approach is wrong. Propose alternatives before committing to implementation.
- Define and enforce data engineering standards: naming conventions, medallion layer contracts, pipeline testing requirements, and documentation expectations across the platform.
- Assess the cost of proposed data solutions: commercial cost versus capacity usage limits.
Pipeline Engineering & Ingestion
- Design and build ingestion pipelines within Microsoft Fabric, utilising data pipelines, Spark notebooks, and Real-Time Intelligence (RTI) to orchestrate data into our enterprise data lake.
- Build and maintain medallion architecture pipelines (bronze, silver, gold, platinum), including Delta table management, schema evolution, Spark and Delta table optimisation techniques, and various data ingestion patterns.
- Implement CDC and batch ingestion strategies against various sources using AWS DMS, Debezium, Fabric-native mirroring with CDF, and high-watermark strategies, depending on source system characteristics and latency requirements.
- Build and support additional upstream and downstream systems, including SQL Server, MySQL, AWS RDS, and PostgreSQL on AWS.
- Implement robust orchestration: dependency chaining, failure handling, retry logic, alerting on pipeline degradation, watermark-based incremental loads, and logging frameworks.
- Adopt and extend Fabric Pipelines (FDF), Dataflows Gen2, Notebooks, and Real-Time Intelligence for streaming source integration.
- Enable multi-cloud data delivery: S3 staging, cross-account AWS–Azure connectivity, and OneLake shortcut/mirroring strategies.
Data Modelling
- Design gold/platinum layer data for modelling: fact table grain definition, conformed dimensions, SCD Type 1/2 implementation in Delta Lake, bridge tables for many-to-many relationships.
- Apply modelling patterns appropriate to the company's domains: transactional, periodic snapshot, and accumulating snapshot fact tables across ordering, store operations, and loyalty data.
- Consider models that support both analytical consumption (Power BI Direct Lake) and operational data product use cases.
- Document modelling decisions — not just what was built but why, with explicit trade-offs recorded.
Intelligence Layer & Semantic Models
- Design and own enterprise Power BI semantic models as governed, reusable data products — not per-report datasets.
- Deliver DAX at depth: calculation groups, complex measure patterns, context transition, role-playing dimensions, and row-level security.
- Configure Direct Lake datasets with awareness of their constraints: column limits, aggregation behaviour, composite model restrictions, and refresh mechanics.
- Structure the gold serving layer for AI consumption: schema-stable, well-documented Delta tables suitable for Fabric Copilot grounding, RAG pipelines, and natural language query interfaces.
- Design the serving layer for analyst self-service — field naming standards, measure documentation, and logical model structure that allows analysts to answer business questions without engineering tickets.
- Maintain XMLA endpoint tooling (Tabular Editor, ALM Toolkit) for semantic model version control and deployment.
Data Quality & Governance
- Define and implement data quality controls across the pipeline: profiling on ingest, null/cardinality/distribution checks, and alerting before downstream consumption is affected.
- Apply governance frameworks: data lineage documentation, sensitive data classification, PII handling controls, and audit traceability across the lakehouse.
- Implement testing approaches (dbt tests, Great Expectations, or Fabric-native DQ rules) to validate accuracy, completeness, and pipeline behaviour against expected outcomes.
- Ensure all solutions align with enterprise security models, access control patterns, and change management workflows.
Platform Support & Reliability
- Proactively monitor pipeline health and data model performance — identify and resolve issues before they surface as business incidents.
- Integrate into incident and change management workflows; maintain runbooks for critical data pipelines.
- Ensure notebooks, semantic models, and pipelines remain secure, traceable, versioned, and performant as the platform scales.
ESSENTIAL REQUIREMENTS
- 6+ years hands-on data engineering in production environments — pipelines, modelling, and serving layer.
- Proven ability to design and deliver end-to-end data solutions from a business problem statement, without requiring a detailed technical brief.
- Track record of owning data platform components in hybrid cloud environments.
Microsoft Fabric Platform
- Operational experience (not just familiarity) with: Lakehouse, Dataflows Gen2, Fabric Pipelines, Direct Lake datasets, OneLake shortcuts and mirroring, Real-Time Intelligence.
- Understanding of Direct Lake mode constraints: column limits, aggregation behaviour, composite model restrictions.
- Delta table lifecycle management: schema evolution, time travel, optimisation techniques (PARTITION, ZORDER, VACUUM).
- Up to date with Fabric feature developments and monthly releases.
- T-SQL proficiency: window functions, CTEs, execution plan analysis.
- Dimensional modelling: star/snowflake schema design, SCD Type 1/2/4 implementation.
- DW migration patterns: SQL Server to cloud lakehouse schema mapping, type coercion, constraint translation.
- CDC and batch extraction from MySQL and PostgreSQL on AWS RDS into Azure-based platforms.
- AWS DMS or Debezium for ongoing replication pipelines.
- Cross-cloud connectivity: IAM, private endpoints, S3 staging patterns for Azure ingest.
- Understanding of semantic model engineering: DAX, calculation groups, RLS, incremental refresh configuration.
- Knowledge of XMLA endpoint tooling: Tabular Editor, ALM Toolkit for model deployment and version control.
- Serving layer design for Direct Lake: schema standards that support reliable Power BI consumption.
- Medallion architecture design with clear layer contracts.
- Fact table grain definition and SCD handling in Delta Lake.
- Conformed dimension design across multi-source domains.
- Owns a problem statement to production without requiring hand-holding at each step.
- Challenges requirements when the data approach is wrong — proposes and defends alternatives.
- Chooses reliable, maintainable solutions over clever ones. Ships working data products.
- Comfortable operating with autonomy in a cross-functional environment.
PREFERRED — STRONG ADVANTAGE
- dbt or equivalent transformation framework with version-controlled, tested transformation logic
- Great Expectations, dbt tests, or Fabric-native DQ rules for automated data quality
- Fabric Data Agents, Ops Agents, Copilot, or LLM grounding patterns: structuring serving layer data for AI-assisted analytics
- Event streaming experience: Kafka, Azure Event Hubs, or Fabric Real-Time Intelligence for low-latency pipelines
- Experience in quick-service restaurant, retail, or high-frequency transactional domains
- AI-augmented development tooling: Copilot in Fabric, GitHub Copilot, or equivalent in day-to-day engineering workflow
WHAT YOU’LL BE BUILDING
You will join a team during an exciting period of transformation as we re-platform from an established SQL Server data warehouse to a new Microsoft Fabric lakehouse.
Your immediate focus will be:
- Completing and hardening the medallion architecture in Fabric — from bronze through to governed gold & platinum layer outputs
- Building the enterprise data foundations for our insight intelligence layer
- Designing a serving layer that supports both current Power BI consumption and future AI/Copilot integration
- Establishing data quality controls and pipeline reliability standards that the platform can scale on
- Owning the technical data standards — naming conventions, layer contracts, testing requirements — that all future platform work aligns to
About Softobiz Technologies
Softobiz Technologies is a technology and product services company headquartered in India, operating Global Capability Centers (GCCs) for leading international clients across healthcare, fintech, and enterprise software. Our GCC model enables world-class talent in India to work directly within the product and engineering teams of our global partners, contributing meaningfully to product strategy, growth, and operations.
Innovation begins with like-minded people aiming to transform the world together. At Softobiz, we invite you to become a part of an organization that has been helping clients transform their business by fusing insights, creativity, and technology. With a team of 400+ technology enthusiasts, we have been trusted by leading enterprises around the globe for over 18 years.
At Softobiz, we foster a culture of equality, learning, collaboration, and creative freedom, empowering our employees to grow and excel in their careers. Our technical craftsmen are pioneers in the latest technologies like AI, machine learning, and product development.
Why Should You Join Softobiz?
- Work with technical craftsmen who are pioneers in the latest technologies.
- Access training sessions and skill-enhancement courses for personal and professional growth.
- Be rewarded for exceptional performance and celebrate success through engaging parties.
- Experience a culture that embraces diversity and creates an inclusive environment for all employees.
Softobiz is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees. All qualified applicants will be afforded equal employment opportunities without discrimination based on race, creed, color, national origin, sex, age, disability, or marital status.