Data Engineer

SecHorizon Technologies Pvt. Ltd
Remote

Job details

Contractual / Temporary | Freelance
₹15,00,000 - ₹16,00,000 a year
7 days ago

Benefits

Paid time off
Work from home
Paid sick time

Qualifications

CI/CD
Engineering
Git
AWS
Bachelor's degree
Terraform
Continuous integration
GitHub
Python
Identity & access management

Full job description

Key responsibilities

· Analytics exports. Build and maintain denormalised CSV and Parquet exports that the VS team can query or download without needing to understand our schema. Every export includes a manifest file and field dictionary.

· Feature engineering pipeline. Implement the transforms the VS team needs: unit normalisation to SI, date standardisation, null handling strategy (impute vs. flag vs. exclude), and outlier detection flags.

· VS dataset export format. Design the versioned, self-describing VS export format, including field definitions, provenance metadata, and a changelog so the VS team always knows what changed between exports.

· Data quality dashboard. Build the monitoring layer: per-document extraction scores, field-level fill rates, flagged item counts, and trends over time. This is how the team knows whether an ML change is an improvement.

· Schema evolution. Work with the Pipeline engineer on database schema migrations. You own the analytics-facing views - the denormalised representations that make the data queryable without joins.

· Team coordination. Attend the weekly sync. Map the model's required fields against what the database currently produces. Document the gaps and drive closure.
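For illustration only, the feature engineering transforms named above could be sketched as follows. This is a minimal sketch under stated assumptions, not the team's implementation: the conversion table, the accepted date formats, the function names, and the outlier method (a modified z-score based on the median absolute deviation) are all hypothetical choices. Nulls follow the "flag" strategy here: they are kept as `None` and never imputed.

```python
from datetime import datetime
from statistics import median

# Hypothetical conversion table: source unit -> (SI unit, multiplier).
TO_SI = {
    "mm": ("m", 1e-3),
    "cm": ("m", 1e-2),
    "m": ("m", 1.0),
    "g": ("kg", 1e-3),
    "kg": ("kg", 1.0),
}

# Assumed input formats; day-first is an assumption for slash-separated dates.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def normalise_unit(value, unit):
    """Convert a value to its SI unit; raises KeyError for unknown units."""
    si_unit, factor = TO_SI[unit]
    return value * factor, si_unit


def standardise_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def null_flags(values):
    """Flag missing values instead of imputing or excluding them."""
    return [v is None for v in values]


def outlier_flags(values, threshold=3.5):
    """Flag values whose modified z-score (MAD-based) exceeds the threshold.

    None values are skipped when computing the statistics and never flagged.
    """
    present = [v for v in values if v is not None]
    if not present:
        return [False] * len(values)
    med = median(present)
    mad = median(abs(v - med) for v in present)
    if mad == 0:
        return [False] * len(values)
    return [
        v is not None and abs(0.6745 * (v - med) / mad) > threshold
        for v in values
    ]
```

With these definitions, `standardise_date("03/02/2024")` returns `"2024-02-03"`, and an unparseable string returns `None` so that it can be flagged rather than silently dropped.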

Key deliverables

CI/CD pipeline - linting, tests, and an ML evaluation harness as a merge gate
Analytics dataset exports - CSV and Parquet
Feature engineering pipeline - unit normalisation, date standardisation, null strategy, outlier flags
VS dataset export format v1 - versioned, with manifest and field dictionary
Data quality dashboard - per-doc scores, fill rates, flagged item trends
VS field mapping and gap analysis document
Ongoing data quality monitoring
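As a rough illustration of the export and monitoring deliverables, the sketch below writes a CSV export together with a manifest that embeds the field dictionary and per-field fill rates. The file names, manifest keys, and function names are assumptions made for this example; the posting does not specify the actual format, and a Parquet writer would need a third-party library that is left out here.

```python
import csv
import hashlib
import json
from pathlib import Path


def fill_rates(rows, fields):
    """Fraction of rows with a non-empty value for each field."""
    total = len(rows)
    return {
        field: (
            sum(1 for row in rows if row.get(field) not in (None, "")) / total
            if total
            else 0.0
        )
        for field in fields
    }


def write_export(rows, field_dictionary, out_dir, version):
    """Write a CSV export plus a JSON manifest; returns the manifest dict.

    field_dictionary maps each column name to a human-readable description.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    fields = list(field_dictionary)

    data_path = out_dir / f"export_v{version}.csv"
    with data_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)

    manifest = {
        "version": version,
        "file": data_path.name,
        # Checksum lets consumers verify the download is intact.
        "sha256": hashlib.sha256(data_path.read_bytes()).hexdigest(),
        "row_count": len(rows),
        "fields": field_dictionary,
        "fill_rates": fill_rates(rows, fields),
    }
    manifest_path = out_dir / f"manifest_v{version}.json"
    manifest_path.write_text(json.dumps(manifest, indent=2), encoding="utf-8")
    return manifest
```

Keeping the fill rates in the manifest means each export carries its own quality snapshot, which a dashboard could then plot across versions.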

Technical skills and experience

4+ years of data engineering experience in production environments
Bachelor’s degree in engineering or science
CI/CD pipeline ownership: GitHub Actions, CircleCI, or equivalent. You have built pipelines from scratch, not just maintained them
Infrastructure as code: Terraform or Pulumi. You write it, not just read it
ML infrastructure experience: you have supported a team running models in production - training pipelines, experiment tracking, model versioning, deployment
Cloud platform depth on AWS or GCP - networking, IAM, secrets, storage, compute cost management
Strong enough in Python to read, debug, and instrument the ML team's training code without their help

Nice to have

Experience with ML experiment tracking tools (MLflow, Weights & Biases, Neptune)
Experience feeding data into ML training pipelines
Familiarity with scientific units and measurement conventions (helpful for normalisation work)
Experience with dbt, Great Expectations, or similar data quality tooling
Background in engineering or industrial data (sensor data, test reports, datasheets)

Pay: ₹1,500,000.00 - ₹1,600,000.00 per year

Application Question(s):

What is your current CTC?
What is your expected CTC?
What is your notice period?

Education:

Bachelor's (Preferred)

Experience:

Data engineer: 4 years (Preferred)

Work Location: Remote
