Key responsibilities
· Ingestion API. Build the file upload endpoint, format normaliser, and job dispatch layer. Every file gets a job ID; every step in that job is independently restartable.
· Orchestration pipeline. Wire the document processing steps (upload → OCR → layout classification → extraction → scoring → write or flag) into a proper job queue with step-level retries, failure logging, and visibility into which step a job is at, at any point.
· SQL schema. Design and own the Postgres schema: documents table, extraction results (with provenance columns: source file, page, bounding box), confidence scores, and review queue. Schema migrations must be backward-compatible and run without downtime.
· Multi-tenancy. Implement schema-level tenant isolation. Every API endpoint must be tenant-scoped. This cannot be retrofitted; it has to be right from the start.
· Structured logging & monitoring. Roll out a consistent structured log schema (JSON) across every pipeline step: job ID, document ID, step name, duration, outcome. Build alerting on queue depth, error rate, and latency thresholds.
· Performance. Profile the pipeline end to end and parallelise the embarrassingly parallel steps (OCR, layout). Target a 2× throughput improvement. If OCR is the bottleneck, evaluate GPU-accelerated or cloud OCR as a fallback.
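To make the step-level retry and structured-logging expectations above concrete, here is a minimal Python sketch. It assumes a synchronous, in-process step runner; the names `run_step`, `log_step`, the `max_attempts` parameter, the `duration_ms` field name, and the outcome values `success`, `retry`, and `failed` are hypothetical illustrations, not part of any existing codebase or library.

```python
import json
import logging
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Hypothetical logger name; attach whatever handler (stdout, file, log shipper) you use.
logger = logging.getLogger("pipeline")
logger.setLevel(logging.INFO)


def log_step(job_id: str, document_id: str, step: str,
             duration_ms: float, outcome: str) -> None:
    """Emit one JSON log record with the fields named in the logging requirement."""
    record = {
        "job_id": job_id,
        "document_id": document_id,
        "step": step,
        "duration_ms": round(duration_ms, 2),
        "outcome": outcome,
    }
    logger.info(json.dumps(record, sort_keys=True))


def run_step(job_id: str, document_id: str, step: str,
             fn: Callable[[], T], max_attempts: int = 3) -> T:
    """Run one pipeline step, retrying on failure and logging every attempt.

    For brevity there is no backoff between attempts.
    """
    for attempt in range(1, max_attempts + 1):
        start = time.perf_counter()
        try:
            result = fn()
        except Exception:
            elapsed_ms = (time.perf_counter() - start) * 1000
            final = attempt == max_attempts
            log_step(job_id, document_id, step, elapsed_ms,
                     "failed" if final else "retry")
            if final:
                # Re-raise so the job can be flagged and this step restarted later.
                raise
        else:
            elapsed_ms = (time.perf_counter() - start) * 1000
            log_step(job_id, document_id, step, elapsed_ms, "success")
            return result
    # Only reached when max_attempts < 1.
    raise ValueError("max_attempts must be at least 1")
```

In production the retry policy would more likely live in the queue system itself (Celery tasks and Temporal activities both support configurable retries); the sketch only shows the intended shape: one JSON record per attempt, keyed by job, document, and step.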
Key deliverables
- Cloud infrastructure as code (IaC): storage, database, job queue, secrets
- File ingestion API and job queue
- SQL schema v1 and all subsequent migrations
- Orchestration pipeline with step-level retries and failure logging
- Tenant isolation middleware
- Structured logging across all pipeline steps
- Pipeline monitoring and alerting
- End-to-end integration tests running in CI
Technical skills and experience
- 4+ years of backend engineering on Python-based production systems
- Bachelor’s degree in engineering or science
- 4+ years of building and shipping ML models in production (not just notebooks)
- Deep experience with job queue systems (Celery, Temporal, or equivalent)
- PostgreSQL: schema design, migrations, indexing, query optimisation
- Experience building REST APIs that are consumed by both a frontend and third-party integrations
- Infrastructure as code (Terraform or Pulumi): you write it, not just read it
- Strong opinions about observability: structured logging, alerting, and making debugging tractable
Nice to have
- Experience building multi-tenant SaaS backends with schema-level isolation
- Familiarity with document processing pipelines or data ingestion at volume
- Experience with parallel processing patterns for CPU/IO-bound workloads
- Background in DevOps or platform engineering (you are also comfortable owning production)
Pay: ₹1,500,000.00 - ₹1,600,000.00 per year
Benefits:
- Paid sick time
- Paid time off
- Work from home
Application Question(s):
- What is your current CTC?
- What is your expected CTC?
- What is your notice period?
Experience:
- Backend engineering: 4 years (required)
Work Location: Remote