Hands-on experience in document intelligence and extraction
Production experience with Azure Document Intelligence: not just familiarity, but use at scale and first-hand knowledge of its limitations
Table extraction experience with complex documents (merged cells, borderless tables, multi-column layouts) using tools like Camelot, tabula, or equivalent, in a live system rather than a prototype
Custom ML model training for document parsing — layout classifiers, field extractors, or document understanding transformers (Donut, LayoutLM, or equivalent) taken through to production
A working evaluation methodology — ground truth datasets, per-field precision and recall, systematic quality measurement. Candidates who measure extraction quality only by manual spot-checking are not suitable.
Experience with engineering, industrial, or technical documents — test reports, datasheets, inspection records — rather than purely financial or healthcare documents
Strong Python engineering — not notebook-level, but production pipeline code with structured logging, error handling, and job queues
The documents we are processing are complex — multi-thousand-page engineering reports, inconsistent layouts, handwritten annotations, multi-column tables. We need someone who has faced genuinely hard extraction problems and solved them, not someone who has let cloud tools handle the easy cases.
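To illustrate the kind of evaluation methodology described above, here is a minimal sketch of per-field precision and recall computed against a ground truth dataset. It is an illustration only, not our production code: the function name `per_field_scores`, the dict-per-document data shape, and the exact-match comparison are all assumptions. A wrong value is counted as both a false positive and a false negative, which is one common convention among several.

```python
from collections import defaultdict


def per_field_scores(ground_truth, predictions):
    """Return {field: (precision, recall)} for parallel lists of per-document dicts.

    Hypothetical helper: assumes one dict of field -> value per document and
    exact string equality as the match criterion.
    """
    tp = defaultdict(int)  # extracted value equals the ground truth value
    fp = defaultdict(int)  # a value was extracted but it is wrong or unexpected
    fn = defaultdict(int)  # a ground truth value was missed or extracted wrongly

    for truth, pred in zip(ground_truth, predictions):
        for field in set(truth) | set(pred):
            expected = truth.get(field)
            extracted = pred.get(field)
            if extracted is not None and extracted == expected:
                tp[field] += 1
            else:
                if extracted is not None:
                    fp[field] += 1
                if expected is not None:
                    fn[field] += 1

    scores = {}
    for field in set(tp) | set(fp) | set(fn):
        predicted = tp[field] + fp[field]
        actual = tp[field] + fn[field]
        precision = tp[field] / predicted if predicted else 0.0
        recall = tp[field] / actual if actual else 0.0
        scores[field] = (precision, recall)
    return scores
```

In practice, candidates will have gone well beyond this: normalising values before comparison, tolerance-based matching for numeric fields, and tracking scores per document type over time.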
Pay: ₹1,500,000.00 - ₹1,600,000.00 per year
Benefits:
- Paid sick time
- Paid time off
- Work from home
Application Question(s):
- What is your current CTC?
- What is your expected CTC?
- What is your notice period?
Experience:
- OCR: 4 years (Required)
- Python: 4 years (Required)
- Azure: 3 years (Required)
- Document processing: 4 years (Required)
Work Location: Remote