Job Description: We’re looking for a strong Python engineer to help modernize legacy data workflows into production-grade Python pipelines running in a cloud-based data environment.
The role focuses on building reliable, scalable, and highly validated data pipelines with structured configs, logging, parquet-based processing, and lightweight Streamlit interfaces. A major part of the work involves translating existing workflow logic into clean, vectorized Python while ensuring data accuracy and consistency across large datasets.
Responsibilities:
- Strong Python + pandas expertise (vectorized, memory-aware, production-grade)
- Solid SQL fundamentals
- Data validation & edge-case handling (null joins, leading zeros, dtype drift, precision, encoding, sort stability)
- Performance optimization using profiling and benchmarking tools
- Clean engineering practices: reusable code, documentation, testing, and backward compatibility
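As a rough illustration of two of the edge cases named above (leading zeros and null joins), a minimal pandas sketch might look like the following. The sample data and column names are hypothetical and are not taken from any real workflow.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are illustrative only.
raw = "zip_code,amount\n00501,10.5\n02134,7.25\n"

# Default type inference parses zip_code as an integer and drops the zeros.
inferred = pd.read_csv(io.StringIO(raw))

# Declaring the column as a string keeps the values exactly as written.
preserved = pd.read_csv(io.StringIO(raw), dtype={"zip_code": "string"})

# Null join keys: unlike SQL, pandas merge matches missing keys to each other.
left = pd.DataFrame({"key": ["a", None], "left_val": [1, 2]})
right = pd.DataFrame({"key": ["a", None], "right_val": [10, 20]})
merged = left.merge(right, on="key", how="inner")

# Dropping null keys first reproduces SQL inner-join semantics.
sql_like = left.dropna(subset=["key"]).merge(
    right.dropna(subset=["key"]), on="key", how="inner"
)
```

Comparing the row counts of `merged` and `sql_like` is one simple way to catch a silent mismatch when translating SQL-style join logic into pandas.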
Tech stack includes:
- Python
- pandas
- PyArrow/parquet
- SQL
- Google APIs
- Tableau integrations
- Streamlit
- openpyxl
- XML parsing
Experience with PySpark, Polars, DuckDB, or large-scale data processing is a plus.
You should be comfortable reading complex data flows, debugging silent data issues, optimizing slow transformations, and shipping maintainable pipeline code.