Work Schedule: Other
Environmental Conditions: Office
Job Description
Summarized Purpose:
We are offering an opportunity for a Mid-Level Data Engineer to design, build, test, tune, and support production data pipelines using PySpark, Python, advanced SQL, AWS data services, secure data handling practices, and AI-assisted data engineering capabilities.
Education/Experience:
- Bachelor's degree or equivalent in Computer Science, Information Technology, Data Engineering, or related field
- 3-5 years of experience in data engineering, ETL development, SQL, AWS data platforms, or production data pipeline support
Major Job Responsibilities:
- Develop, test, tune, and maintain ETL and data pipelines using PySpark, Python, SQL, and AWS services
- Support ingestion and transformation of flat files, relational databases, APIs, data warehouses, and enterprise data sources
- Collaborate with business analysts, data architects, QA, DevOps, and senior engineers to implement source-to-target mappings and data solutions
- Implement CDC, incremental load design, idempotent pipeline processing, and data reconciliation patterns for reliable data movement
- Maintain technical documentation, mapping specifications, data catalog updates, runbooks, automated tests, and release support materials
Knowledge, Skills, and Abilities:
- Hands-on experience with PySpark, Python, advanced SQL, ETL best practices, data modeling, and large-scale data processing
- Deep knowledge of Redshift performance tuning including distribution keys, sort keys, compression encoding, Spectrum, materialized views, WLM, VACUUM, and ANALYZE
- Strong knowledge of Athena optimization including partition pruning, file formats, compression, schema evolution, and cost-efficient query design
- Strong understanding of DynamoDB data modeling, access-pattern-based design, capacity planning, GSIs/LSIs, TTL, Streams, and performance tuning
- Exposure to secure PHI/PII handling including encryption, access controls, auditability, retention, masking, and de-identification where applicable
- Strong analytical, troubleshooting, documentation, communication, and cross-functional collaboration skills
Must Have Skills:
- PySpark, Python, advanced SQL, ETL development, and data pipeline implementation experience
- AWS data services experience including S3, Glue, Lambda, Step Functions, ECS, DynamoDB, Redshift, PostgreSQL, SQL Server, and Athena integration
- Flat-file ingestion, source-to-target mapping, transformation logic, CDC, incremental loads, idempotent processing, reconciliation, and data quality checks
- CI/CD, GitHub workflows, automated testing, and release management for data pipelines and database changes
- Problem-solving, production support, debugging, documentation, and Agile delivery skills
Good to Have Skills:
- Exposure to AI-assisted mapping automation and use of LLMs for data cleaning, data quality checks, transformation logic, or documentation
- Familiarity with RAG patterns, embeddings, vector databases, semantic search, or AI-enabled data discovery solutions
- Understanding of healthcare data standards such as HL7, FHIR, CCD, claims data, EMR extracts, clinical trial data, and patient de-identification
- Familiarity with infrastructure as code such as Terraform or CloudFormation, plus Databricks, Snowflake, streaming, observability, or DevOps practices
Working Hours:
- India: 05:30 PM to 02:30 AM IST
- Philippines: 08:00 PM to 05:00 AM PHT