Senior Big Data Engineer (Python | AWS)

BEO software
Kochi, Kerala


Job details

Full-time

Qualifications

CI/CD
Performance tuning
Go
XML
Big data
Spark
Encryption
Test automation
Git
SQL
AWS
PostgreSQL
Terraform
Continuous integration
EMR systems
GitHub
APIs
S3
JSON
Data science
Redshift
Communication skills
Python
Identity & access management

Full job description

Location : Kochi

Employment Type : Full Time

Work Mode : Hybrid

Experience : 6-12 yrs

Job Code : BEO-5173

Posted Date : 11/02/2026

Job Description

Responsibilities

Role Summary
Own the data platform that powers clinician discovery, credential unification, and care-quality analytics. Design resilient, low-latency batch and streaming ingestion and transformation at scale, with GDPR compliance built in by design. Your work underpins search, matching, and ML features in our telemedicine platform across Germany.

Key Responsibilities

Design and operate an AWS-native data lakehouse: Amazon S3 with Lake Formation (governance), Glue/Athena (ELT), and, optionally, Amazon Redshift for warehousing.

Build high-throughput ingestion and CDC pipelines from partner APIs, files, and databases using EventBridge, SQS/SNS, Kinesis/MSK, AWS DMS, and Lambda/ECS Fargate.

Implement idempotent upserts, deduplication, and delta detection; define source-of-truth governance and survivorship rules across authorities/insurers/partners.

Model healthcare provider data using domain-driven design (DDD) and normalize structured/semi-structured payloads (JSON/CSV/XML, and FHIR/HL7 where present) into curated zones.

Engineer vector-aware datasets for clinician/patient matching; operate pgvector on Amazon Aurora PostgreSQL or use OpenSearch k-NN for hybrid search.

Establish data-quality metrics (freshness, accuracy, coverage, cost-per-item) with automated checks (e.g., Great Expectations/Deequ) and publish KPIs/dashboards.

Harden security & privacy: IAM least privilege, KMS encryption, Secrets Manager, VPC endpoints, audit logs, and pseudonymised telemetry; enforce GDPR requirements, including the right to erasure.

Build observability-first pipelines using OpenTelemetry (ADOT), CloudWatch, and X-Ray, with DLQ handling, replay tooling, resiliency/chaos tests, SLOs, and runbooks.

Tune Aurora PostgreSQL performance (indexing, partitioning, vacuum/analyze) and build cost-aware Spark jobs on EMR/Glue.

Implement CI/CD for data (Terraform/CDK, GitHub Actions/CodeBuild/CodePipeline), test automation (pytest/DBT), and blue/green or canary deployments for critical jobs.
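The responsibilities above include idempotent upserts, deduplication, and delta detection. The following plain-Python sketch illustrates all three and is not the team's implementation. It makes several assumptions. An in-memory dict stands in for the target table. `provider_id` is a hypothetical business key. Duplicates within a batch are resolved last-write-wins, whereas real survivorship rules would probably rank sources by authority. For delta detection, the sketch compares a SHA-256 hash of each record's canonical JSON form, so replaying a batch writes nothing.

```python
import hashlib
import json
from typing import Any

Record = dict[str, Any]


def content_hash(record: Record) -> str:
    """SHA-256 of a canonical JSON form, so key order does not affect the hash."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"), default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def upsert_batch(store: dict[str, dict[str, Any]], batch: list[Record],
                 key_field: str = "provider_id") -> dict[str, int]:
    """Idempotently merge `batch` into `store`; return insert/update/unchanged counts."""
    # Deduplicate within the batch: the last record seen for a key wins (assumption).
    latest: dict[str, Record] = {}
    for record in batch:
        latest[record[key_field]] = record

    stats = {"inserted": 0, "updated": 0, "unchanged": 0}
    for key, record in latest.items():
        new_hash = content_hash(record)
        existing = store.get(key)
        if existing is not None and existing["hash"] == new_hash:
            stats["unchanged"] += 1  # delta detection: nothing to write
            continue
        stats["updated" if existing is not None else "inserted"] += 1
        store[key] = {"data": record, "hash": new_hash}
    return stats
```

Replaying the same batch yields only `unchanged` counts. That property is what makes retries and DLQ replays safe.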
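The posting names Great Expectations and Deequ for automated data-quality checks. The sketch below does not use either library's API. It is a hypothetical, dependency-free illustration of two of the listed KPIs, freshness and coverage. The field names (`updated_at`, `license_number`) and the thresholds (24 hours, 95%) are assumptions chosen for the example.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

Record = dict[str, Any]


def freshness_ok(records: list[Record], ts_field: str, max_age: timedelta,
                 now: Optional[datetime] = None) -> bool:
    """True if the newest record is at most `max_age` old."""
    if not records:
        return False
    now = now or datetime.now(timezone.utc)
    newest = max(r[ts_field] for r in records)
    return now - newest <= max_age


def coverage(records: list[Record], field: str) -> float:
    """Share of records in which `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def quality_report(records: list[Record], now: Optional[datetime] = None, *,
                   ts_field: str = "updated_at",
                   required_field: str = "license_number",
                   max_age: timedelta = timedelta(hours=24),
                   min_coverage: float = 0.95) -> dict[str, Any]:
    """Compute KPI values plus a pass/fail flag that a dashboard could display."""
    cov = coverage(records, required_field)
    fresh = freshness_ok(records, ts_field, max_age, now)
    return {
        "fresh": fresh,
        "coverage": cov,
        "coverage_ok": cov >= min_coverage,
        "passed": fresh and cov >= min_coverage,
    }
```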
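One common way to pseudonymise identifiers in telemetry is a keyed hash (HMAC-SHA256). The same identifier always maps to the same token, so events can still be correlated, but nobody without the secret key can recompute or link the tokens. The sketch makes two assumptions: the key is fetched elsewhere (for example, from Secrets Manager) and passed in as bytes, and `PII_FIELDS` is a hypothetical list of identifier fields.

```python
import hashlib
import hmac
import json
from typing import Any, Iterable

# Hypothetical set of fields treated as direct identifiers.
PII_FIELDS = frozenset({"patient_id", "email", "phone"})


def pseudonymise(value: str, key: bytes) -> str:
    """Keyed HMAC-SHA256: deterministic for a given key, not linkable without it."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def safe_log_line(event: dict[str, Any], key: bytes,
                  pii_fields: Iterable[str] = PII_FIELDS) -> str:
    """Serialise an event as JSON with PII fields replaced by pseudonyms."""
    pii = set(pii_fields)
    clean = {
        k: pseudonymise(str(v), key) if k in pii else v
        for k, v in event.items()
    }
    return json.dumps(clean, sort_keys=True)
```

Rotating the key breaks linkability with older tokens. Whether that is desirable depends on the retention policy, which the posting does not specify.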

Desired Candidate Profile

6+ years in data engineering at scale; proven delivery in production systems (regulated domains a plus).

Expertise in Python and SQL; hands-on with Spark (EMR/Glue) and stream processing (Kinesis/MSK/Flink/Spark Streaming).

Deep AWS experience across S3, Glue, Athena, Redshift or Aurora PostgreSQL, Lake Formation, DMS, Lambda/ECS, Step Functions, EventBridge, SQS/SNS.

PostgreSQL mastery, including query planning, indexing, and performance tuning; familiarity with pgvector or OpenSearch vector search.

Strong grasp of idempotency, deduplication, CDC, schema evolution, slowly changing dimensions (SCDs), and contract testing for data products.

Observability (OpenTelemetry), CI/CD, and IaC (Terraform/CDK) best practices; strong incident response and on-call hygiene.

Security-by-design mindset: data minimization, encryption, secrets, PII-safe logging; working knowledge of GDPR and auditability.
Effective communicator across Product, Platform, Data Science, and Compliance; pragmatic, metrics-driven delivery.
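Of the concepts listed above, slowly changing dimensions are a frequent source of subtle bugs. Below is a minimal Type 2 sketch. It assumes an in-memory list of rows with hypothetical `valid_from`, `valid_to`, and `is_current` columns. A real change closes the current version and appends a new one. Re-applying an unchanged update is a no-op, which also keeps the operation idempotent.

```python
from datetime import date
from typing import Any, Optional

Row = dict[str, Any]


def scd2_apply(history: list[Row], key: str, attrs: dict[str, Any],
               as_of: date) -> list[Row]:
    """Return a new SCD Type 2 history with `attrs` applied to `key` as of `as_of`."""
    rows = [dict(r) for r in history]  # shallow copies; the input is not mutated
    current: Optional[Row] = next(
        (r for r in rows if r["key"] == key and r["is_current"]), None
    )
    if current is not None and current["attrs"] == attrs:
        return rows  # no change: re-applying the same update is a no-op
    if current is not None:
        # Close the open version at the change date.
        current["valid_to"] = as_of
        current["is_current"] = False
    rows.append({
        "key": key,
        "attrs": dict(attrs),
        "valid_from": as_of,
        "valid_to": None,  # open-ended current version
        "is_current": True,
    })
    return rows
```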

Nice to Have

Experience with FHIR/HL7, German TI/ePrescription/ePA integrations.
DBT for transformations; OpenMetadata/Amundsen for catalog/lineage.
Go for high-throughput services; experience with Bedrock or SageMaker for embedding generation.

How We Work & Benefits

API-first, clean architecture, and pairing culture; mandatory code reviews.
Remote-friendly with defined core hours; mission-led, patient-safety-first.
Ownership mindset: you build it, you run it (with sensible SLOs and error budgets).
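Error budgets follow directly from the SLO target. The sketch below uses one common time-based formulation and is only an illustration. It assumes a 30-day window and counts "bad minutes", for example minutes in which a pipeline breached its freshness SLO. The function names are illustrative. Under these assumptions, a 99.9% SLO leaves (1 - 0.999) × 30 × 24 × 60 = 43.2 minutes of budget.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Minutes of 'bad' time allowed in the window for a given SLO target."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be between 0 and 1 (exclusive)")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(slo: float, bad_minutes: float, window_days: int = 30) -> float:
    """Fraction of the error budget left; a negative value means it is spent."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - bad_minutes) / budget
```

For example, 21.6 bad minutes against a 99.9% SLO leaves about half the budget.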

Compliance & Notes

All PHI/PII processed within EU regions (e.g., eu-central-1); strict key management via AWS KMS and Secrets Manager.

Right-to-erasure and lawful-basis handling embedded in data lifecycle (tombstones, purge workflows, and immutable audit trails).
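The tombstone-and-purge pattern in the note above can be modelled in a few lines. This hypothetical in-memory sketch is not the platform's actual workflow. The tombstone is recorded first, so replays and late-arriving CDC events for the erased subject can be filtered with `is_erased`. A purge step then deletes matching rows. Each step is appended to a SHA-256 hash-chained log, which is one way to make an audit trail tamper-evident. The audit entries reference a request ID rather than the personal identifier. A production system would likely also key the tombstone on a pseudonymised ID.

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # "previous hash" for the first audit entry


class ErasureLedger:
    """Hypothetical model of tombstones, purge, and a hash-chained audit log."""

    def __init__(self) -> None:
        self.tombstones: dict[str, str] = {}   # subject_id -> request_id
        self.audit: list[dict[str, Any]] = []  # append-only, hash-chained

    @staticmethod
    def _digest(prev: str, event: dict[str, Any]) -> str:
        body = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev + body).encode("utf-8")).hexdigest()

    def _append_audit(self, event: dict[str, Any]) -> None:
        prev = self.audit[-1]["hash"] if self.audit else GENESIS
        self.audit.append({"event": event, "prev": prev, "hash": self._digest(prev, event)})

    def request_erasure(self, subject_id: str, request_id: str) -> None:
        # Tombstone first, so replays and late CDC events can be dropped.
        self.tombstones[subject_id] = request_id
        # The audit entry references the request, not the personal identifier.
        self._append_audit({"type": "tombstone", "request_id": request_id})

    def is_erased(self, subject_id: str) -> bool:
        return subject_id in self.tombstones

    def purge(self, tables: dict[str, list[dict[str, Any]]]) -> int:
        """Delete rows of tombstoned subjects from every table; return rows removed."""
        removed = 0
        for rows in tables.values():
            kept = [r for r in rows if not self.is_erased(r["subject_id"])]
            removed += len(rows) - len(kept)
            rows[:] = kept  # modify each table in place
        self._append_audit({"type": "purge", "rows_removed": removed})
        return removed

    def verify_audit(self) -> bool:
        """Recompute the hash chain; False if any entry was altered."""
        prev = GENESIS
        for entry in self.audit:
            if entry["prev"] != prev or entry["hash"] != self._digest(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True
```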
