IND - Senior Staff Engineer, Reliability - GCC071
We’re determined to make a difference and are proud to be an insurance company that goes well beyond coverages and policies. Working here means having every opportunity to achieve your goals – and to help others accomplish theirs, too. Join our team as we help shape the future.
Data Observability: Implement advanced data observability tools to monitor the entire data journey, from ingestion to consumption, and detect data quality anomalies, schema drift, and pipeline delays in real time.
Pipeline Resiliency & Automation: Collaborate with Data Engineering to embed reliability patterns into data pipelines built with Informatica and Python/PySpark and running on platforms such as Amazon EMR/Hadoop and cloud-native services.
Toil Elimination in Data Operations: Automate data validation, reprocessing, backfilling, and other manual operational tasks within the data lifecycle to reduce toil and improve operational efficiency.
Incident and Problem Management (Data Focus): Lead the response to and resolution of data-related incidents (e.g., corrupt data, delayed reporting), ensuring fast recovery and effective post-incident reviews (blameless post-mortems).
Runbook Creation & Automation (Data Focus): Develop and automate sophisticated, data-aware runbooks for common data pipeline failures, data quality issues, and data recovery scenarios.
Required Skills & Experience
Expert-level, hands-on experience with data warehousing and data lake technologies, including Snowflake, and with cloud environments (AWS/GCP).
Hands-on experience in software or cloud engineering. Familiarity with cloud service providers and their core capabilities (compute, containers, databases, APIs, etc.).
In-depth, hands-on experience with data observability concepts and tools for monitoring data in motion and at rest (e.g., Monte Carlo, Bigeye, Astro Observe, Datafold, custom solutions).
Experience with prompt engineering, implementing AWS or Google AI services, and AI-enabled automation for data quality, reliability, and pipeline performance management.