Data Engineer-
Mandatory Skills-
1. Apache Spark: Expertise in distributed data processing, Spark SQL, and
performance tuning.
2. Airflow / Workflow Orchestration: Ability to design and manage complex
ETL pipelines with dependency handling.
3. BigQuery, SQL & PostgreSQL: Strong querying skills and experience with
large-scale data analytics.
4. Python: Proficient in writing ETL scripts, data validation, and
automation.
5. Linux & Git: Comfortable with command-line tools and version control
for data workflows.
- Experience with cloud services like Dataproc, Composer, Pub/Sub.
- Knowledge of Parquet and columnar data formats.
- Familiarity with data quality monitoring and alarm configuration.
- Analytical mindset and attention to detail.
- Ability to document processes and communicate findings clearly.
- Experience in schema evolution and inventory management.
- Debugging skills for complex data pipelines.
- Strong bilingual communication and documentation habits.
Responsibilities-
- Actively participate in requirements gathering and analysis of business
processes to determine technical feasibility.
- Design and maintain ETL workflows using Airflow or similar tools.
- Develop Spark jobs for data transformation and optimization.
- Validate input and output data for quality and consistency.
- Configure and monitor data quality alarms and thresholds.
- Collaborate with QA and other engineers for testing and
troubleshooting.
- Maintain documentation and ensure reproducibility of data processes.