1. Design & Develop Data Pipelines: Build robust, scalable, and efficient data ingestion pipelines from various sources (e.g., databases, APIs, streaming data, files) into our
data lake/warehouse, with a focus on ingesting data into BigQuery.
2. Ensure Data Quality: Implement and maintain data quality checks, validation rules, and monitoring mechanisms to ensure accuracy, completeness, and consistency of data,
particularly within BigQuery tables. Identify and resolve data anomalies and inconsistencies proactively.
3. Data Modeling: Collaborate with data analysts and data scientists to understand data requirements and design optimal data models for analytics and reporting within
BigQuery (e.g., partitioned and clustered tables, views, external tables).
4. Performance Optimization: Optimize existing data pipelines and BigQuery queries for performance and cost-efficiency, leveraging BigQuery features like partitioning,
clustering, and query optimization techniques.
5. Automation: Automate data extraction, transformation, and loading (ETL/ELT) processes, potentially utilizing GCP services like Cloud Functions, Cloud Dataflow, or
Cloud Composer (Apache Airflow) for orchestration.
6. Documentation: Create and maintain comprehensive documentation for data pipelines, data models, and data quality standards.
7. Troubleshooting & Support: Provide support for data-related issues, debug pipeline failures, and ensure timely resolution, including troubleshooting BigQuery job failures
and performance issues.
8. Collaboration: Work closely with cross-functional teams including product, engineering, and business intelligence to understand data needs and deliver effective data solutions.
9. Stay Current: Research and evaluate new data technologies and tools to improve our data infrastructure and processes, especially within the GCP ecosystem.
What You'll Bring:
1. Bachelor's degree in Computer Science, Engineering, Information Technology, or a related quantitative field.
2. 4+ years of professional experience in data engineering or a similar role, with a strong
focus on building and maintaining data pipelines.
3. Proficiency in at least one programming language commonly used in data engineering
(e.g., Python, Java, Scala). Python is highly preferred.
4. Strong experience with analytical engines like Apache Pinot and Elasticsearch.
5. Hands-on experience with ETL/ELT tools and concepts.
6. Proven experience with Google Cloud Platform (GCP) data services, specifically
BigQuery.
7. Competency in loading data into BigQuery using various methods (e.g., batch
loading from Cloud Storage, streaming inserts).
8. Strong SQL querying skills within BigQuery, including understanding of
BigQuery's unique SQL dialect and functions.
9. Familiarity with BigQuery table optimizations like partitioning and clustering.
Experience with BigQuery's data quality features or applying data quality best
practices to BigQuery data.
10. Familiarity with data warehousing concepts and experience with data lake architectures.
11. Understanding of data quality principles and experience implementing data validation
techniques.
12. Experience with version control systems (e.g., Git).
13. Excellent problem-solving skills and attention to detail.
14. Strong communication and interpersonal skills, with the ability to explain complex
technical concepts to non-technical stakeholders.
Bonus Points If You Have:
1. Experience with other GCP data services such as Cloud Dataflow, Cloud Composer
(Apache Airflow), Cloud Storage, Pub/Sub, or Dataproc.
2. Familiarity with BigQuery ML or other machine learning concepts.
3. Experience with data orchestration tools (e.g., Apache Airflow, Prefect, Dagster).
4. Knowledge of data governance and data security best practices within GCP.
Interested candidates can share their resume at [email protected] or 8688322632.
Pay: ₹510,440.24 - ₹1,868,453.81 per year
Work Location: In person