Role : -
Design, implement, and manage the data platform and data pipelines that provide data access and aggregation functionality for users in a cloud environment.
Responsibilities : -
Design, construct, develop, install, test, and maintain architectures of large-scale data processing systems.
Experiment with various data models to optimize search queries and put forward proposals.
Build data pipelines that capture, process, and store organized data in large databases, ready for consumption.
Expertly use query languages such as SQL and general-purpose programming languages such as Python for mining data and storing it in databases.
Perform ETL tasks on Big Data using highly scalable algorithms written in Scala / Java / PySpark so that they can utilize the power of distributed computing.
Have end-to-end responsibility for leading projects focused on extracting, merging, analyzing, and managing large sets of data across multiple, disparate databases.
Be able to transform unstructured raw data into formats suitable for statistical modeling, visualization, and machine learning environments.
Establish methodologies for quickly rolling out new data analysis capabilities for standalone data-driven products and services to support our associates.
Be able to understand various data sources and implement automated data quality checks and auditing through standard tools and custom code.
Demonstrate deep knowledge of, and the ability to operationalize, leading data technologies and best practices.
Be responsible for maintaining project plans, clean code, and well-written documentation.
Be able to work in teams and collaborate with stakeholders to define requirements.
Make decisions independently on analytical problems & methods.
Be able to identify and suggest novel areas of future work for themselves or the team.
Be able to work in a globally distributed team using an Agile/Scrum approach.
Competency Requirements : -
Hands-on data engineering or other data-intensive development experience
Experience processing large amounts of structured and unstructured data
Advanced knowledge of programming languages such as Python/Java
Experience building scalable data models and performing complex relational database queries using SQL (Oracle, MySQL, etc.)
Experience with Big Data tools such as Spark, Hadoop, and Amazon EMR
Experience working on cloud platforms such as AWS, IBM, and Azure, including services such as EMR, RDS, EC2, Vault, ECS, S3, and EFS
Desired but not essential : -
Experience in the IoT time-series data domain
Exposure to automotive data
Educational Qualification : -
B.Tech/M.Tech