Big Data Integration

Talend, ETL

I worked on this Big Data Integration project at a private bank as a Data Engineer. The bank's existing data warehouse was limiting its business, so it decided to move to Cloudera Hadoop. The move required end-to-end data integration and data governance to process data from the source systems into Cloudera, and Talend was used to support the data integration process. Cloudera Hadoop was also intended to become a centralized, bank-wide big data repository to support analytics. The data integration process starts by transferring data from the data sources to the Cloudera cluster (the data lake). The transfer is done by a Sqoop process that is managed by Talend and uses a direct connection. Once the data is stored in Cloudera, Talend pulls it (in the form of HDFS files), performs the data integration, and sends the results back to Cloudera.
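To make the Sqoop transfer step more concrete, here is a minimal Python sketch that only assembles a `sqoop import` command line, roughly the kind of call a Talend-managed Sqoop job ends up issuing. It is an illustration, not the project's actual job: the JDBC URL, user, password file, table name, and target directory are all hypothetical, and reading "direct connection" as Sqoop's `--direct` mode is an assumption (it could equally mean a plain JDBC connection to the source database).

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table,
                       target_dir, direct=False, num_mappers=4):
    """Assemble (but do not run) a `sqoop import` command as an argument list."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source system
        "--username", username,
        "--password-file", password_file,  # file holding the password, not the password itself
        "--table", table,                  # source table to copy
        "--target-dir", target_dir,        # HDFS directory that receives the files
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]
    if direct:
        # Assumption: "direct connection" means Sqoop's direct mode,
        # which is only available for some databases.
        cmd.append("--direct")
    return cmd


if __name__ == "__main__":
    # All values below are hypothetical placeholders.
    example = build_sqoop_import(
        jdbc_url="jdbc:mysql://db.example.com/core",
        username="etl_user",
        password_file="/user/etl/.sqoop_pw",
        table="ACCOUNTS",
        target_dir="/data/raw/accounts",
        direct=True,
    )
    print(shlex.join(example))
```

Building the command as a list rather than a single string keeps each argument separate, so it can be handed to a process launcher without shell quoting issues.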


The data integration process in this project consists of four main processes (labeled with numbers):
  1. Trigger process
  2. Sqoop process
  3. Format Conversion process
  4. Data integration process
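
The ordering of these four processes can be sketched as a small sequential pipeline. This is only an illustration of the sequence, assuming each process runs after the previous one completes. In the project the processes are Talend jobs, so every function and name below is a hypothetical placeholder that simply records that the step ran.

```python
def make_placeholder(name):
    """Return a stand-in step that only records its own name in the context."""
    def step(ctx):
        new_ctx = dict(ctx)  # copy so the caller's dict is not mutated
        new_ctx["history"] = ctx.get("history", []) + [name]
        return new_ctx
    return step


# The four processes, in the order listed above (bodies are placeholders).
STEPS = [
    ("trigger", make_placeholder("trigger")),
    ("sqoop", make_placeholder("sqoop")),
    ("format_conversion", make_placeholder("format_conversion")),
    ("data_integration", make_placeholder("data_integration")),
]


def run_pipeline(steps, context):
    """Run each step in order, passing the context from one step to the next."""
    completed = []
    for name, step in steps:
        context = step(context)
        completed.append(name)
    return context, completed


if __name__ == "__main__":
    final_ctx, done = run_pipeline(STEPS, {"table": "ACCOUNTS"})  # hypothetical table
    print(done)
```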