Welcome!

Wednesday, September 8, 2010

How to Implement ETL Process using Talend + SpagoBI? (1)

Talend Open Studio is an open source data integration design tool. It can be used to solve these challenges in complicate enterprise environment: database synchronization or replication; data exchange; ETL (Extract Transform Load) for analytics; Data migration; Complex data transformation and loading; Data quality.

SpagoBI has an ETL engine based on Talend, called SpagoBITalendEngine. The SpagoBI environment provides an server context to schedule or execute ETL process application (which is exported from Talend open studio).

In this short tutorial, I demonstrate using Talend open studio (TOS) to design a simple ETL job to extract data from two Oracle tables, perform data transformation, and load these data into one MySQL table. After the ETL job is tested on TOS, the final job script is exported. The exported job script is a standalone application (in Java or Perl), which contains all the transformation logic and metadata information, and can be executed out side of TOS.

In the following demo, the ETL job is exported/deployed into SpagoBI server, using the deploy tool provided by TOS. On SpagoBI, an ETL document is created and tested, then, it can be released to SpagoBI scheduler.


Environment
============
Java: JDK 1.6.0_20
SpagoBI: version 2.6
Database 1: Oracle 11g R2 + OE (Order Entry) schema
Database 2: MySQL 5.1.48
App Server: Tomcat 6.0.28
Talend: Open Studio 4.0.2


1. Design ETL Process using Talend Open Studio (TOS)

1) Download and install TOS from http://www.talend.com/; for how to use TOS , please see TOS user manual for detail.

2) Define the data source (Oracle) connection and data target (MySQL) connection in the repository of TOS, as shown below:


and "Retrieval Schema" to find out the two source tales ("ORDERS", "CUSTOMERS") and one target table ("ORDR_CUST");

3) Design the ETL flow


and config the tMap component to embed the transformation logic


4) Create a context variable "v_lastname" to restrict the data retrieved from table "CUSTOMERS"




5) Make sure that both the source and target databases are running. Run the ETL job from within the TOS. Check the target database (MySQL) table "ORDR_CUST" to verify that it has been populated or refreshed.

So far so good, the ETL job is designed and tested good on TOS.
From the next step, we will deploy the ETL job to SPagoBI server.


2. Deploy to SpagoBI Server

1) Make sure that SpagoBI server is running.

2) Config the TOS deploy tool
On TOS, select "Windows" --> "Preferences"


3) Run the TOS deploy tool



4) ETL job is successfully deployed to SpagoBI server.


(to be continued)

1 comment:

  1. Why you did not follow the exercise. this very interesting. it is possible that you finish it

    ReplyDelete