Welcome!

Wednesday, September 8, 2010

How to Implement ETL Process using Talend + SpagoBI? (2)


3. Config ETL Job from SpagoBI


1) In SpagoBI web console, click "User menu" --> "Documents execution":

We found that the newly deployed ETL job is already there:

This is because the following settings in the file "engine-config.xml" at the folder "tomcat_home/webapps/SpagoBITalendEngine/WEB-INF/classes/":

<spagobi_autopublish>true</spagobi_autopublish>
<spagobi_functionality_label>ETL</spagobi_functionality_label>


2) Create a "Analytical Driver" for the context variable (parameter) required by the ETL job


And edit the ETL job detail to associate the analytical driver with the context variable "v_lastname" (need to be exactly the same with that defined in ETL job):

3) Save the changes.


4. Execute the ETL Job within SpagoBI

Now, after the settings are done, we can execute the ETL job from within SpagoBI server

Fill the value for "v_lastname" and execute the job:

ETL process started:

For more detail about the executed ETL job, we can check it's event ("User menus" --> "Events"):

Or, by checking the log file at: tomcat_home/logs/SpagoBITalendEngine.log


5. Schedule the ETL Job using SpagoBI Scheduler

1) Click "User menus" --> "Documents execution" and modify ETL document detail

2) Change the document state to "Released"
Save the change.

3) Click menu "Tools" --> "Scheduler", and click the button "New Activity" to add a new activity:

Also set the way of how to assign values to parameter, then save the activity.

4) Schedule the activity

And create a new schedule for this ETL activity, save the settings, and exit.
So far, the ETL job is scheduled to run.

5) Check the execution of scheduled job.
Click "User menu" --> "Events" to see events lists, also can check the details of start or end execution of scheduled job.



(complete)

How to Implement ETL Process using Talend + SpagoBI? (1)

Talend Open Studio is an open source data integration design tool. It can be used to solve these challenges in complicate enterprise environment: database synchronization or replication; data exchange; ETL (Extract Transform Load) for analytics; Data migration; Complex data transformation and loading; Data quality.

SpagoBI has an ETL engine based on Talend, called SpagoBITalendEngine. The SpagoBI environment provides an server context to schedule or execute ETL process application (which is exported from Talend open studio).

In this short tutorial, I demonstrate using Talend open studio (TOS) to design a simple ETL job to extract data from two Oracle tables, perform data transformation, and load these data into one MySQL table. After the ETL job is tested on TOS, the final job script is exported. The exported job script is a standalone application (in Java or Perl), which contains all the transformation logic and metadata information, and can be executed out side of TOS.

In the following demo, the ETL job is exported/deployed into SpagoBI server, using the deploy tool provided by TOS. On SpagoBI, an ETL document is created and tested, then, it can be released to SpagoBI scheduler.


Environment
============
Java: JDK 1.6.0_20
SpagoBI: version 2.6
Database 1: Oracle 11g R2 + OE (Order Entry) schema
Database 2: MySQL 5.1.48
App Server: Tomcat 6.0.28
Talend: Open Studio 4.0.2


1. Design ETL Process using Talend Open Studio (TOS)

1) Download and install TOS from http://www.talend.com/; for how to use TOS , please see TOS user manual for detail.

2) Define the data source (Oracle) connection and data target (MySQL) connection in the repository of TOS, as shown below:


and "Retrieval Schema" to find out the two source tales ("ORDERS", "CUSTOMERS") and one target table ("ORDR_CUST");

3) Design the ETL flow


and config the tMap component to embed the transformation logic


4) Create a context variable "v_lastname" to restrict the data retrieved from table "CUSTOMERS"




5) Make sure that both the source and target databases are running. Run the ETL job from within the TOS. Check the target database (MySQL) table "ORDR_CUST" to verify that it has been populated or refreshed.

So far so good, the ETL job is designed and tested good on TOS.
From the next step, we will deploy the ETL job to SPagoBI server.


2. Deploy to SpagoBI Server

1) Make sure that SpagoBI server is running.

2) Config the TOS deploy tool
On TOS, select "Windows" --> "Preferences"


3) Run the TOS deploy tool



4) ETL job is successfully deployed to SpagoBI server.


(to be continued)