Monday, March 19, 2012

Interesting article comparing SSIS to Oracle Web Builder

http://www.tdwi.org/News/display.aspx?ID=8100

I meant "Oracle Warehouse Builder".

D'oh!

|||

News Flash:

Oracle just acquired a highly regarded relatively new but popular ETL tool: Sunopsis, for ~1.5 B (as in billion!)

It is an interesting tool which received a high evaluation during our project's ETL tool evaluation effort. The tool is basically a sophisticated GUI that allows users to design and specifiy ETL processing using the GUI, and then spits out SQL to perform the ETL. The SQL runs on the user's RDBMS. Differening RDBM servers are accomodated by use of "templates" for each specific RDBMS product.

|||

Yeah, Sunopsis looks like a good tool. We have been talking to them lately as well. They preach the gospel of ELT rather than ETL which means they're all about using the power of the database and that is a bit closer to DTS than SSIS (that's my take on them anyway). Sunopsis is alot more adanced than DTS though.

-Jamie

|||

I am in the last stages of converting a fairly complex hand-coded SQL (by my standards anyway- it's over 40 discrete steps using all hand coded sql statements) ET process to SSIS. I had converted this ET process already to SSIS, but using staging tables where convenient, and reference tables for the lookups, all hosted in Sql Server. Processing 250,000 rows took ~14 minutes on the dev machine (the source data is local).

However I revamped the SSIS solution with the goal of removing all dependencies on an RDBMS to complete the transformations. To do this, I had to create a .net scripting transform that I now use to perform all lookups instead of the SSIS lookup component, etc. After conversion to use of all raw files for the data staging, and the .net script transform performing the lookups, it only takes ~9 minutes on the dev machine. I did not expect this gain in performance. And now we have also removed all the overhead of maintaining a Sql Server DB for the ET processing, which is a huge plus for our solution.

Having said that, I did need to create additional data validation steps. Using the staging tables it is very easy to setup all kinds of constraints on the columns, use the DB to enforce the rules during the staging inserts, and catch any errors in the OLEDB destination component and redirect them (very slow due to the one-row-at-a-time DB commits as you know, but it is easy to setup). Raw files of course do not provide such "constraint" functionality, so before I insert into them I have to do more explicit error checking and handling.

But on the other hand there also is a big advantage IMHO to not having the validation logic scattered around different tools. Now it is all in one place which I really like, and is much more self-documenting, and is therefore a more "cohesive", and "simpler" overall solution. I think it will be easier to maintain for those reasons.

So personally, although I am well trained in Sql Server design and implementation, I feel it can sometimes be a definite advantage to keep the ET processing out of the DB if possible.

|||

kenambrose wrote:

I am in the last stages of converting a fairly complex hand-coded SQL (by my standards anyway- it's over 40 discrete steps using all hand coded sql statements) ET process to SSIS. I had converted this ET process already to SSIS, but using staging tables where convenient, and reference tables for the lookups, all hosted in Sql Server. Processing 250,000 rows took ~14 minutes on the dev machine (the source data is local).

However I revamped the SSIS solution with the goal of removing all dependencies on an RDBMS to complete the transformations. To do this, I had to create a .net scripting transform that I now use to perform all lookups instead of the SSIS lookup component, etc. After conversion to use of all raw files for the data staging, and the .net script transform performing the lookups, it only takes ~9 minutes on the dev machine. I did not expect this gain in performance. And now we have also removed all the overhead of maintaining a Sql Server DB for the ET processing, which is a huge plus for our solution.

Having said that, I did need to create additional data validation steps. Using the staging tables it is very easy to setup all kinds of constraints on the columns, use the DB to enforce the rules during the staging inserts, and catch any errors in the OLEDB destination component and redirect them (very slow due to the one-row-at-a-time DB commits as you know, but it is easy to setup). Raw files of course do not provide such "constraint" functionality, so before I insert into them I have to do more explicit error checking and handling.

But on the other hand there also is a big advantage IMHO to not having the validation logic scattered around different tools. Now it is all in one place which I really like, and is much more self-documenting, and is therefore a more "cohesive", and "simpler" overall solution. I think it will be easier to maintain for those reasons.

So personally, although I am well trained in Sql Server design and implementation, I feel it can sometimes be a definite advantage to keep the ET processing out of the DB if possible.

Fascinating insights. Thanks Ken.

-Jamie

No comments:

Post a Comment