Tag Archives: Infosphere

A DB2 ETL Tool

This is going to be a really short post, as it is more of a question: a request for enlightenment on how the rest of the DB2 community gets its data, however you receive it, into your database (data warehouse). When you Google for “DB2 ETL Tools”, the first hit is on developerWorks, and from the title (ETL solutions for IBM DB2 Universal Database) you think, great, there’s going to be a list on here. Instead it just points you to the EXPORT, LOAD and IMPORT commands and some very basic script examples of how to get these to work. The next few links are not that good either, and googling around the subject you get to various products, none of which are designed to work with DB2 other than through a Java or, worse, ODBC driver, apart from IBM DataStage, which is very expensive.

After seeing this I think to myself, “How does everyone else do it?”. Having come from a SQL Server background with SSIS and DTS, I was initially shocked at how few tools there are for DB2. Short of writing some verbose logging to a text file and emailing yourself the content, there seems to be little on offer, which leads me to think, “This cannot be the way the wider world does it?”. I have over 400 ETL jobs, of which ~85% need to run every day, so together with another developer I designed a way to load the jobs and record the loading. He has now left, and support for the tool he made is becoming harder to do; even when it is possible, it rarely goes smoothly.
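To make the "load and record the loading" idea concrete, here is a minimal sketch in Python. It is not the tool described above, only an illustration under stated assumptions: the `db2` command line processor is on the PATH and a database connection has already been established, the log table `etl_job_log` and all function names are hypothetical, and SQLite stands in for wherever the run history would really be kept. Treating return codes of 4 and above as failures follows my reading of the CLP return codes, so adjust the threshold if your environment differs.

```python
import sqlite3
import subprocess
from datetime import datetime, timezone


def build_load_command(data_file, table, message_file):
    """Build a db2 CLP call that LOADs a delimited file into a table."""
    statement = (
        f"LOAD FROM {data_file} OF DEL "
        f"MESSAGES {message_file} "
        f"INSERT INTO {table}"
    )
    return ["db2", statement]


def run_command(command):
    """Run a command and return (exit_code, combined stdout and stderr)."""
    result = subprocess.run(command, capture_output=True, text=True)
    return result.returncode, result.stdout + result.stderr


def open_job_log(path=":memory:"):
    """Open the (hypothetical) job log, creating the table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS etl_job_log (
               job_name    TEXT NOT NULL,
               started_at  TEXT NOT NULL,
               finished_at TEXT NOT NULL,
               exit_code   INTEGER NOT NULL,
               output      TEXT
           )"""
    )
    return conn


def run_job(conn, job_name, command, runner=run_command):
    """Run one job and record its outcome, whether it succeeds or not."""
    started = datetime.now(timezone.utc).isoformat()
    try:
        exit_code, output = runner(command)
    except OSError as exc:  # e.g. the db2 executable could not be started
        exit_code, output = -1, str(exc)
    finished = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO etl_job_log VALUES (?, ?, ?, ?, ?)",
        (job_name, started, finished, exit_code, output),
    )
    conn.commit()
    return exit_code


def failed_jobs(conn, failure_threshold=4):
    """List (job_name, exit_code) for runs that count as failures.

    Assumption: exit codes >= 4 are errors; negative codes mean the
    command could not be started at all.
    """
    rows = conn.execute(
        "SELECT job_name, exit_code FROM etl_job_log "
        "WHERE exit_code >= ? OR exit_code < 0 "
        "ORDER BY rowid",
        (failure_threshold,),
    )
    return rows.fetchall()
```

A scheduler would call `run_job` once per job and then mail or display the result of `failed_jobs`, which replaces trawling through text logs with a single query. Passing the `runner` in as a parameter is a deliberate choice so that the logging can be exercised without a DB2 instance.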

So my question to the community is:

Just finished reading: Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data

I have just finished reading this book. I was excited about the IBM offering and the concepts around big data at IDUG, and now that I have read the book I want to find a project I can try this out on. The book can be downloaded from here: Understanding Big Data: Analytics for Enterprise Class Hadoop and Streaming Data.

The book is in two parts, Part 1: Big Data from the business perspective and Part 2: Big Data from the technology perspective. The first part of the book, as its title suggests, does not touch on the technical aspects of big data, only the benefits to businesses and how we are all already part of the Big Data world. The second part of the book explains at a high level all the different parts of a Hadoop cluster, how you get data in and out, and how you process data in there. The second part also explains the IBM offering in this marketplace in the form of IBM InfoSphere BigInsights and Streams.

As a high-level description, the first part introduces the concept of the three V’s of big data: Volume, Velocity and Variety. It shows the uses of these V’s in a number of different scenarios, all of which are very interesting, and I can easily see how they would bring you competitive advantage (probably the point of the case studies). The second part is for the techies, explaining what Hadoop is and all of the different parts that make it up: MapReduce, the common components and the file system. It also explains the other technologies surrounding Big Data, such as Hive, Flume and Jaql.

So this is just a very light overview of the book, which is well worth a read. I read it on my Kindle; sometimes the text size varies from page to page as it gets resized, but overall it was fine.