Category Archives: Db2 Ecosystem

A DB2 ETL Tool

This is going to be a really short post as it is more of a question. A request for enlightenment into what the rest of the DB2 community does in terms of getting their data from however you guys receive into your database (data warehouse). When you Google for “DB2 ETL Tools” then the first hit is on Developer Works which by the title (ETL solutions for IBM DB2 Universal Database) you think great there’s going to be a list on here, but instead it just points you too the EXPORT, LOAD and IMPORT commands and some very basic script examples of how to get these to work. The next few links are not that good either and googling around the subject you get to various products, non of which are designed to work with DB2 other than through a Java or worse ODBC driver, apart from IBM Data Stage which is very expensive.

I think to myself after seeing this then “How does everyone else do it?”. Having come from a SQL Server background with SSIS and DTS then initially I was shocked at how few tools there are for DB2, without using some verbose logging in text file and emailing you the content, this leads me to think “This is not the way the wide world can do it?”. I have over 400 ETL jobs where ~85% of them need to run everyday, therefore with another developer that has now left we deigned a way to load and record the loading of the jobs. He has now left and support for the tool he made is now becoming harder to do and even when it is it rarely goes smoothly.

So my question to the community is:

DB2 RAND()

Not done this for a while with moving house and looks like it will still be difficult with BT messing up the telephone lines in the building, but O2 are providing no end of support.  So thought I would do a short post in my lunch hour at work (actually taking a “break” for once) and do a short post.

One of the MySQL DBA colleagues asked me the other day if I could do a random update on a table, apart from the fact I could not understand at the time why you would want to randomly update a number of rows and not have an idea what you have done till you interrogate the data again, that’s another article for the what if I am allowed to publish it (doesn’t that sound intriguing!). The main IBM page in the DB2 Information Centre is here on the RAND() OLAP function.

So what’s the best way to update say 1000 rows in a 10000 row table randomly in DB2, I am not going to cover the way to do it on MySQL but heres how you could do it:

UPDATE <SCHEMA>.<TABLE> ST
SET <COLUMN> = SOMEVALUE
WHERE ST.<IDENTIFIER COLUM> = (SELECT <IDENTIFIER COLUM>
                              FROM <SCHEMA>.<TABLE>
                              ORDER BY RAND()
                              FETCH FIRST 1000 ROWS ONLY)

This code will randomly update 1000 rows, and will choose a different 1000 each time. You could but RAND() to other uses to create a random number but you will need to remember that it only generates a number between 1 and 0

So that’s it, one of my shorter ones.

IDUG – EMEA – 16th – Day Three

Today was another long day, but was ended by an excellent dinner put on by IBM to thank its customers, with ostrich leg and proper sushi so now we know where all out licensing fee goes!! The talks that I attended did not teach me as much as I had hoped, but I did learn something in each of them though so not a total waste of time.

A DBA’s guide to using TSA – Fredric Engelen

This covered the basics of HADR and then went on to cover how you set up the TSA to take over the HADR, and did not cover the TSM that I hoped it would that I will be implementing soon at Holiday extras. Learn’t:

  1. db2rfpen – Will let force the rollforward of the primary database.

Managing DB2 Performance in an Heterogeneous environment - Jim Wankowski

This covered the differences and similarities between DB2 LUW and DB2 z/OS. Although it was informative I feel the title was not correct for the session and should have been different. I learnt:

  1. When a Sort happens on a VARCHAR column then the column is expanded to its full length – I may ask this question to Scott Hayes when I do his Rocket Science Seminar on Friday

Deep Dive into DB2 LUW offline table and index reorg – Saeid Mohseni

This session was very good, if you are a frequent reader of my blog then you will know that I am trying to get a straight answer to my questions on Reorganisation and Runstats in DB2 and so I got confirmed and learnt:

  1. DB2 reorgs need the current runstats on the table to be correct to give the correct results for the reorganisation identifying stored procedure.
  2. You can parallel run a reorg on a partitioned table index as long as the first, and subsequent runnings do not allow reads.

Data Warehousing – SIG

This was a little disappointing as it did not have an agenda so was unstructured, and I would have liked to have had a little more information on how it was going to be run. It was informative and if any one has heard of “Data Vaulting” then there is a lady from the Netherlands that would really like to know.

Back to the fifties . . . . . 50 fabulous ways for forecasting failures, flaws and finding flubber – Alexander Kopac

This was an excellent talk and there is a lot to try out when I get back home and enough work to keep us going for week probably. The presenter dressed up as a wizard and the bits of SQL he has given in the slides will hopefully make the DB2 team at HX wizards.One main thing to remember is:

  1. KISS – Keep It Simple Stupid

Useful but widely unknown DB2 Functions – Michael Tiefenbacher

Second talk from this guy and if I did not already know, used or have blogged about all the things that he presented this would have been an extreamly useful and I really should have read the Agenda better before getting in there.
And to the final talk of the day:

DB2 LUW Index design, best practice and case studies – Scott Hayes

This was a very good talk and used in conjunction with Alexander’s information I think will build a framework for reviewing indexes and designs at HX. I learnt that:
  1. I need to read up on CLUSTERED indexes
  2. Single column indexes are not good, even though it is the recommended by IBM
  3. You need a good problem statement to come up with a good solution – Can be applied to everything in life.
Tomorrow is the last day of the conference and so it finishes pretty early and so I might get some sight seeing done in the afternoon, but before that I plan on attending:

Thursday, November 17, 2011

08:30 AM – 09:30 AM
Session 15
09:45 AM – 10:45 AM
Session 16
11:15 AM – 12:15 PM
Session 17
12:30 PM – 01:30 PM
Thursday DB2 Panel
So have a good night and see you all in the morning.