First post on the new home of my blog

This is the first post on my old blog hosted in the new location. I have a new project in the pieline, this will hopefully be a new place for more of my DB2 projects and musing, and this blog can be reverted to its orignal purpose of being more of a personal blog. Thats not to mean that I wont be still putting some DB2 stuff on there but less so.

A bit of history of my blog is that I started it as a bit of an outlet for rant and musings to let off some steam, but then as my DB2 DBA career developed then the blog got more and more DB2 stuff has gone on there, and thats why you guys visit. But to keep it consistant and so I can get this one back to its original then I am going to start a dedicated DB2 blog.

So keep your eyes on Twitter and on the blog for announcements in the next few days.

A DB2 ETL Tool

This is going to be a really short post as it is more of a question. A request for enlightenment into what the rest of the DB2 community does in terms of getting their data from however you guys receive into your database (data warehouse). When you Google for “DB2 ETL Tools” then the first hit is on Developer Works which by the title (ETL solutions for IBM DB2 Universal Database) you think great there’s going to be a list on here, but instead it just points you too the EXPORT, LOAD and IMPORT commands and some very basic script examples of how to get these to work. The next few links are not that good either and googling around the subject you get to various products, non of which are designed to work with DB2 other than through a Java or worse ODBC driver, apart from IBM Data Stage which is very expensive.

I think to myself after seeing this then “How does everyone else do it?”. Having come from a SQL Server background with SSIS and DTS then initially I was shocked at how few tools there are for DB2, without using some verbose logging in text file and emailing you the content, this leads me to think “This is not the way the wide world can do it?”. I have over 400 ETL jobs where ~85% of them need to run everyday, therefore with another developer that has now left we deigned a way to load and record the loading of the jobs. He has now left and support for the tool he made is now becoming harder to do and even when it is it rarely goes smoothly.

So my question to the community is:

db2advis command an idiots guide

This article was prompted by the fact that the new Express-C version of DB2 uninstalls the much maligned Control Center and installs the IBM Data Studio that so far, after much reading of the reading materials, I struggle to get to run on a VM running 4 CPU’s of an 2GHz i7 Processor and 5Gb of RAM. So after realising that when you try and run the tuning part of of the application unless you have the appropriate licences then it only ever recommends runstats. Unless someone out there know why?

Therefore I needed a different way of running the design advisor to see what DB2 thinks it can do to optimise the query, which left me with my not so favorite environment the command line! So all you need to know about design advisor command (db2advis) with all the different options can be found here at the IBM Infocenter, what I am going to do is take you through what I run most often.

The command is:

db2advis

At the minimum you need to run

db2advis -d <Database name> -s "<SQL Statement>"

This will then output to the terminal window like this:

Using user id as default schema name. Use -n option to specify schema
execution started at timestamp 2012-06-19-17.31.31.666368
Recommending indexes...
 0 indexes in current solution
 [974.0000] timerons (without recommendations)
 [974.0000] timerons (with current solution)
 [0.00%] improvement

--
--
-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- no indexes are recommended for this workload.

--
--
-- RECOMMENDED EXISTING INDEXES
-- ============================

--
--
-- UNUSED EXISTING INDEXES
-- ============================
-- ===========================
--
-- ====ADVISOR DETAILED XML OUTPUT=============
-- ==(Benefits do not include clustering recommendations)==
--
--<?xml version="1.0"?>
--<design-advisor>
--<statement>
--<statementnum>1</statementnum>
--<statementtext>
-- SELECT * FROM <SCHEMA>.<TABLE NAME>
--</statementtext>
--<objects>
--<identifier>
--<name><TABLE NAME></name>
--<schema><SCHEMA> </schema>
--</identifier>
--</objects>
--<benefit>0.000000</benefit>
--<frequency>1</frequency>
--</statement>
--</design-advisor>
-- ====ADVISOR DETAILED XML OUTPUT=============
--
0 solutions were evaluated by the advisor
DB2 Workload Performance Advisor tool is finished.

On a query that did need optimising then handily the tool will give you an expected improvement and a list of actions to carry out to hopefully achieve this in terms of extra indexes and runstats.

GOTCHA Alerts:

1. The section under “Unused Existing Indexes” normally contains a whole list of drop indexes as it did not use them in the query, do not blindly do this! Please consider what it is suggesting and the obvious ramifications before carrying out these actions.

2. The indexes it suggests for creation under “List of Recommend Indexes” does not take into account for existing indexes that may have say three out of the four columns needed, so rather than create another index increasing your storage footprint, increasing insert and update times, etc then please consider extending existing indexes to the fourth column, rather than creating another index.

3. The names of the indexes suggested for creation can be changed, and I do as they come out with a name that is unintelligible to man nor beast at a later date on what columns might be in there.

4. The Schema of the index recommended to be created will be in whatever scema that is defined what running the command. The assumed default is the user that you are running as, which might not be where you want it.

So now you have run it for the first time then you want to get a little more complicated and you have a really long multi lined statement that you want to tune, well you can do that easily by passing the parameters for the command. You always have to pass the database (-d <DATABASE NAME>) parameter, you can then follow it with the -i parameter which tells it that it is taking an input file this can be called anything. This file that you are passing as an input needs to have a ; at the end of the statement. The other thing you can do is output what is normally returned to the terminal to a file using the normal Linux channels. You can limit the time DB2 considers what it is going to recommend by the -t parameter that, this is in minutes. I usually run something like this:

db2advis -d <DATABASE NAME> -i <INPUT FILENAME> -t <TIME (Mins)> > <OUTPUT FILENAME>

GOTCHAS:

1. Make sure if you want to see the improvements any changes you have carried out have made you need to make sure you take a copy of the output file, or call the file something else.

2. The advice that is given out is only for indexes, if you want other advice like MQT’s or MDC’s then you have to specify a further option of -m

As stated above if you want other advice then you need the -m option then a series of letters. Indexes is assumed but this is I, M recommends new MQT’s that could be used in optimising the query. C will look to see if a MDC or clustering index, finally P will recommend a different field to partition your table over. These can be used like -m IMCP, one or none can be used so your command would look a little like

db2advis -d <DATABASE NAME> -i <INPUT FILENAME> -t <TIME (Mins)> -m <Options M,C,P,I> > <OUTPUT FILENAME>

Some of you may be limited for disk space so the -l <MB Limit Size> option can be quite useful to keep the size down, especially if you have specified the -m M (consider advising MQT’s). Therefore an extended, and most of the options that I usually use would be:

db2advis -d <DATABASE NAME> -i <INPUT FILENAME> -t <TIME (Mins)> -m <Options M,C,P,I> -l <Size in Mb> > <OUTPUT FILENAME>

Or something like I would run

db2advis -d SAMPLE -i TROUBLE_QUERY_IN -t 5 -m ICM -l 1000 > TROUBLE_QUERY_OUT_1

I hope this has been informative and there are loads more options that can be used, full list here.