Monthly Archives: June 2012

You are browsing the site archives by month.

A DB2 ETL Tool

This is going to be a really short post as it is more of a question. A request for enlightenment into what the rest of the DB2 community does in terms of getting their data from however you guys receive into your database (data warehouse). When you Google for “DB2 ETL Tools” then the first hit is on Developer Works which by the title (ETL solutions for IBM DB2 Universal Database) you think great there’s going to be a list on here, but instead it just points you too the EXPORT, LOAD and IMPORT commands and some very basic script examples of how to get these to work. The next few links are not that good either and googling around the subject you get to various products, non of which are designed to work with DB2 other than through a Java or worse ODBC driver, apart from IBM Data Stage which is very expensive.

I think to myself after seeing this then “How does everyone else do it?”. Having come from a SQL Server background with SSIS and DTS then initially I was shocked at how few tools there are for DB2, without using some verbose logging in text file and emailing you the content, this leads me to think “This is not the way the wide world can do it?”. I have over 400 ETL jobs where ~85% of them need to run everyday, therefore with another developer that has now left we deigned a way to load and record the loading of the jobs. He has now left and support for the tool he made is now becoming harder to do and even when it is it rarely goes smoothly.

So my question to the community is:

db2advis command an idiots guide

This article was prompted by the fact that the new Express-C version of DB2 uninstalls the much maligned Control Center and installs the IBM Data Studio that so far, after much reading of the reading materials, I struggle to get to run on a VM running 4 CPU’s of an 2GHz i7 Processor and 5Gb of RAM. So after realising that when you try and run the tuning part of of the application unless you have the appropriate licences then it only ever recommends runstats. Unless someone out there know why?

Therefore I needed a different way of running the design advisor to see what DB2 thinks it can do to optimise the query, which left me with my not so favorite environment the command line! So all you need to know about design advisor command (db2advis) with all the different options can be found here at the IBM Infocenter, what I am going to do is take you through what I run most often.

The command is:

db2advis

At the minimum you need to run

db2advis -d <Database name> -s "<SQL Statement>"

This will then output to the terminal window like this:

Using user id as default schema name. Use -n option to specify schema
execution started at timestamp 2012-06-19-17.31.31.666368
Recommending indexes...
 0 indexes in current solution
 [974.0000] timerons (without recommendations)
 [974.0000] timerons (with current solution)
 [0.00%] improvement

--
--
-- LIST OF RECOMMENDED INDEXES
-- ===========================
-- no indexes are recommended for this workload.

--
--
-- RECOMMENDED EXISTING INDEXES
-- ============================

--
--
-- UNUSED EXISTING INDEXES
-- ============================
-- ===========================
--
-- ====ADVISOR DETAILED XML OUTPUT=============
-- ==(Benefits do not include clustering recommendations)==
--
--<?xml version="1.0"?>
--<design-advisor>
--<statement>
--<statementnum>1</statementnum>
--<statementtext>
-- SELECT * FROM <SCHEMA>.<TABLE NAME>
--</statementtext>
--<objects>
--<identifier>
--<name><TABLE NAME></name>
--<schema><SCHEMA> </schema>
--</identifier>
--</objects>
--<benefit>0.000000</benefit>
--<frequency>1</frequency>
--</statement>
--</design-advisor>
-- ====ADVISOR DETAILED XML OUTPUT=============
--
0 solutions were evaluated by the advisor
DB2 Workload Performance Advisor tool is finished.

On a query that did need optimising then handily the tool will give you an expected improvement and a list of actions to carry out to hopefully achieve this in terms of extra indexes and runstats.

GOTCHA Alerts:

1. The section under “Unused Existing Indexes” normally contains a whole list of drop indexes as it did not use them in the query, do not blindly do this! Please consider what it is suggesting and the obvious ramifications before carrying out these actions.

2. The indexes it suggests for creation under “List of Recommend Indexes” does not take into account for existing indexes that may have say three out of the four columns needed, so rather than create another index increasing your storage footprint, increasing insert and update times, etc then please consider extending existing indexes to the fourth column, rather than creating another index.

3. The names of the indexes suggested for creation can be changed, and I do as they come out with a name that is unintelligible to man nor beast at a later date on what columns might be in there.

4. The Schema of the index recommended to be created will be in whatever scema that is defined what running the command. The assumed default is the user that you are running as, which might not be where you want it.

So now you have run it for the first time then you want to get a little more complicated and you have a really long multi lined statement that you want to tune, well you can do that easily by passing the parameters for the command. You always have to pass the database (-d <DATABASE NAME>) parameter, you can then follow it with the -i parameter which tells it that it is taking an input file this can be called anything. This file that you are passing as an input needs to have a ; at the end of the statement. The other thing you can do is output what is normally returned to the terminal to a file using the normal Linux channels. You can limit the time DB2 considers what it is going to recommend by the -t parameter that, this is in minutes. I usually run something like this:

db2advis -d <DATABASE NAME> -i <INPUT FILENAME> -t <TIME (Mins)> > <OUTPUT FILENAME>

GOTCHAS:

1. Make sure if you want to see the improvements any changes you have carried out have made you need to make sure you take a copy of the output file, or call the file something else.

2. The advice that is given out is only for indexes, if you want other advice like MQT’s or MDC’s then you have to specify a further option of -m

As stated above if you want other advice then you need the -m option then a series of letters. Indexes is assumed but this is I, M recommends new MQT’s that could be used in optimising the query. C will look to see if a MDC or clustering index, finally P will recommend a different field to partition your table over. These can be used like -m IMCP, one or none can be used so your command would look a little like

db2advis -d <DATABASE NAME> -i <INPUT FILENAME> -t <TIME (Mins)> -m <Options M,C,P,I> > <OUTPUT FILENAME>

Some of you may be limited for disk space so the -l <MB Limit Size> option can be quite useful to keep the size down, especially if you have specified the -m M (consider advising MQT’s). Therefore an extended, and most of the options that I usually use would be:

db2advis -d <DATABASE NAME> -i <INPUT FILENAME> -t <TIME (Mins)> -m <Options M,C,P,I> -l <Size in Mb> > <OUTPUT FILENAME>

Or something like I would run

db2advis -d SAMPLE -i TROUBLE_QUERY_IN -t 5 -m ICM -l 1000 > TROUBLE_QUERY_OUT_1

I hope this has been informative and there are loads more options that can be used, full list here.

Can’t Believe DB2 does not have this function – Convert seconds to hours, minutes and seconds

For the third in the series and possibly the last I am going to look at the little doosie that was presented to me the other day where the seconds of time passed from a system are presented as an integer and needed converting to a time, when I say a time I mean a time that will add up in MS Excel where the hour part could be over 24, so not strictly a valid DB2 time.

We have a new VOIP phone system here at work and apart from all the times (call start, call end, logged on, logged off) being in the number of seconds from 01/01/1970 (Unix time) so DB2 can handle this very well adding the timestamp of 01/01/1970 with the seconds and you get the correct timestamp of the call. The other interesting thing it does is it records all call times in seconds and this needs converting to hh:mm:ss for reporting purposes. To do this I was reminded by a colleague of the MOD function that DB2 has. MOD returns the remainder as an integer, opposed to the number of times the value will dived.

So you can do something like this to get the hours, minuits and seconds as a VALUES statement where <INTEGER> is the total seconds:

VALUES TRIM(VARCHAR((<INTEGER> / 3600))) || 'h' || TRIM(VARCHAR((MOD(<INTEGER>,3600)) / 60)) || 'm' || TRIM(VARCHAR(MOD(MOD(<INTEGER>,3600 / 60), 60))) || 's'

If you substituted <INTEGER> for 3681 it returns the result:

 1       
 ------- 
 1h1m21s

Which as far as I can work out is correct. There are 3600 seconds in an hour, so to work out hours then you divide seconds by 3600, to get minuits you use MOD to get the remainder of the passed in value and then divide by sixty (the result of MOD will still be in seconds) the result of the division because it is an integer will be returned as an integer so there are some second s remaining. To work out the seconds remaining then you need to MOD the value from the working out the number of minuits to get the seconds.

So then the next step is too produce a function that returns a time:

CREATE FUNCTION GLOBAL.GET_TIME (IN IN_SECONDS INTEGER)
DETERMINISTIC
NO EXTERNAL ACTION
RETURNS VARCHAR(10)
LANGUAGE SQL
BEGIN ATOMIC
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Phil C - 14/06/2012
    --Takes in an integer and returns a varchar representation of a time for 
    -- use in excel as opposed to anything beging a valid DB2 time type
    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Create some vars
    DECLARE OUT_TIME VARCHAR(10);
    DECLARE HOUR_PART INTEGER;
    DECLARE MIN_PART INTEGER;
    DECLARE SEC_PART INTEGER;
    DECLARE HOUR_PART_V VARCHAR(5);
    DECLARE MIN_PART_V VARCHAR(3);
    DECLARE SEC_PART_V VARCHAR(2);

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Set some vars
    SET HOUR_PART = (IN_SECONDS / 3600);
    SET MIN_PART = (MOD(IN_SECONDS,3600) / 60);
    SET SEC_PART = (MOD(MOD(IN_SECONDS,3600 / 60), 60));

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Fattern the values out
    IF (HOUR_PART < 10) THEN
        SET HOUR_PART_V = '0' || TRIM(VARCHAR(HOUR_PART)) || ':';
    ELSE
        SET HOUR_PART_V = TRIM(VARCHAR(HOUR_PART)) || ':';
    END IF;

    IF (MIN_PART < 10) THEN
        SET MIN_PART_V = '0' || TRIM(VARCHAR(MIN_PART)) || ':';
    ELSE
        SET MIN_PART_V = TRIM(VARCHAR(MIN_PART)) || ':';
    END IF;

    IF (SEC_PART < 10) THEN
        SET SEC_PART_V = '0' || TRIM(VARCHAR(SEC_PART));
    ELSE
        SET SEC_PART_V = TRIM(VARCHAR(SEC_PART));
    END IF;

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Create the end value
    SET OUT_TIME = (HOUR_PART_V || MIN_PART_V || SEC_PART_V);

    ----------------------------------------------------------------------------
    ----------------------------------------------------------------------------
    --Return the value
    RETURN OUT_TIME;

END

So this will return a time like result up to 9999:59:59 and you will be able to export a result set to Excel or the like and then it can be added up. DB2 will not accept this as a time! So when you run this:

VALUES GLOBAL.GET_TIME(3681)

Returns

 1        
 -------- 
 01:01:21

Which is the same result as at the start so I think the function is a good one. Till next time happy UDF’ing