Monthly Archives: January 2011

You are browsing the site archives by month.

Automated DB2 Reorganisation

This code has now been replaced with Version 2

At my workplace there is no time for the automated reorganisation through the automated maintenance processes, and there is a need for a little bit of intelligence when it comes to actually carrying out the reorgs. DB2 9.7 comes with an inbuilt command that will allow you to assess the tables and if they actually need reorganisation, more info can be found here but a brief overview of the command is below:

CALL SYSPROC.REORGCHK_TB_STATS(<S or T>, <Schema Name or Table Name>)

So calling this will assess a table or tables in a schema and used in combination with the sessions table can be combined to powerfully assess the tables that need reorganisation. The table that will be available in your session is below:

SELECT *
FROM SESSION.TB_STATS

The tables that will need reorganisation will be those where the column REORG has a * in it. The REORG column is a CHAR(3) and if it contains a * then the table has failed one of the assessments the stored procedure carries out. So using the following will identify them:

SELECT *
FROM SESSION.TB_STATS 
WHERE REORG LIKE '%*%'

The other thing that this command and table allows you to see is the partitions on the partition tables separately to the overall table this allows you to do clever things with running online and offline reorganisation. Equally by identifying the partitions that need reorging means that although you can only do these offline, it does mean there is a lot less to do and so should work quicker.

Below is the code that I have come up with and tested on my works database and is currently in the production system performing maintenance every Sunday. First up is a note to compile this code you are going to have to fool DB2 into thinking that the table SESSION.TB_STATS is actually available, to do this you will have to run the SP SYSPROC.REORGCHK_TB_STATS with viable values in. I have written the following code to be a wrapper and simple to use, but the elements into can be reused without the wrapper.

To accomplish the automated reorganisation I have designed a process that involves a table, a view and a number of stored procedures. It could be done with less but being cautious I like to record everything that goes on for auditing purposes. I am going to post the code in reverse order, first of all the wrapper SP that is called :

DROP PROCEDURE DB_MAIN.AUTOMATED_REORG(VARCHAR(1), VARCHAR(255))
GO
CREATE PROCEDURE DB_MAIN.AUTOMATED_REORG(IN IN_RUN_FLAG VARCHAR(1), IN IN_SCHEMA_TABLE VARCHAR(255))
DYNAMIC RESULT SETS 1
LANGUAGE 
SQLBEGIN    
----------------------------------------------------------------------------------------------    
--Description: For the supplied table or schema it will chose the appropriate action and assess    
--             if the table / schema need any reorging or not.    
--Parameters: IN_RUN_FLAG: T if you want a table assessing, S if you want a schema    
--            IN_SCHEMA_TABLE: A schema and table (e.g. CHIPS.TRANSACTIONS) or just a schema (e.g. CHIPS)    
--Invocation Examples: To run a schema: CALL DB_MAIN.REORG_TABLE_SCHEMA('S', 'CHIPS')    
--                     To run a table: CALL DB_MAIN.REORG_TABLE_SCHEMA('T', 'CHIPS.TRANSACTIONS')    
--Created by Philip Carrington for more info mail philip dot carrington at gmail dot com    
----------------------------------------------------------------------------------------------
 IF (IN_RUN_FLAG='S') THEN        
--Assess a Schema        
    CALL DB_MAIN.AUTOMATED_REORG_SCHEMA(IN_SCHEMA_TABLE);    
ELSE        
--Assess a Table        
    CALL DB_MAIN.AUTOMATED_REORG_TABLE(IN_SCHEMA_TABLE);    
END IF;
END

As you can see from the code the stored procedure then calls two further stored procedures, these do the identification of the tables or the partitions that need reorging, below is one of the Stored Procedures the other can be found in the downloadable file:

DROP PROCEDURE DB_MAIN.AUTOMATED_REORG_SCHEMA(VARCHAR(255))
GO
CREATE PROCEDURE DB_MAIN.AUTOMATED_REORG_SCHEMA(IN IN_SCHEMA_TABLE VARCHAR(255))
DYNAMIC RESULT SETS 1
LANGUAGE SQL
BEGIN
DECLARE MIN_REORG_ORDER INTEGER;
DECLARE MAX_REORG_ORDER INTEGER;
DECLARE REORG_TAB_NAME VARCHAR(200);
DECLARE REORG_TAB_SCHEMA VARCHAR(200);
DECLARE REORG_PART_NAME VARCHAR(200);
DECLARE PART_CHECK INTEGER;

--Check the schema or table that need checking
CALL SYSPROC.REORGCHK_TB_STATS('S',IN_SCHEMA_TABLE);
--Insert the data into the table.
--Dont bother looking at the volitile tables e.g. STAGE, TEMP, PUMP tables
INSERT INTO DB_MAIN.AUTOMATED_REORG_RECORD(
    TABLE_SCHEMA,
    TABLE_NAME,
    TABLE_PARTITION,
    PARTITION_REASON,
    IDENTIFIED_TIMESTAMP
)
SELECT TABLE_SCHEMA,
    TABLE_NAME,
    DATAPARTITIONNAME,
    'S',
    CURRENT_TIMESTAMP
FROM SESSION.TB_STATS
WHERE REORG LIKE '%*%'
   AND TABLE_NAME NOT LIKE '%_STAGE%'
   AND TABLE_NAME NOT LIKE '%_PUMP%'
   AND TABLE_NAME NOT LIKE '%_TEMP%';

--Start processing the data 
--Get the minimum and maximum to loop through. 
SET MIN_REORG_ORDER = (SELECT MIN(REORG_ORDER) 
                       FROM DB_MAIN.AUTOMATED_REORG_VIEW 
                       WHERE TABLE_SCHEMA = IN_SCHEMA_TABLE
                            AND IDENTIFIED_TIMESTAMP BETWEEN CURRENT_TIMESTAMP - 12 HOURS AND CURRENT_TIMESTAMP);
SET MAX_REORG_ORDER = (SELECT MAX(REORG_ORDER)
                       FROM DB_MAIN.AUTOMATED_REORG_VIEW
                       WHERE TABLE_SCHEMA = IN_SCHEMA_TABLE
                            AND IDENTIFIED_TIMESTAMP BETWEEN CURRENT_TIMESTAMP - 12 HOURS AND CURRENT_TIMESTAMP);

WHILE (MIN_REORG_ORDER <= MAX_REORG_ORDER) DO
--Get the values to populate vars to run the reoorgs        
SET REORG_TAB_SCHEMA = (SELECT TABLE_SCHEMA
                        FROM DB_MAIN.AUTOMATED_REORG_VIEW
                        WHERE TABLE_SCHEMA = IN_SCHEMA_TABLE
                             AND IDENTIFIED_TIMESTAMP BETWEEN CURRENT_TIMESTAMP - 12 HOURS AND CURRENT_TIMESTAMP
                                    AND REORG_ORDER = MIN_REORG_ORDER);
SET REORG_TAB_NAME = (SELECT TABLE_NAME
                       FROM DB_MAIN.AUTOMATED_REORG_VIEW
                       WHERE TABLE_SCHEMA = IN_SCHEMA_TABLE
                             AND IDENTIFIED_TIMESTAMP BETWEEN CURRENT_TIMESTAMP - 12 HOURS AND CURRENT_TIMESTAMP
                                    AND REORG_ORDER = MIN_REORG_ORDER);
SET REORG_PART_NAME = (SELECT TABLE_PARTITION
                        FROM DB_MAIN.AUTOMATED_REORG_VIEW
                        WHERE TABLE_SCHEMA = IN_SCHEMA_TABLE
                             AND IDENTIFIED_TIMESTAMP BETWEEN CURRENT_TIMESTAMP - 12 HOURS AND CURRENT_TIMESTAMP
                                    AND REORG_ORDER = MIN_REORG_ORDER);
--Make sure that the table identified isnt a partitioned         
IF (LTRIM(RTRIM(REORG_PART_NAME)) = '') THEN            
--Check the table is not patitioned
            SET PART_CHECK = (SELECT COALESCE(SUM(CASE WHEN (DATAPARTITIONNAME IS NOT NULL) THEN 1 ELSE 0 END), 0)                                  FROM SYSIBM.SYSDATAPARTITIONS                                  WHERE DATAPARTITIONID <> 0                                    AND TABSCHEMA = REORG_TAB_SCHEMA                                    AND TABNAME = REORG_TAB_NAME);
            IF (PART_CHECK = 0) THEN
                     --Table is not partitioned - online reorg
                                            CALL DB_MAIN.AUTOMATED_REORG_TABLE_ONLINE(REORG_TAB_SCHEMA, REORG_TAB_NAME, 'R');
             END IF;
ELSE
            --Table is partitioned
                CALL DB_MAIN.AUTOMATED_REORG_TABLE_PARTITION_OFFLINE(REORG_TAB_SCHEMA, REORG_TAB_NAME,REORG_PART_NAME, 'R');        
END IF;
--Dont forget to increment            
            SET MIN_REORG_ORDER = MIN_REORG_ORDER + 1;
            SET REORG_TAB_SCHEMA = '';
            SET REORG_TAB_NAME = ''; 
           SET REORG_PART_NAME = '';
  END WHILE;
END
GO

You need to make sure that the table that you are trying to reorg is not partitioned hence the section counting the partitions, this is because partitioned tables will report one row as the table and schema and give no hint that is is partitioned. As I like to have stored procedures that are repeatable then the following will do the actual reorganisation. Theses are then reusable if you don’t supply the “record” flag. This is only the offline reorganisation for the partitioned tables but the code for the other stored procedure can be found in the file:

DROP PROCEDURE DB_MAIN.AUTOMATED_REORG_TABLE_PARTITION_OFFLINE(VARCHAR(255), VARCHAR(255), VARCHAR(255), CHAR(1))
GO
CREATE PROCEDURE DB_MAIN.AUTOMATED_REORG_TABLE_PARTITION_OFFLINE(IN IN_TABLE_SCHEMA VARCHAR(255), IN IN_TABLE_NAME VARCHAR(255), IN IN_PARTITION_NAME VARCHAR(255), IN IN_RECORD CHAR(1))
LANGUAGE SQL
BEGIN   
DECLARE REORG_STRING VARCHAR(1000);

SET REORG_STRING = 'REORG TABLE ' || LTRIM(RTRIM(IN_TABLE_SCHEMA)) || '.' || IN_TABLE_NAME || ' ALLOW NO ACCESS ON DATA PARTITION ' || IN_PARTITION_NAME;

IF (IN_RECORD = 'R') THEN         
   UPDATE DB_MAIN.AUTOMATED_REORG_RECORD        
   SET REORG_TYPE = 'OFFLINE',            
       REORG_COMMAND = REORG_STRING,            
       REORG_TIMESTAMP = CURRENT_TIMESTAMP        
   WHERE TABLE_SCHEMA = IN_TABLE_SCHEMA            
     AND TABLE_NAME = IN_TABLE_NAME            
     AND TABLE_PARTITION = IN_PARTITION_NAME            
     AND IDENTIFIED_TIMESTAMP BETWEEN CURRENT_TIMESTAMP - 12 HOURS AND CURRENT_TIMESTAMP;                
COMMIT;   
END IF;

CALL SYSPROC.ADMIN_CMD(REORG_STRING);
END
GO

One of the most important parts of this whole operation is the view and table where the auditing is stored and tables to process are identified. The reason that I use the identified time in the last 12 hours is because there should not really be any need to (hopefully) run this more than once a week, maybe your shop is different but at mine it will currently do.

DROP TABLE DB_MAIN.AUTOMATED_REORG_RECORD
GO
CREATE TABLE DB_MAIN.AUTOMATED_REORG_RECORD(    
TABLE_SCHEMA            VARCHAR(255) NOT NULL,
    TABLE_NAME              VARCHAR(255) NOT NULL,
    TABLE_PARTITION         VARCHAR(255) NOT NULL,
    PARTITION_REASON        CHAR(1) NOT NULL,
     IDENTIFIED_TIMESTAMP    TIMESTAMP NOT NULL,
    REORG_TYPE              VARCHAR(7),
    REORG_COMMAND           VARCHAR(1000),
    REORG_TIMESTAMP         TIMESTAMP,
	CONSTRAINT PK_DMRR_TABLESCHEMANAMEDATETIME PRIMARY KEY(TABLE_SCHEMA,TABLE_NAME,TABLE_PARTITION,IDENTIFIED_TIMESTAMP)
)IN DB_MAIN_TS
COMPRESS YES
GO
DROP VIEW DB_MAIN.AUTOMATED_REORG_VIEW
GO
CREATE VIEW DB_MAIN.AUTOMATED_REORG_VIEW AS (   
SELECT ROW_NUMBER() OVER (order by IDENTIFIED_TIMESTAMP, TABLE_PARTITION ASC) as REORG_ORDER,
           A.TABLE_SCHEMA,
           A.TABLE_NAME,
           A.TABLE_PARTITION,
           A.IDENTIFIED_TIMESTAMP
    FROM DB_MAIN.AUTOMATED_REORG_RECORD A    
)
GO

I hope that between the code snippets here and the code in the file supplied and if you want to get in contact or leave a comment please feel free too. I have had too load the file on here as a .doc as that is the only test file type that it would let me upload

DISCLAIMER: As stated at the top of the blog use this code in your production systems at your own peril. I have tested and know it works on my systems, please test and check it works on yours properly as reorganising tables can potentially dangerous.

FILE WITH CODE IN: DB2_Automated_Reorg_SP_V_T_DCP

The problem with VARGRAPHIC data type

So many of you may have come across this issue when you have to deal with foreign chars in the data that you hold. When I say foreign I mean the chars that are above decimal position 126 in any ASCII table.

The problem with these is that they are not a single byte, they are a double byte, when you want to store these in a DB2 database it can cause some issues; unfortunately I don’t know about the other major providers offerings but hopefully I can offer some insights into to what I have found so far. IBM offer two pieces of advice on coping with this

  1. Make the VARCHAR field twice the length that you expect it too be
  2. Use a VARGRAPHIC field

We were doing a migration, which is a whole different story and decided to change the fields that would be accepting user input values from batch files and other jobs data types from VARCHAR to VARGRAPHIC, as they potentially contain the most varied characters.

Below is some test code and results to show you the interesting differences here:

CREATE TABLE DEV.DATA_TYPE_TEST(
VARGRAPHIC_FIELD VARGRAPHIC(20),
VARCHAR_FIELD VARCHAR(20)
) IN DEV_TS
INSERT INTO DEV.DATA_TYPE_TEST
VALUES ('Mr Bloggs','Mr Bloggs'),('Mr Blôggs','Mr Blôggs')

Possibly not the best example but it will do for this purpose , when you issue the following query to get the sizes of the field, you can see that it will cause some issues down the line with left, right and substr functions

SELECT VARGRAPHIC_FIELD,
LENGTH(VARGRAPHIC_FIELD) LEN_VARCHAR_FIELD,
VARCHAR_FIELD,
LENGTH(VARCHAR_FIELD) LEN_VARCHAR_FIELD
FROM DEV.DATA_TYPE_TEST

RESULTS:

VARGRAPHIC_FIELD     LEN_VARCHAR_FIELD     VARCHAR_FIELD     LEN_VARCHAR_FIELD
-------------------  --------------------  ----------------  --------------------
Mr Bloggs            9                     Mr Bloggs         9
Mr Blôggs            9                     Mr Blôggs         10

So this leads to some issues if you want too only get part of the field:

SELECT VARGRAPHIC_FIELD,
LEFT(VARGRAPHIC_FIELD, 6) LEFT_6_VARGRAPHIC,
VARCHAR_FIELD,
LEFT(VARCHAR_FIELD, 6) LEFT_6_VARCHAR
FROM DEV.DATA_TYPE_TEST

RESULTS:

VARGRAPHIC_FIELD     LEFT_6_VARGRAPHIC     VARCHAR_FIELD     LEFT_6_VARCHAR
-------------------  --------------------  ----------------  -----------------
Mr Bloggs            Mr Blo                Mr Bloggs         Mr Blo
Mr Blôggs            Mr Blô                Mr Blôggs         Mr Bl

As you can see the function works fine on the VARGRAPHIC field, but not on the VARCHAR field. The function seems to be getting the first six characters by position, but the foreign characters in the VARCHAR field are stored as a double byte spread across two positions in the VARCHAR field, the whole char is not returned and so is not displayed . The VARGRAPHIC field handles this better and for each position the two byte characters are held in one position and the whole char is returned.

Other things to note that you can’t EXPORT from a VARCHAR field and LOAD into a VARGRAPIC field using an IXF file. DB2 will load all the fields that are of the same or compatible types, and null all the VARCHAR to VARGRAPIC conversion fields.

When you are loading a batch file using the LOAD command in DB2 if you try to load a data point into a varchar field, and the data point is twenty characters long and the field is twenty characters then you will also fit issues. As we have already seen the in a VARCHAR field the foreign characters will count for 2 characters, therefore if you load a string into that field that looks twenty characters long and it has a foreign character in it is actually twenty-one characters. This will obviously pop the field and not necessarily issue any warnings, the LOAD statement then seems to truncate it too twenty so you will be then missing a character from the data point.

Lazy RUNSTATS using SYSPROC.ADMIN_CMD

So if you follow my Twitter @dangerousDBA will know that I will do anything for an easy life, and where I work thee range of DB2 skills is very varied and so making things as simple as possible is always needed. To this end using the SYSPROC.ADMIN_CMD it is possible to make this as simple as possible without knowing all the ins and outs of of the actual command.

This first one then is just a simple runstats that will runstats on all indexes and columns.

CREATE PROCEDURE DB_MAIN.RUNSTATS  (IN IN_TABLESCHEMA VARCHAR(100), IN IN_TABLENAME VARCHAR(100))

LANGUAGE SQL

BEGIN

DECLARE REORGSTRING VARCHAR(255);

SET REORGSTRING = 'RUNSTATS ON TABLE ' || IN_TABLESCHEMA || '.' || IN_TABLENAME || ' ON ALL COLUMNS AND INDEXES ALL ALLOW WRITE ACCESS';

CALL SYSPROC.ADMIN_CMD(REORGSTRING);

END

As you can probably guess DB_MAIN is the schema that I keep all the stored procedures and tables in for maintaining the DB2 databases in. So this is easy for anyone who wants now do a total runstats on any table in the database. The second one that I created is a little more fine grained. This one runstats on all columns, but also only on an index specified so will run a little quicker.

CREATE PROCEDURE DB_MAIN.RUNSTATS_INDEX   (IN IN_TABLESCHEMA VARCHAR(100), IN IN_TABLENAME VARCHAR(100), IN IN_INDEX_NAME VARCHAR(255))
LANGUAGE SQL
BEGIN
DECLARE REORGSTRING VARCHAR(1000);
SET REORGSTRING = 'RUNSTATS ON TABLE ' || IN_TABLESCHEMA || '.' || IN_TABLENAME || ' ON ALL COLUMNS AND INDEXES ' || IN_TABLESCHEMA || '.' || IN_INDEX_NAME || ' ALLOW WRITE ACCESS';
CALL SYSPROC.ADMIN_CMD(REORGSTRING);
END

There is not a great need to run the statistics on the columns when you are after just the index, but when in Rome. Obviously you can change these to suit your needs and take out the column stats on the index SP.