Thursday, 2 April 2015

Capturing SQL statements and performance metrics for a single application database-connection in DB2

- David Sciaraffa, IBM IM/DB2 Accelerated Value Leader (Premium Support Manager).


A database administrator is often tasked to assess why an application is performing poorly. While this kind of assessment is a very broad topic (which I hope to cover in the future), understand that capturing detailed information for the statements executed by the application is obviously essential.  The ACTIVITY event monitor can be used to capture details on all statements executing on the database. More importantly, we can enable the activity event monitor data collection only for the application db-connections that we’re interested in. This helps minimize the impact of event monitor data collection on the rest of the database.

The data captured by the activity event monitor will contain all the statements executed by this application connection, various metrics for each statements (such as activity time, rows read, physical rows read, sort overflows, etc), input parameter marker values associated with each statement, and optimizer access-plan information for each statement.
This blog entry provides an example of how to enable an activity event monitor on a single application db-connection, how to display the statements captured for that db-connection, how to display the metrics for these statements, and how to generate the optimizer access-plan for these statements.


Unlike the legacy event monitors, when an ACTIVITY event monitor is created and enabled, it does not begin capturing information until it is told to do so via the WLM_SET_CONN_ENV() procedure (or the COLLECT ACTIVITY DATA clause on WLM objects). The WLM_SET_CONN_ENV() procedure takes an application-handle as it's first argument, and will enable the collection of activity information only for that application db-connection.

Example


[First, I generate some arbitrary background workload to capture]

Here I create 3 CLP sessions and generate some continuous random workload in the background:

terminal window 1:
$ echo 'db2 connect to DB97; while true; do db2 select \* from syscat.tables; done' | ksh

terminal window 2:
$echo 'db2 connect to DB97; while true; do db2 select \* from syscat.indexes; done' | ksh

terminal window 3:
$echo 'db2 connect to DB97; while true; do db2 select \* from syscat.procedures; done' | ksh
 

Create and enable an ACTIVITY event monitor:

Here, I create an activity event monitor to write to tables (and use the system generated default table names)

terminal window 4:
$ db2 connect to DB97
$ db2 create event monitor MYACTEVMON1 for activities write to table
$ db2 set event monitor MYACTEVMON1 state 1

The collection of activity data will not commence yet.

We can see the event monitor tables now exist:

$ db2 list tables for schema DSCIARAF | grep ACTIVITY

Table/View                      Schema          Type  Creation time
------------------------------- --------------- ----- --------------------------
ACTIVITYMETRICS_MYACTEVMON1     DSCIARAF        T     2015-04-02-18.11.57.821637
ACTIVITYSTMT_MYACTEVMON1        DSCIARAF        T     2015-04-02-18.11.56.848837
ACTIVITYVALS_MYACTEVMON1        DSCIARAF        T     2015-04-02-18.11.57.441909
ACTIVITY_MYACTEVMON1            DSCIARAF        T     2015-04-02-18.11.56.311935

 

Confirm that monitoring metrics are being collected for the database:

The MON_OBJ_METRICS db config parameter should be set to ‘BASE’ or ‘EXTENDED’.

$ db2 get db config for DB97 | grep MON_OBJ

Object metrics                        (MON_OBJ_METRICS) = BASE


I also like to collect the actual row counts returned through each operator of the optimizer access plan, by enabling the SECTION_ACTUALS db config parameter:

$
db2 update db config for DB97 using SECTION_ACTUALS BASE


Determine the application-handle for the db-connection that I’m interested in gathering information on:

I just arbitrarily choose one of my 3 CLP sessions for this example.  In a real world scenario I would use the application-handle associated with my under-performing application. 

$ db2 "select application_handle,application_id from sysibmadm.mon_connection_summary"

APPLICATION_HANDLE   APPLICATION_ID                                                                                              
-------------------- -----------------------------
                3216 *LOCAL.dsciaraf.150402215339
                3215 *LOCAL.dsciaraf.150402215301
                3366 *LOCAL.dsciaraf.150402230502                                                                                
                3214 *LOCAL.dsciaraf.150402215249
 

Turn on the collection of activity data for this specific application-handle:

$ db2 "CALL WLM_SET_CONN_ENV( 3215, '<collectactdata> WITH DETAILS, SECTION AND VALUES </collectactdata><collectactpartition> ALL </collectactpartition>')"

  Return Status = 0

(note that, the data collection will stop automatically if this application db-connection is severed or disconnected, even if the db-connection is later reestablished).

Wait for some duration of time to allow data to accumulate in my activity event monitor tables:

(remember that my 3 CLP sessions are still generating some random workload in the background).

$ sleep 60


Flush the event monitor buffers:

$ db2 flush event monitor MYACTEVMON1 buffer

List all of the statements/activities captured by the activity event monitor:

Here, I ordered the list by statement execution time from largest to smallest.
(notice that we will only see events from the application-id that we enabled collection on, not the other application connections):

$ db2 "select a.APPL_ID, a.ACTIVITY_ID, a.UOW_ID, a.ACT_EXEC_TIME, a.TIME_CREATED, SUBSTR (b.STMT_TEXT, 1, 20) as STMT_TEXT
from ACTIVITY_MYACTEVMON1 a, ACTIVITYSTMT_MYACTEVMON1 b
where a.APPL_ID=b.APPL_ID and a.ACTIVITY_ID=b.ACTIVITY_ID and a.UOW_ID=b.UOW_ID
order by ACT_EXEC_TIME desc" | less

APPL_ID                       ACTIVITY_ID          UOW_ID      ACT_EXEC_TIME        TIME_CREATED               STMT_TEXT
----------------------------- -------------------- ----------- -------------------- -------------------------- --------------------
*LOCAL.dsciaraf.150402215301                     1         936              1242842 2015-04-02-18.12.54.103126 select * from syscat…
*LOCAL.dsciaraf.150402215301                     1         957              1219326 2015-04-02-18.13.21.843027 select * from syscat…
*LOCAL.dsciaraf.150402215301                     1         948              1162989 2015-04-02-18.13.09.239075 select * from syscat…
*LOCAL.dsciaraf.150402215301                     1         958              1102184 2015-04-02-18.13.23.387440 select * from syscat…
…etc…

Isolate the statement(s) of concern, and display it’s metrics.

Now, let’s grab some useful metrics for the statement which had the highest execution time in our query above. Each statement is uniquely identified by it’s (APPL_ID + ACTIVITY_ID + UOW_ID).   In this case, the statement with the highest execution time has APPL_ID=’*LOCAL.dsciaraf.150402215301’, ACTIVITY_ID=’1’, and UOW_ID=’936’.

    • The ACTIVITY_<evmon name> table provides statement execution information.  A description of each column is seen in the event_activity logical data group KnowledgeCenter page.
    • The ACTIVITYMETRICS_<evmon name> table provides most of the performance metrics. A description of each column is seen in the event_activitymetrics logical data group KnowledgeCenter page.  Use this table in v97fixpack4 and later releases (prior to fp4 some of these metrics are available in the ACTIVITY_<evmon name> table above).
    • The ACTIVITYSTMT_<evmon_name> table provides the STMT_TEXT and some other similar metrics. A description of each column is seen in the event_activitystmt logical data group KnowledgeCenter page.
    • The ACTIVITYVALS_<evmon_name> table provides values that were input with the statement, such as parameter marker values.  If a statement execution did not have any indirect input values then a record will not exist within this table for the statement execution. A description of each column is seen in the event_activityvals logical data group KnowledgeCenter page.
      .

$ db2 “select LOCK_WAIT_TIME, LOCK_WAITS, LOG_BUFFER_WAIT_TIME, NUM_LOG_BUFFER_FULL, LOG_DISK_WAIT_TIME, LOG_DISK_WAITS_TOTAL, TOTAL_SECTION_SORTS, TOTAL_SECTION_SORT_TIME, TOTAL_ACT_TIME, POST_THRESHOLD_SORTS, SORT_OVERFLOWS, ROWS_READ, ROWS_MODIFIED, POOL_DATA_P_READS AS PHYSICAL_DAT_READS, POOL_INDEX_P_READS AS PHYSICAL_IDX_READS, DEADLOCKS, LOCK_TIMEOUTS
from ACTIVITYMETRICS_MYACTEVMON1
where APPL_ID='*LOCAL.dsciaraf.150402215301' and ACTIVITY_ID='1' and UOW_ID='936' "

LOCK_WAIT_TIME       LOCK_WAITS           LOG_BUFFER_WAIT_TIME …etc… ROWS_READ PHYSICAL_DAT_READS   …etc…  DEADLOCKS            LOCK_TIMEOUTS
-------------------- -------------------- -------------------- …etc… --------- -------------------- …etc… -------------------- --------------------
                   0                    0                    0             484                    0                          0                    0

To display the optimizer access-plan for a statement execution, use the EXPLAIN_FROM_ACTIVITY() procedure:

i) Create explain tables if you haven't already:

$ db2 "CALL SYSPROC.SYSINSTALLOBJECTS('EXPLAIN', 'C', CAST (NULL AS VARCHAR(128)), CAST (NULL AS VARCHAR(128)))"

ii) Populate the explain tables for the activity of interest:

In this example, I choose the same statement with the highest execution time, which has APPL_ID=’*LOCAL.dsciaraf.150402215301’, ACTIVITY_ID=’1’, and UOW_ID=’936’

$ db2 "CALL EXPLAIN_FROM_ACTIVITY ( '*LOCAL.dsciaraf.150402215301', 936, 1, 'MYACTEVMON1', 'SYSTOOLS',?, ?, ?, ?, ? )"

  Value of output parameters
  --------------------------
  Parameter Name  : EXPLAIN_SCHEMA
  Parameter Value : SYSTOOLS

  Parameter Name  : EXPLAIN_REQUESTER
  Parameter Value : DSCIARAF

  Parameter Name  : EXPLAIN_TIME
  Parameter Value : 2015-04-02-11.29.27.846973

  Parameter Name  : SOURCE_NAME
  Parameter Value : SQLC2H23

  Parameter Name  : SOURCE_SCHEMA
  Parameter Value : NULLID

  Parameter Name  : SOURCE_VERSION
  Parameter Value :

  Return Status = 0

iii) Generate the exfmt output for the most recently explained plan in the explain tables:

$ db2exfmt -1 -d db97

******************** EXPLAIN INSTANCE ********************

DB2_VERSION:       09.07.9
SOURCE_NAME:       SQLC2H23
SOURCE_SCHEMA:     NULLID
SOURCE_VERSION:
EXPLAIN_TIME:      2015-04-02-11.29.27.846973
EXPLAIN_REQUESTER: DSCIARAF

.... etc ...                       

                         Rows
                       Rows Actual
                         RETURN
                         (   1)
                          Cost
                          I/O
                           |
                          611
                          456
                       >^HSJOIN
                        (   2)
                        354.407
                          NA
         /----------------+-----------------\
      611                                     0
      456                                     0
    >^HSJOIN                                FETCH
     (   3)                                  (  12)
    341.624                                1.63472
     NA                                     NA

.... etc ...

Don’t forget to turn off the collection of activity data for this application-handle:

$ db2 "CALL WLM_SET_CONN_ENV( 3215, '<collectactdata>NONE</collectactdata>')"

  Return Status = 0


There is an excellent developerworks article here that describes a method to enable the activity event monitor collection for more then one  appl-handle, and help isolate the data for a specific appl-handle into individual temporary tables:
http://www.ibm.com/developerworks/data/library/techarticle/dm-1401sqltrace/


I hope this information is useful. If you have any thoughts or questions, feel free to submit a comment below. 

2 comments:

  1. How have things changed for DB2 10.1 & DB2 10.5. I know that there has been much work done on the product, so I imagine there are alternatives now. Can you expand on this

    ReplyDelete
  2. There were many monitoring enhancements in 10.1 and 10.5. You can skim those here (http://tinyurl.com/o7g2fff) for 10.1, and here (http://tinyurl.com/oylssg8) for 10.5. I'm excited that the MON_* functions have been expanded and can now completely replace all elements of the old intrusive GET SNAPSHOT commands (and their SNAP_* admin functions). However the MON_* functions mostly provide aggregated metrics and/or point-in-time information, which are useful for certain types of problem solving. As of 10.5, the Activity Event Monitor is still used to capture and retain information for each individual statement execution, as described above, if that's the level of detail you're looking for. Thx.

    ReplyDelete

Translate