Wednesday, 18 November 2020

Generating Reports from Db2 historical monitoring (db2histmon) data

 - David Sciaraffa, Software Engineering Manager – IBM Db2

(This blog is a follow-up to my previous blog which describes the Db2 Historical Monitoring (db2histmon) framework.)

The Db2 Historical Monitoring (db2histmon) scripts (available here) are a successor to the older Db2 Persistent Diagnostic Data Collection scripts, and similarly collect a broad range of Db2 and Operating System diagnostic data, and retain this data for a period of time, allowing for basic triage of many types of issues. Diagnostic information is collected at various intervals (ranging from 1min to 1hour, depending on the kind of data), and can be easily customized and expanded. The scripts will collect data only while the database is activated, which can help to reduce the generation of unnecessary redundant data.

This blog provides an overview of the new report generation script.

The new report generation script can be used to scrape previously generated db2histmon data collections and produce summary reports in multiple flavours, similar to the reports produced by the MONREPORT module. The reports are useful for for analyzing various metrics, and highlighting statistical outliers, from previous time periods.

Downloading the db2histmon report script:

The full set of db2histmon scripts can be downloaded from the open-source Github repo:
The script itself is located in the 4_report/ sub-folder:

(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report/
$ ls


  • The scripts can be deployed in any Unix or Windows environment where Db2 is supported.
  • Python2 or higher.
  • The python pandas, argparse, and numpy modules are also required to execute the script.
Using pip to install the pandas, argparse, and numpy modules:
(db2inst1@myhost1) /home/db2inst1/
  $ pip install argparse
Collecting argparse
  Downloading argparse-1.4.0-py2.py3-none-any.whl (23 kB)
Installing collected packages: argparse
Successfully installed argparse-1.4.0
(db2inst1@myhost1) /home/db2inst1/
  $ pip install numpy
Collecting numpy
  Downloading numpy-1.19.4-cp38-cp38-win_amd64.whl (13.0 MB)
     || 13.0 MB 656 kB/s
Installing collected packages: numpy
Successfully installed numpy-1.19.4
(db2inst1@myhost1) /home/db2inst1/
  $ pip install pandas
Collecting pandas
  Downloading pandas-1.1.4-cp38-cp38-win_amd64.whl (8.9 MB)
     || 8.9 MB 1.1 MB/s
Collecting python-dateutil>=2.7.3
  Downloading python_dateutil-2.8.1-py2.py3-none-any.whl (227 kB)
     || 227 kB 1.7 MB/s
Installing collected packages: python-dateutil, pandas
Successfully installed pandas-1.1.4 python-dateutil-2.8.1

Types of reports:

The script is currently capable of generating three unique types of reports:

DBSUMMARY - Provides a summary of metrics for the entire database, similar to the MONREPORT.DBSUMMARY module.

CONNECTION - Provides a summary of metrics for each connection, similar to the MONREPORT.CONNECTION module.

PKGCACHE - Provides a summary of metrics for the top executed statements that are still cached within the package-cache, similar to the MONREPORT.PKGCACHE module.

A report will display various metrics and data points for every data collection time period (or limited only to the time periods specified in the arguments), as described later.

Historical data which is used to generate the report:

The script will scrape historical monitoring data that was previously collected by the Db2 Historical Monitoring (db2histmon) framework. 

Please see the 'The Data Collection and Data Archive folders' section of my previous blog which describes the Db2 Historical Monitoring (db2histmon) framework, if you're unsure where this historic monitoring data may reside.

For reference, I'm including a screen-shot showing the contents of the db2histmon data collection folder:


Usage and options:

The script has the following usage options:

(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report/ 
   $ python -h
usage: [-h]  [-r {dbsummary,connection,pkgcache}]
                 [-st START_TIME] [-et END_TIME] [-p PERIOD]
                 [-s] [-m MEMBERS [MEMBERS ...]] 
                 [-ah APPLICATION_HANDLES [APPLICATION_HANDLES ...]] path

positional arguments:
  path                  path where collection directories are stored

optional arguments:
  -h, --help            show this help message and exit
  -r, --report {dbsummary,connection,pkgcache}, 
                        name of report to be generated
  -st START_TIME, --start_time START_TIME
                        display collections after this time, format can be (YYYY-mm-dd HH:MM:SS) or copied from a
                        collection (YYYY-mm-dd-HH-MM-SS.ffffff)
  -et END_TIME, --end_time END_TIME
                        display collections before this time, format can be (YYYY-mm-dd HH:MM:SS) or copied from a
                        collection (YYYY-mm-dd-HH-MM-SS.ffffff)
  -p PERIOD, --period PERIOD
                        display collections at a lesser frequency, for example an interval of 3 will show every third
  -s, --stats           show min, max, mean, std for each series of values
  -m MEMBERS [MEMBERS ...], --members MEMBERS [MEMBERS ...]
                        filter certain reports by member
                        filter certain reports by application handle

Use the --report option to specify the type of report to generate, either DBSUMMARY, CONNECTION, or PKGCACHE.

The last argument specifies the folder path where the db2histmon historical monitoring data resides.

I'll explain the other options within the examples further below.

Example 1. Generate a DBSUMMARY report (on all available historical monitoring data)

(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report/
$ python -r DBSUMMARY /path/to/db2histmon/data_files/

Example 2. Generate a report on a subset of time-intervals, by using the START_TIME and END_TIME options

In some cases, the quantity of db2histmon historical monitoring data collections available or archived is very large and may make the reports too large or unreadable.

We can use the --START_TIME and --END_TIME options to limit the data collection time-intervals that we wish to generate a report on.  
In this example, we wish to include all data collections that were generated after 2020-09-29 02:20 but before 2020-09-29 04:30:

(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report/
$ python -r DBSUMMARY --start_time "2020-09-29 02:20:00" --end_time "2020-09-29 04:30:00" /path/to/db2histmon/data_files/
Add caption

Example 3. Generate a report with time-intervals condensed into larger time PERIODs.

We can also use the --PERIOD option to combine time-intervals from the data collection set into larger sets, such that fewer (but longer duration) time-intervals are displayed in the report.
In this example, many of our original db2histmon historical monitoring data collections are approximately 5minutes apart (as an be seen in the previous report's example above). By specifying the --PERIOD 3 option, we aggregate the data which would normally be reported in three separate time-periods into one time-period: 

(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report/
$ python -r DBSUMMARY --period 3  /path/to/db2histmon/data_files/

Example 4. Generate a report showing MIN, MAX, MEAN statistical values.

Understanding the statistical values associated with a particular metric can be useful when looking for anomalies in the data set.  In this example, I specify the --stats option to include the statistics for each reported value:

(db2inst1@myhost1) /home/db2inst1/db2histmon-master/4_report/
$ python -r DBSUMMARY --period 3  /path/to/db2histmon/data_files/

No comments:

Post a comment