Wednesday 1 April 2015

Understanding and Tuning DB2/LUW Backup and Restore Performance

- David Sciaraffa, IBM IM/DB2 Accelerated Value Leader (Premium Support Manager).

IT leaders will often mandate strict guidelines for the recovery time of a database.
Typically, if a database restore is required, it must complete within a certain time-frame (2hrs seems to be a common SLA for large databases).
The duration of database backup operations are also subject to scrutiny, since backups are resource intensive and running through peak workload time periods is not ideal.

This document provides methods to help understand and mitigate DB2 Backup and/or Restore performance issues.


 

Backup and Restore architecture - 50K foot view

The gist of the Backup utility is to read tablespace data and write it to one or more backup-image files, and the opposite for Restore.
The Backup and Restore utilities will spawn multiple agents/threads to do work.
For Backup, "buffer manipulator agents" (db2bm's) are used to read tablespace container data into empty memory buffers, and "media controller agents" (db2med's) are used to write these full memory buffers to the backup-image files (one db2med per target path/file).
For Restore, likewise, db2med's are used to read data from the backup-image files (one db2med per source path/file) into empty memory buffers, and db2bm's are used to write the contents of these full memory buffers to tablespace containers.

Parallelism of Tablespace Processing during Backup

For Backup, although multiple db2bm agents read from tablespace containers in parallel, there is no inter-tablespace parallelism -- that is to say, a tablespace is processed by a single db2bm agent. This means that, if a single (or few) tablespace dwarfs the others in size, for much of the Backup operation there will be a lack parallelism.
As a general best practice, data tables should be spread across many tablespaces, and tablespaces should have similar quantities of data.
 

Recognize the Backup and Restore Tuning Parameters

The performance of Backup and Restore operations are predominantly controlled by three parameters: WITH x BUFFERS; BUFFER SIZE y; PARALLELISM z; When these parameters are not explicitly specified in the Backup/Restore command invocation, DB2 will autonomically choose values based on available resources (I do not describe that algorithm here).

WITH x BUFFERS
Backup/Restore use a contiguous memory chunk or "buffer" for reading tablespace container data into.  This parameter controls how many of these buffers Backup/Restore will allocate from the Utility memory heap (UTIL_HEAP).

BUFFER SIZE y
The size of a buffer can be controlled with this parameter.

PARALLELISM z
This parameter controls how many db2bm agents will process tablespaces in parallel.
For Backup, each db2bm agent reads tablespace data into a buffer until the buffer is full. When the number of db2bm's is less then the total number of tablespaces, each db2bm will be assigned to a tablespace in round-robin fashion, from largest to smallest.
For Restore, each db2bm will process an arbitrary full buffer and write out the data to whichever tablespace that data belongs to.

[number of output paths]
A forth parameter is equally as important -- the number of output paths (backup-image files) to which the data is written to during Backup or read from during Restore. It is important to specify multiple paths so as to provide parallelism of data writes during Backup and reads during Restore, and best that these paths reside on different filesystems utilizing different disks or spindles, so as to avoid bottlenecks at the i/o subsystem level.

Tuning Preface

DB2 is fairly conservative when autonomically choosing values for WITH x BUFFERS, BUFFER SIZE y, and PARALLELISM z. It performs best when it has enough memory buffers to support good parallelism. By default, Backup and Restore will consume 50% of the available memory in the UTIL_HEAP for buffers. Thus, the simplest way to avoid Backup and Restore performance bottlenecks is to configure a good amount of UTIL_HEAP memory.  100,000 4k-pages is a good starting point for larger systems.

(Note that, if other utilities, such as Load, are consuming UTIL_HEAP memory when the Backup/Restore begin, the Backup/Restore will consume 50% of the available memory in the heap. In some situations this can be a problematically small amount of memory. If you are susceptible to this scenario, where Backup/Restore is run in parallel with other utilities like Load, you should either avoiding autonomic tuning and explicitly configuring these values, or assure that your UTIL_HEAP is large enough to accommodate the concurrent utilities… A UTIL_HEAP of 100,000 4k-pages is generally sufficient).

  • If you configure these values manually, a good starting point would be something like:

 WITH 16 BUFFERS; BUFFER SIZE 1024 4k-pgs; PARALLELISM 8. 
(this will consume 16 * (1024 pages * 4096 kb/page) = 64MB of UTIL_HEAP memory).

  • It is best to configure WITH x BUFFERS = 2 * PARALLELISM,  This allows for db2bm's to continue to read data into empty buffers, while db2med's are writing out the full buffers.
  • For Backup, you should specify at least 4 output paths (target backup-image files) and preferably more. Output paths should reside on different drives or spindles to avoid i/o bottlenecks or saturation.
  • Avoid the COMPRESS option if performance is a concern, since this causes serialization to a single db2bm. Instead run gzip on backup-image files upon completion of the backup.


 

Analyzing Backup and Restore Performance


Determine existing BUFFER, BUFFER SIZE, PARALLELISM values

To begin, let's determine the BUFFERS, BUFFER SIZE, and PARALLELISM values that our Backup or Restore are currently using.  These were either chosen by DB2 autonomically, or manually provided in the Backup/Restore command line parameters.

  $  grep "Autonomic BAR - using parallelism .*"  ~/sqllib/db2dump/db2diag.log
  $  grep "Using buffer size = .*, number = .*"  ~/sqllib/db2dump/db2diag.log

2013-06-25-01.31.22.453275-240 E321706A475        LEVEL: Info
PID     : 43712628             TID  : 47055       PROC : db2sysc 0
INSTANCE: db2inst8              NODE : 000         DB   : MYDB1
APPHDL  : 0-5674               APPID: *LOCAL.db2inst8.131203225353
AUTHID  : db2inst8
EDUID   : 47055                EDUNAME: db2agent (MYDB1) 0
FUNCTION: DB2 UDB, database utilities, sqluxGetDegreeParallelism, probe:762
DATA #1 : <preformatted>
Autonomic BAR - using parallelism = 16.

2013-06-25-01.31.49.823112-240 E322696A501        LEVEL: Info
PID     : 43712628             TID  : 47055       PROC : db2sysc 0
INSTANCE: db2inst8              NODE : 000         DB   : MYDB1
APPHDL  : 0-5674               APPID: *LOCAL.db2inst8.131203225353
AUTHID  : db2inst8
EDUID   : 47055                EDUNAME: db2agent (MYDB1) 0
FUNCTION: DB2 UDB, database utilities, sqlubTuneBuffers, probe:1127
DATA #1 : <preformatted>
Autonomic backup - tuning enabled.
Using buffer size = 4097, number = 16.

In this arbitrary example, we have the equivalent of WITH 16 BUFFERS; PARALLELISM 16; BUFERSIZE 4097 4k-pages.

 

Collect Performance Data

Next, we want to collect some performance data during our Backup or Restore operation.

1) We enable the collection of Backup and Restore statistics using the DB2_BAR_STATS registry variable (this registry variable exists in v97 and later, and is enabled by default in 10.5 and later).

  $ db2set DB2_BAR_STATS=ON  (which requires an instance recycle to take effect).

When this registry variable is enabled, some additional diagnostic information is printed into the db2diag.log upon completion of the Backup or Restore.
You can find a description of the columns in this developerworks article: https://www.ibm.com/developerworks/community/blogs/DB2LUWAvailability/entry/backup_and_restore_statistics?lang=en

2) We can also enable a db2trc-perfcount using a small function mask: (this should have no impact on online database performance)

$ db2trc on -perfcount -t -m "*.*.*.sqluWriteToFileDevice,sqlbReadBlocks,sqluMCReadFromDevice,sqlbWriteBlocks,sqluReadBufferFromQueue.*"
$ [perform the Backup or Restore operation]
$ db2trc dump trc_perfcount.raw
$ db2trc off
$ db2trc perffmt trc_perfcount.raw trc_perfcount.txt

 

Interpreting the Performance Data

1) Interpreting the DB2_BAR_STATS output.

The db2diag.log entries that are generated at the end of a Backup or Restore operation will look similar to this:

BM#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000  102560.89  75436.75   1560.07      1.18      563800   2238113
001  102560.87     40.66     32.84  77268.41        9934     39691
…etc…
---  --------  --------  --------  --------    --------  --------
TOT  410240.59  103228.63   3021.26  202635.50      835859   3318764

MC#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000  102560.87  25840.68  76706.92      0.03      139498    557988

..etc…
---  --------  --------  --------  --------    --------  --------
TOT  615365.37  155954.13  459269.57     14.38      835865   3343432

The columns can be interpreted as follows:

BM# - The db2bm agent identifier.
MC# - The db2med agent identifier.
Total - The total time that the agent existed.
I/O - The total time that was spent performing read or write I/O.
MsgQ - The total time that was spent waiting for a buffer (db2bm's must wait for an empty buffer from the db2med's. db2med's must wait for an full buffer from db2bm's).
WaitQ - The total time spent waiting for the backup to complete after this agent has already completed processing the tablespace(s) assigned to it.
Buffers - The total number of buffers that were processed.
KBytes - The total quantity of data that was processed.

 

Examples

Backup example 1)

BM#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------   --------  --------  --------    --------  --------
000  102560.89  75436.75   1560.07      1.18      563800   2238113
001  102560.87     40.66     32.84  77268.41        9934     39691
002  102559.21  27682.77   1390.38  48126.34      252089   1000863
003  102559.61     68.43     37.96  77239.57       10036     40095
---  --------   --------  --------  --------    --------  --------
TOT  410240.59 103228.63   3021.26 202635.50      835859   3318764

MC#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000  102560.87  25840.68  76706.92      0.03      139498    557988
001  102560.87  25834.30  76694.83      2.88      139535    558136
002  102560.87  25810.99  76721.00      2.88      139636    558540
003  102560.86  25805.54  76726.42      2.87      139552    558204
004  102560.86  25961.13  76576.80      2.85      139326    557300
005  102561.00  26701.46  75843.57      2.85      138318    553264
---  --------   --------  --------  --------    --------  --------
TOT  615365.37 155954.13 459269.57     14.38      835865   3343432

For this Backup example,

  • There are 4 db2bm's (BM#0 through #3) telling us that PARALLELISM was 4. Each db2bm will process a single tablespace at a time, round-robin'ed.
  • The (‘MBytes’) amount of data read by each db2bm varies. BM#001 processed 2.3TB of data, BM#003 processed 1TB of data, BM#01 and #03 processed only a few hundred MB each.
  • The (‘WaitQ’) amount of time spent waiting for backup to complete, tells us that only BM#001 was busy for the entire backup, BM#002 was busy for about half the backup, BM#01 and #03 were mostly idle during the Backup.
  • The (‘MsgQ’) amount of time spent waiting for empty buffers, tells us that some db2buf’s spend a moderate amount of time waiting for buffers.

    So, we can infer that there are 2 large tablespaces, one is about 2.2TB and the other is about 1TB. Backup parallelism is hindered/serialized by one (or two) large tablespaces here. Thus, the tablespace architecture is not favourable here, and spreading this data across many more multiple tablespaces would be good advice here.
  • There are 6 db2med's (MC#0 through #5) telling us that there are 6 output paths (target backup-image files).
  • The (‘MBytes’) amount of data  written by each db2med is almost equal, telling us that all 6 paths' filesystems perform identically.
  • The (‘WaitQ’) amount of time spent waiting for backup to complete, tells us that none of these agents were never idle.
  • The (‘MsgQ’) amount of time spent waiting for full buffers, tells us that all of these agents spent a lot of time waiting for full buffers.

    This tells us that db2med's are spending a lot of time waiting for free buffers to become available. Thus increasing the number of buffers (using 2 * PARALLELISM as an example) would be good advice here.

Backup example 2)

BM# Total    I/O      MsgQ     WaitQ     Buffers MBytes
--- -------- -------- -------- -------- -------- --------
000 4456.52     98.35    72.11 3658.53       308     4885
001 4302.85    560.83   176.51 2490.53       975    15576
002 4449.72   1265.34   172.12    0.11      3840    61437
003 4303.09   1178.74   160.89  161.56      3520    56296
004 4455.09     96.28    95.33 3415.24       363     5802
005 4456.12    363.35   208.02 2948.69       952    15226
006 4453.22    273.06   271.64 2973.20      1268    20272
007 4309.83     96.22    75.87 3512.09       257     4086
008 4454.61     82.29    91.68 3512.77       299     4765
009 4309.76    640.41   217.45 1709.97      1988    31792
010 4305.58    428.91   162.33 2920.27      1029    16441
011 4408.53     82.17    90.57 3513.42       303     4831
012 4453.37    382.53   132.49 3010.86       924    14769
013 4455.86    345.96   238.34 2932.40      1024    16373
014 4302.71    599.58   197.12 2319.56      1615    25831
015 4309.80    453.87   210.27 2722.29      1250    19987
016 4439.46    262.37   148.39 3121.92       789    12617
017 4292.10     95.01    84.52 3477.18       283     4505
018 4456.33    377.35   174.31 2967.36       870    13904
019 4453.45    310.12   137.97 3082.02       787    12578
020 4453.43    149.33    70.36 3329.20       440     7031
021 4455.54    114.68   103.57 3338.44       595     9517
022 4433.39    463.78   197.05 2769.80      1233    19714
023 4420.68    481.17   173.53 2793.67      1201    19201
024 4293.09    434.39   199.04 2849.30      1107    17699
025 4456.51    111.46    82.97 3397.69       413     6590
026 4292.49    422.35   178.91 2910.77      1060    16946
027 4444.76    252.99   195.57 3085.01       807    12902
028 4292.14     92.18    77.26 3517.93       317     5047
029 4308.19     85.53    82.71 3517.84       285     4535
--- -------- -------- -------- -------- -------- --------
TOT 131678.39 10600.73 4479.04 85959.76    30102   481172

MC#    Total      I/O     MsgQ    WaitQ  Buffers   MBytes
--- -------- -------- -------- -------- -------- --------
000  4456.52  1043.60 3412.68      0.00     7483   119741
001  4456.61  1065.77 3389.94      0.64     7698   123166
002  4456.52  1028.95 3426.67      0.64     7349   117596
003  4456.52  1051.67 3403.95      0.64     7576   121229
--- -------- -------- -------- -------- -------- --------
TOT 17826.18  4190.00 13633.26     1.92    30106   481733

We can see that:
-- There are 30 db2bm agents, telling us that PARALLELISM was 30.
-- The (‘MsgQ’) amount of time that agents spent waiting for empty buffers is negligible compared to the total time.
-- The (‘WaitQ’) amount of time that agents spent idle, waiting for the completion of the backup (ie. waiting for other db2lbm's to complete processing their tablespaces) is significant for all except two db2lbm’s.
-- the (‘Buffers’) number of buffers processed, and (‘MBytes’) also confirms that two db2lbms have processed much more data then all others. Thus there are two tablespaces that are significantly larger then all others. We would benefit from splitting these two large tablespaces into multiple smaller tablespaces if desired.


-- There are 4 db2med agents, telling us that there are 4 output paths.
-- The (‘I/O’) amount of time spent in read/write i/o is about the same, the output path filesystems all perform similarly.
-- The (‘MsgQ’) amount of time spent waiting for full buffers, and thus the (‘Total’) total amount of time consumed, is very large, dwarfing I/O times. This implies that it takes db2buf’s a long time to fill buffers and send them to db2meds.  Since the db2buf’s do not spend any time waiting for empty buffers, we infer that the db2buf’s are being throttled (since utility throttling was enabled in this scenario), and thus taking time to fill the empty buffers.
We might benefit from increasing the UTIL_IMPACT_LIM to reduce the amount of throttling, if desired.

Restore example 1)

BM#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000  110803.40  10539.96   1793.52    122.90        4636    296704
001  110663.95  10580.47   1807.65  92089.62        4649    297504
002  110663.99  10577.14   1808.32  92089.62        4649    297536
003  110664.28  10582.63   1802.41  92089.62        4648    297440
004  110663.95  10566.64   1819.02  92089.62        4647    297408
005  110664.00  10587.56   1780.89  92089.62        4656    297952
006  110663.95  10571.84   1807.67  92089.62        4651    297664
007  110663.97  10602.11   1782.06  92089.62        4651    297664
008  110664.12  10607.18   1788.34  92089.61        4651    297600
009  110663.96  10584.53   1790.27  92089.61        4661    298304
010  110663.98  10545.68   1769.15  92089.61        4598    294272
011  110665.10  10532.63   1790.81  92089.61        4707    301216
012  110663.94  10618.23   1790.84  92089.61        4630    296320
013  110663.94  10592.89   1771.88  92089.61        4602    294496
014  110785.75  10541.60   1924.17  92089.61        4625    295936
015  110663.96  10563.22   1813.53  92089.61        4653    297792
---  --------  --------  --------  --------    --------  --------
TOT  1770886.31  169194.41  28840.60  1381467.20       74314   4755808

MC#    Total      I/O      MsgQ      WaitQ      Buffers   MBytes
---  --------  --------  --------  --------    --------  --------
000  110488.50  16426.53  94060.66      1.11        6194    396320
001  92089.62       0.00      0.00  92089.61           0         0
002  110467.64  16393.52   1983.89  92090.03        6192    396192
003  110476.87  16343.40   2043.22  92090.06        6197    396480
004  110489.64  16313.07   2086.41  92089.95        6193    396224
005  110472.90  16311.18   2071.61  92089.92        6195    396352
006  110497.97  16349.75   2058.01  92090.02        6196    396416
007  92089.62       0.00      0.00  92089.61           0         0
008  110473.11  16371.50   2011.39  92090.03        6193    396224
009  92089.61       0.00      0.00  92089.61           0         0
010  110477.80  16346.43   2041.14  92090.04        6195    396384
011  110408.97  16327.19   1991.63  92089.96        6192    396192
012  92089.61       0.00      0.00  92089.61           0         0
013  110487.59  16316.95   2080.42  92090.03        6194    396320
014  110489.82  16343.48   2056.09  92090.04        6194    396320
015  110663.92  16587.85   1985.82  92090.05        6195    396384
---  --------  --------  --------  --------    --------  --------
TOT  1694253.26  196430.90  116470.34  1381349.77       74330   4755808

For this Restore example,

  • There are 16 db2bm's (BM#0 through #15) telling us that PARALLELISM was 16.
  • The amount of data 'MBytes' written by each db2bm is almost identical, this is usually the case with Restore, since a db2bm can write to any tablespace.
  • The time spent in 'WaitQ' tells us that only BM#000 was busy for the entire Restore, while BM#001 through #15 were idle for a large portion of time. This is because this example was a Redirected Restore, and tablespace container file pre-allocation by db2bm #000 took some time (this particular operating system did not support fast pre-allocation), leaving other db2bm idle during this time. 
  • The time spent waiting for a free buffer 'MsgQ' is relatively small.
    _____________________________________________________
  • There are 16 db2med's (MC#0 through #15) telling us that there were 16 paths (to source backup-image files) specified in the Restore command.
  • The amount of data 'MBytes' read by each db2med varies. MC#1, #7, #9, and #12 read no data.  This is because in this example 16 paths were specified in the Restore command, however only 12 source backup-image files actually existed, so 4 db2meds were just idle. For db2med's that did read data, the amount data read was almost identical, telling us that the 12 filesystems perform identically.
  • The time spent in 'WaitQ' tells us that only MC#000 was busy for the entire Restore, while MC#001 through #15 were idle for a large portion of time. Again, this is because in this example of a Redirected Restore, container pre-allocation of files took a long time.
  • The time spent waiting for a free buffer 'MsgQ' is relatively small for each db2med, and 0 for those db2med's that were idle due to the lack of backup-image source files. (The time spent in MsgQ is large for BM#000 only because of the filesystem preallocation necessitated by the Redirected Restore.).

 

2) Interpreting the db2trc-perfcount output

  • The first column in the output is the count of executions of the function.
  • The second column is the accumulated time spent in the function.
  • The third column is the function name.  
  • Just divide the accumulated time by the count of executions and you have an average execution time for the function.

for a Backup operation:

# cat  trc_perfcount.txt

7913    (43 sec, 537763882 nanosec)      sqlbReadBlocks
27      (2 sec, 625676050 nanosec)       sqluWriteToFileDevice

The sqlbReadBlocks entry represents the amount of time spent in i/o reading data from tbspace containers.
Each i/o operation is the size of a tablespace page.
In this example, 43.5 secs / 7913 = 0.0055 secs per read i/o. 

The sqluWriteToFileDevice entry represents the amount of time spent in i/o writing data to backup-image files.
Each i/o operation is the same size as the Backup BUFFER SIZE.
In this example, 2.62 secs / 27 = 0.097 secs per write i/o)

In this example, the i/o response times seem reasonable to me.

for a Restore operation:

# cat  trc_perfcount.txt

26      (0 sec, 206157421 nanosec)       sqluMCReadFromDevice
7445    (125 sec, 970007753 nanosec)     sqlbWriteBlocks

The sqluMCReadFromDevice entry represents the amount of time spent in i/o reading data from backup-image files.
Each i/o operation is the same size as the Restore BUFFER SIZE.
In this example, 0.21secs / 26 = 0.035 secs per read i/o)

The sqlbWriteBlocks entry represents the amount of time spent in i/o writing data to tbspace containers.
Each i/o operation is the size of a tablespace page.
In this example, 125.9 / 7445 = 0.017 secs per write i/o)

In this example, the i/o response times seem reasonable to me.

 

I hope this information is useful to some. Please feel free to post a comment below.

1 comment:

  1. This article is really wonderful. Very well explained and definitely helps to improve the tuning of backup and restore .

    ReplyDelete

Translate