- David Sciaraffa, Software Development Manager – IBM Db2
In Db2 systems configured for high availability disaster recovery (HADR), all data changes which are made on the primary database are also transmitted and replayed on one or more secondary ‘standby’ databases. If a system failure occurs on the primary database server, applications can be rerouted to the standby database almost instantaneously (and usually done so through automation).
Db2 HADR is simple to configure, and mostly invisible to the applications accessing the database. There are many excellent resources online for getting started with HADR:
Step by Step Procedure to set up HADR replication between DB2 databases
IBM Developerworks Db2 HADR Wiki (Tutorial, Getting-Started, Performance Monitoring and Tuning, more…)
Performing SQL queries on a standby database (“Reads-on-Standby”) --
It is possible to allow applications to connect to the Db2 standby database and perform SQL read-only queries. We refer to standby databases which are enabled for this as a “Reads-on-Standby” database.
Reads-on-Standby databases are simple to enable. There are numerous resources available to help get started:
Get the most from the DB2 HADR standby database: Using the reads on standby capability
HADR reads on standby feature
Reads on standby restrictions
Limitation of HADR Reads-on-Standby databases – the “replay-only-window”
A current limitation of Reads-on-Standby databases is the “replay-only-window”. When an application on the primary database performs a ‘DDL’ operation (such as creating or altering a regular or temporary table, view, or other database object) all application connections on the standby database will be momentary forced off (any currently executing queries will be interrupted). After the unit-of-work containing the DDL operation is fully replayed on the Standby database, application connections and queries on the standby database are permitted again. The list of DDL and utility operations which trigger the replay-only-window are documented in Replay-only window on the active standby database.
What does a replay-only-window look like?
First, let’s establish a connection on the standby database, and start running some basic read-only queries:
[standby db server]
$ db2pd -db hadr105 -hadr | grep HADR_ROLE
HADR_ROLE = STANDBY
$ db2 connect to hadr105
Database Connection Information
Database server = DB2/LINUXX8664 10.5.9
SQL authorization ID = DB2INST1
Local database alias = HADR105
$ while [[ 1 ]]
do
db2 -v "select FIRSTNME from EMPLOYEE with UR";
done
Let’s invoke a simple DDL statement on the primary database, to trigger a replay-only-window on the standby database:
[primary db server]
$ db2pd -db hadr105 -hadr | grep HADR_ROLE
HADR_ROLE = PRIMARY
$ db2 create table table1 '(c1 int)'
DB20000I The SQL command completed successfully.
When the replay-only-window is triggered on the standby database, all running queries on the standby database will be interrupted and their associated database connections will be forced off resulting in an SQL1224N error:
[standby db server]
$ db2 "select FIRSTNME from EMPLOYEE"
SQL1224N The database manager is not able to accept new requests, has
terminated all requests in progress, or has terminated the specified request
because of an error or a forced interrupt. SQLSTATE=55032
Also, if an attempt is made to establish a connection to the standby database while a replay-only-window is active, an SQL1776N reason 4 is returned:
[standby db server]
$ db2 connect to hadr1113
SQL1776N The command cannot be issued on an HADR database. Reason code = "4".
How can I monitor the replay-only-window?
In all Db2 versions:
The STANDBY_REPLAY_ONLY_WINDOW_ACTIVE value of ‘db2pd –hadr’ or the MON_GET_HADR() table function will have a value of ‘Y’ on the standby database for the duration of the replay-only-window:
[standby db server]
$ db2pd –db hadr105 –hadr
HADR_ROLE = STANDBY
REPLAY_TYPE = PHYSICAL
HADR_SYNCMODE = NEARSYNC
STANDBY_ID = 0
LOG_STREAM_ID = 0
HADR_STATE = PEER
HADR_FLAGS = TCP_PROTOCOL
PRIMARY_MEMBER_HOST = hotellnx108
PRIMARY_INSTANCE = db2inst1
PRIMARY_MEMBER = 0
STANDBY_MEMBER_HOST = hotellnx111
STANDBY_INSTANCE = db2inst1
STANDBY_MEMBER = 0
HADR_CONNECT_STATUS = CONNECTED
HADR_CONNECT_STATUS_TIME = 04/28/2018 18:55:38.089002 (1524956138)
HEARTBEAT_INTERVAL(seconds) = 30
HEARTBEAT_MISSED = 0
HEARTBEAT_EXPECTED = 283
HADR_TIMEOUT(seconds) = 120
TIME_SINCE_LAST_RECV(seconds) = 0
PEER_WAIT_LIMIT(seconds) = 0
LOG_HADR_WAIT_CUR(seconds) = 0.000
LOG_HADR_WAIT_RECENT_AVG(seconds) = 0.000308
LOG_HADR_WAIT_ACCUMULATED(seconds) = 1.909
LOG_HADR_WAIT_COUNT = 986
SOCK_SEND_BUF_REQUESTED,ACTUAL(bytes) = 0, 16384
SOCK_RECV_BUF_REQUESTED,ACTUAL(bytes) = 0, 87380
PRIMARY_LOG_FILE,PAGE,POS = S0000002.LOG, 725, 55945845
STANDBY_LOG_FILE,PAGE,POS = S0000002.LOG, 725, 55945845
HADR_LOG_GAP(bytes) = 0
STANDBY_REPLAY_LOG_FILE,PAGE,POS = S0000002.LOG, 725, 55945789
STANDBY_RECV_REPLAY_GAP(bytes) = 2
PRIMARY_LOG_TIME = 04/28/2018 21:17:06.000000 (1524964626)
STANDBY_LOG_TIME = 04/28/2018 21:17:06.000000 (1524964626)
STANDBY_REPLAY_LOG_TIME = 04/28/2018 21:17:06.000000 (1524964626)
STANDBY_RECV_BUF_SIZE(pages) = 512
STANDBY_RECV_BUF_PERCENT = 0
STANDBY_SPOOL_LIMIT(pages) = 13000
STANDBY_SPOOL_PERCENT = 0
STANDBY_ERROR_TIME = NULL
PEER_WINDOW(seconds) = 0
READS_ON_STANDBY_ENABLED = Y
STANDBY_REPLAY_ONLY_WINDOW_ACTIVE = Y
STANDBY_REPLAY_ONLY_WINDOW_START = 04/28/2018 21:17:32.000000 (1524964652)
STANDBY_REPLAY_ONLY_WINDOW_TRAN_COUNT = 1
In Db2 version 11.1.2.2 and previous releases:
The db2 diagnostic log file (db2diag.log) on the standby database will contain the following messages:
[standby db server]
$ less /home/db2inst1/sqllib/db2dump/db2diag.log
2018-04-28-18.36.18.723755-240 E41445875E502 LEVEL: Warning
PID : 483 TID : 140327683876608 PROC : db2sysc
INSTANCE: dsciaraf NODE : 000 DB : HADR105
APPHDL : 0-12 APPID: *LOCAL.DB2.180428222707
HOSTNAME: hotellnx111
EDUID : 685 EDUNAME: db2redom (HADR105)
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrForceAppsInReplayOnlyWindow
, probe:100
DATA #1 : String, 28 bytes
Replay only window is active
2018-04-28-18.36.18.724354-240 I41446378E490 LEVEL: Info
PID : 483 TID : 140327683876608 PROC : db2sysc
INSTANCE: db2inst1 NODE : 000 DB : HADR105
APPHDL : 0-12 APPID: *LOCAL.DB2.180428222707
HOSTNAME: hotellnx111
EDUID : 685 EDUNAME: db2redom (HADR105)
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrKickPooledAgents, probe:10
DATA #1 : <preformatted>
Interrupting agent with eduid:688
2018-04-28-18.36.18.724607-240 I41446869E500 LEVEL: Info
PID : 483 TID : 140327683876608 PROC : db2sysc
INSTANCE: db2inst1 NODE : 000 DB : HADR105
APPHDL : 0-12 APPID: *LOCAL.DB2.180428222707
HOSTNAME: hotellnx111
EDUID : 685 EDUNAME: db2redom (HADR105)
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrKickPooledAgents, probe:20
DATA #1 : <preformatted>
Waiting for last disassociation from the db
When the replay-only-windows is completed, and applications allowed to reconnect to the standby database, the following db2diag.log message is displayed:
2018-04-28-18.36.18.973225-240 E41447370E539 LEVEL: Warning
PID : 483 TID : 140327683876608 PROC : db2sysc
INSTANCE: db2inst1 NODE : 000 DB : HADR105
APPHDL : 0-12 APPID: *LOCAL.DB2.180428222707
HOSTNAME: hotellnx111
EDUID : 685 EDUNAME: db2redom (HADR105)
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrEndReplayOnlyWindow, probe:
210
DATA #1 : String, 73 bytes
Replay only window is inactive, connections to Active Standby are allowed
In addition to these db2diag.log entries, it is possible to parse db2 recovery log files to determine which log-records (if any) will cause a replay-only-window, by using the db2fmtlog_replayonlywindow tool, which can be downloaded from the Db2 HADR wiki here: (link). Since db2 recovery log files are mostly identical on both primary and standby databases, this tool can be run on log files resided on either database (or the log archive path):
[primary or standby db server]
$ db2 get db config for hadr105 | grep "Path to log files"
Path to log files = /home/db2inst1/db2inst1/NODE0000/SQL00001/LOGSTREAM0000/
$ cd /home/db2inst1/db2inst1/NODE0000/SQL00001/LOGSTREAM0000/
$ db2fmtlog_replayonlywindow_linux 0-2
|------|------------------------------------------------------------------------
| LREC | 5171 000408D1 00000000029D
|------|------------------------------------------------------------------------
| LREC | Record LSO = 45190368
| | Record TID = 00000000029D
| | Action = DDL
|------|------------------------------------------------------------------------
| LREC | 5180 00040921 00000000037E
|------|------------------------------------------------------------------------
| LREC | Record LSO = 45197733
| | Record TID = 00000000037E
| | Action = DDL
Also, when the LOG_DDL_STMTS database configuration parameter is set to yes on the Primary database (at the time the DDL operation was performed and it’s log records were generated within the recovery log file), then the DDL statement itself will also be displayed:
|------|------------------------------------------------------------------------
| LREC | 5171 000408D1 00000000029D
|------|------------------------------------------------------------------------
| LREC | Record LSO = 45190368
| | Record TID = 00000000029D
| | Action = DDL
|------|------------------------------------------------------------------------
| LREC | 5171 000408D2 00000000029D
|------|------------------------------------------------------------------------
| LREC | Record LSO = 45190796
| | Record TID = 00000000029D
| | DDL Statement = create table t3 (c1 int)
|------|------------------------------------------------------------------------
| LREC | 5180 00040921 00000000037E
|------|------------------------------------------------------------------------
| LREC | Record LSO = 45197733
| | Record TID = 00000000037E
| | Action = DDL
|------|------------------------------------------------------------------------
| LREC | 5180 00040922 00000000037E
|------|------------------------------------------------------------------------
| LREC | Record LSO = 45198161
| | Record TID = 00000000037E
| | DDL Statement = create table t4 (c1 int)
In Db2 version 11.1.3.3 and newer releases:
The db2 diagnostic log file (db2diag.log) on the standby database has been enhanced to provide additional details when a replay-only-window occurs. The frequency of these diagnostic messages can be configured using the DB2_HADR_REPLAY_ONLY_WINDOW_DIAGLEVEL registry variable. A value of 0 meaning no additional diagnostic messages, a value of 1 meaning only the first log-record/DDL statement within the unit-of-work will be displayed, and a value of 2 means all log-records/DDL statements within the unit-of-work will be displayed.
[standby db server]
$ db2pd -db hadr1113 | grep HADR_ROLE
HADR_ROLE = STANDBY
$ less /home/db2inst1/sqllib/db2dump/db2diag.log2018-04-28-19.31.28.182828-240 I41576483E762 LEVEL: Info
PID : 8721 TID : 140697919284992 PROC : db2sysc
INSTANCE: db2inst1 NODE : 000 DB : HADR1113
APPHDL : 0-11 APPID: *LOCAL.DB2.180428225527
HOSTNAME: hotellnx111
EDUID : 397 EDUNAME: db2redom (HADR1113)
FUNCTION: DB2 UDB, recovery manager, SQLP_REPLAY_ONLY_WINDOW_STAT::sqlpStartHadrReplayOnlyWindow,
probe:9140
MESSAGE : Replay only window is triggered by this log record: LogStreamId / TID
/ LSO / action
DATA #1 : db2LogStreamIDType, PD_TYPE_DB2_LOG_STREAM_ID, 2 bytes
0
DATA #2 : SQLP_TID, PD_TYPE_SQLP_TID, 6 bytes
00000000029D
DATA #3 : unsigned integer, 8 bytes
45190368
DATA #4 : String, 3 bytes
DDL
2018-04-28-19.31.28.183183-240 E41577246E504 LEVEL: Warning
PID : 8721 TID : 140697919284992 PROC : db2sysc
INSTANCE: dsciaraf NODE : 000 DB : HADR1113
APPHDL : 0-11 APPID: *LOCAL.DB2.180428225527
HOSTNAME: hotellnx111
EDUID : 397 EDUNAME: db2redom (HADR1113)
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrForceAppsInReplayOnlyWindow, probe:100
DATA #1 : String, 28 bytes
Replay only window is active
2018-04-28-19.31.28.183344-240 I41577751E502 LEVEL: Info
PID : 8721 TID : 140697919284992 PROC : db2sysc
INSTANCE: dsciaraf NODE : 000 DB : HADR1113
APPHDL : 0-11 APPID: *LOCAL.DB2.180428225527
HOSTNAME: hotellnx111
EDUID : 397 EDUNAME: db2redom (HADR1113)
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrKickPooledAgents, probe:20
DATA #1 : <preformatted>
Waiting for last disassociation from the db
And same as previous releases, when the replay-only-windows is completed, and applications allowed to reconnect to the standby database, the following db2diag.log message is displayed:
2018-04-28-19.31.28.375434-240 E41579289E541 LEVEL: Warning
PID : 8721 TID : 140697919284992 PROC : db2sysc
INSTANCE: dsciaraf NODE : 000 DB : HADR1113
APPHDL : 0-11 APPID: *LOCAL.DB2.180428225527
HOSTNAME: hotellnx111
EDUID : 397 EDUNAME: db2redom (HADR1113)
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrEndReplayOnlyWindow, probe:210
DATA #1 : String, 73 bytes
Replay only window is inactive, connections to Active Standby are allowed
Also, when the LOG_DDL_STMTS database configuration parameter is set to yes on the Primary database (at the time the DDL operation was performed and it’s log records were generated within the recovery log file and transmitted to the standby database), then the DDL statement will also be printed into the db2diag.log starting in v11.1.3.3 and new releases:
2018-04-28-19.31.28.183524-240 I41578254E544 LEVEL: Info
PID : 8721 TID : 140697919284992 PROC : db2sysc
INSTANCE: dsciaraf NODE : 000 DB : HADR1113
APPHDL : 0-11 APPID: *LOCAL.DB2.180428225527
HOSTNAME: hotellnx111
EDUID : 397 EDUNAME: db2redom (HADR1113)
FUNCTION: DB2 UDB, recovery manager, SQLP_REPLAY_ONLY_WINDOW_STAT::sqlpSetDDLStmtForHadrReplayOnly
Window, probe:9150
MESSAGE : DDL statement text
DATA #1 : String, 24 bytes
create table table3 (c1 int)
In addition, available in v11.1.3.3 and newer releases, the db2fmtlog tool is available in the sqllib/bin/ folder, and it’s REPLAYONLYWINDOW option can be used to display all log records (if any) that will cause a replay-only-window. If the LOG_DDL_STMTS database configuration parameter is set to yes on the Primary database (at the time the DDL operation was performed and it’s log records were generated), then the statement itself is displayed as well:
[primary or standby db server]
$ db2 get db config for hadr105 | grep "Path to log files"
Path to log files = /home/db2inst1/db2inst1/NODE0000/SQL00001/LOGSTREAM0000/
$ cd /home/db2inst1/db2inst1/NODE0000/SQL00001/LOGSTREAM0000/
$ ~/sqllib/bin/db2fmtlog -replayonlywindow 0-10
|------|------------------------------------------------------------------------
| LREC | Record LSO = 45190796
| | Record TID = 00000000029D
| | DDL Statement = create table t3 (c1 int)
|------|------------------------------------------------------------------------
| LREC | 5180 00040921 00000000037E
|------|------------------------------------------------------------------------
| LREC | Record LSO = 45197733
| | Record TID = 00000000037E
| | Action = DDL
|------|------------------------------------------------------------------------
| LREC | 5180 00040922 00000000037E
|------|------------------------------------------------------------------------
| LREC | Record LSO = 45198161
| | Record TID = 00000000037E
| | DDL Statement = create table t4 (c1 int)
How can the replay-only-window be avoided?
In Db2 version 11.1.3.3 and prior releases, the database administrator should implement monitoring for replay-only-window’s. Should the replay-only-windows occur frequently enough to hinder application productivity (or increase in frequency over time), then the DDL statements causing replay-only-windows should be determined (as described above), and applications teams should be engaged to determine if these DDL statements can be reduced in frequency.
Application’s which perform queries against the standby database should plan to expect SQL1224N errors on occasion (due to the replay-only-window), and implement handling to automatically re-connect to the database and re-submit and interrupted query.
Stayed tuned for a drastic improvement which eliminates the replay-only-window in a future Db2 version.
I hope this information is useful. If you have any thoughts or questions, feel free to submit a comment below.