Saturday, 8 December 2018

Improving Read Availability of the HADR Standby database… Avoiding the Replay-only Window

- David Sciaraffa, Software Engineering Manager – IBM Db2


A Db2 database can be easily configured for high availability by maintaining a real-time synchronized peer database using a feature called High Availability Disaster Recovery, or more commonly known as ‘HADR’.  An HADR standby database stays synchronized with a primary database by replaying log-record data reflective of all changes made on the primary database. HADR is quite simple to configure, and is virtually seamless to applications.  HADR is an extremely popular Db2 feature, and is well documented within the Db2 Knowledge Center, as well as the HADR Developerworks Wiki.


Reads on Standby feature

In Db2 Version 9.7, IBM introduced an HADR feature called “Reads on Standby”, which allows for read-only SQL queries to read data from the HADR standby database, as depicted in the following diagram:

image


Of the several restrictions present in the Reads on Standby feature, one restriction in-particular made it difficult for many users to broadly adopt, specifically called the replay-only window



Replay-only Window restriction (prior to Version 11.1.4.4)

When an HADR standby database is replaying log-record data, if it encounters a log-record for a maintenance or Data Definition Language (DDL) operation (such as most CREATE, ALTER, or DROP statements, REORG, RUNSTATS, and other such operations), the standby database triggers a condition called the "replay-only window", where all executing queries on the standby database are interrupted, all database connections are terminated and new connections to the standby are blocked (receiving an SQL1224N error code).   Only after the replay of the DDL or maintenance operation has completed, are new connections allowed on the standby database again. 

The types of DDL statements and maintenance operations that cause a replay-only window are well documented, and also listed below for reference:

 image  image


The following diagram depicts the process-flow and effect of the replay-only window:

image


When the replay-only window is triggered, diagnostic messages appear in the db2diag.log, like so:

2017-11-22-15.48.36.321657-300 I522432E731           LEVEL: Info
PID : 25476 TID : 140255646705408 KTID : 26687< PROC : db2sysc INSTANCE: db2inst1 NODE : 000 DB : TESTDB APPHDL : 0-8 APPID: *LOCAL.DB2.171122204107 HOSTNAME: myhost1 EDUID : 87 EDUNAME: db2redom (TESTDB) FUNCTION: DB2 UDB, recovery manager, SQLP_REPLAY_ONLY_WINDOW_STAT::sqlpStartHadrReplayOnlyWindow, probe:9140 MESSAGE : Replay only window is triggered by this log record: LogStreamId / TID< / LSO / action
and:
2017-11-22-15.49.28.915844-300 E546560E554           LEVEL: Warning
PID     : 25476                TID : 140255646705408 KTID : 26687<
PROC    : db2sysc
INSTANCE: db2inst1             NODE : 000            DB   : TESTDB
APPHDL  : 0-8                  APPID: *LOCAL.DB2.171122204107
HOSTNAME: myhost1
EDUID   : 87                   EDUNAME: db2redom (TESTDB)
FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrEndRep
layOnlyWindow, probe:210
DATA #1 : String, 73 bytes
Replay only window is inactive, connections to Active Standby are allowed

Additional information on how to monitor the replay-only window is described in sections ‘Diagnostic log messages during replay-only window’ and ‘Monitoring the replay-only window’ in Knowledge Center topic Replay-only window (and replay-only window avoidance) on the active standby database.

The replay-only window can be a show-stopper to the usefulness of the Reads on Standby feature, when applications on the primary database perform DDL operations often, since the frequency of replay-only window on the standby database increases, and thus queries on the standby are interrupted often.  The create and drop of temporary tables appear to be a very common DDL operation inherent in many workloads.


Replay-only Window avoidance (starting in v11.1.4.4)

Starting in Db2 Version 11.1 Mod Pack 4 Fix Pack 4 (v11.1.4.4), a significant improvement to the replay-only window is now available, by enabling the DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW registry variable.

When the registry variable is set to ON, the standby database does not enter the replay-only window. Instead, only existing application connections on the standby database which hold locks on tables, indexes or other objects that conflict with the DDL or maintenance operation to be replayed will be forced off. Other connections, including new connections, are permitted to continue, as depicted in the following diagram:


image


With this improvement, a create or drop of a temporary table on the primary database, which are the most common DDL operations in day-to-day workloads, do not cause any interruption to application connections on the standby database.  Furthermore, the impact from a DDL or maintenance operation performed on a regular table on the primary database is limited only to application connections on the standby database which hold or require a lock on that specific table at the time that the operation is being replayed on the standby.

Note, application connections on the standby database may experience a brief lock-wait condition if the table or object they are trying to access currently has a DDL or maintenance log-record replayed on it. For information on how to determine if an application on the standby database is in a prolonged lock-wait behind a DDL log-record replay operation, see section ‘Monitoring lock conflicts when replay-only window avoidance is enabled’ in Knowledge Center topic Replay-only window (and replay-only window avoidance) on the active standby database.


Additional note,  even when replay-only window avoidance is enabled, the following few DDL operations will still cause a full replay-only window on the standby database:

image



I hope this information is useful. If you have any thoughts or questions, feel free to submit a comment below. 

Translate