- David Sciaraffa, Software Engineering Manager – IBM Db2
A Db2 database can be easily configured for high availability by maintaining a real-time synchronized peer database using a feature called High Availability Disaster Recovery, or more commonly known as ‘HADR’. An HADR standby database stays synchronized with a primary database by replaying log-record data reflective of all changes made on the primary database. HADR is quite simple to configure, and is virtually seamless to applications. HADR is an extremely popular Db2 feature, and is well documented within the Db2 Knowledge Center, as well as the HADR Developerworks Wiki.
Reads on Standby feature
In Db2 Version 9.7, IBM introduced an HADR feature called “Reads on Standby”, which allows for read-only SQL queries to read data from the HADR standby database, as depicted in the following diagram:
Of the several restrictions present in the Reads on Standby feature, one restriction in-particular made it difficult for many users to broadly adopt, specifically called the replay-only window.
Replay-only Window restriction (prior to Version 11.1.4.4)
When an HADR standby database is replaying log-record data, if it encounters a log-record for a maintenance or Data Definition Language (DDL) operation (such as most CREATE, ALTER, or DROP statements, REORG, RUNSTATS, and other such operations), the standby database triggers a condition called the "replay-only window", where all executing queries on the standby database are interrupted, all database connections are terminated and new connections to the standby are blocked (receiving an SQL1224N error code). Only after the replay of the DDL or maintenance operation has completed, are new connections allowed on the standby database again.
The types of DDL statements and maintenance operations that cause a replay-only window are well documented, and also listed below for reference:
The following diagram depicts the process-flow and effect of the replay-only window:
When the replay-only window is triggered, diagnostic messages appear in the db2diag.log, like so:
2017-11-22-15.48.36.321657-300 I522432E731 LEVEL: Info
PID : 25476 TID : 140255646705408 KTID : 26687< PROC : db2sysc INSTANCE: db2inst1 NODE : 000 DB : TESTDB APPHDL : 0-8 APPID: *LOCAL.DB2.171122204107 HOSTNAME: myhost1 EDUID : 87 EDUNAME: db2redom (TESTDB) FUNCTION: DB2 UDB, recovery manager, SQLP_REPLAY_ONLY_WINDOW_STAT::sqlpStartHadrReplayOnlyWindow, probe:9140 MESSAGE : Replay only window is triggered by this log record: LogStreamId / TID< / LSO / action
and:
2017-11-22-15.49.28.915844-300 E546560E554 LEVEL: Warning PID : 25476 TID : 140255646705408 KTID : 26687< PROC : db2sysc INSTANCE: db2inst1 NODE : 000 DB : TESTDB APPHDL : 0-8 APPID: *LOCAL.DB2.171122204107 HOSTNAME: myhost1 EDUID : 87 EDUNAME: db2redom (TESTDB) FUNCTION: DB2 UDB, base sys utilities, sqeLocalDatabase::HdrEndRep layOnlyWindow, probe:210 DATA #1 : String, 73 bytes Replay only window is inactive, connections to Active Standby are allowed
Additional information on how to monitor the replay-only window is described in sections ‘Diagnostic log messages during replay-only window’ and ‘Monitoring the replay-only window’ in Knowledge Center topic Replay-only window (and replay-only window avoidance) on the active standby database.
The replay-only window can be a show-stopper to the usefulness of the Reads on Standby feature, when applications on the primary database perform DDL operations often, since the frequency of replay-only window on the standby database increases, and thus queries on the standby are interrupted often. The create and drop of temporary tables appear to be a very common DDL operation inherent in many workloads.
Replay-only Window avoidance (starting in v11.1.4.4)
Starting in Db2 Version 11.1 Mod Pack 4 Fix Pack 4 (v11.1.4.4), a significant improvement to the replay-only window is now available, by enabling the DB2_HADR_ROS_AVOID_REPLAY_ONLY_WINDOW registry variable.
When the registry variable is set to ON, the standby database does not enter the replay-only window. Instead, only existing application connections on the standby database which hold locks on tables, indexes or other objects that conflict with the DDL or maintenance operation to be replayed will be forced off. Other connections, including new connections, are permitted to continue, as depicted in the following diagram:
With this improvement, a create or drop of a temporary table on the primary database, which are the most common DDL operations in day-to-day workloads, do not cause any interruption to application connections on the standby database. Furthermore, the impact from a DDL or maintenance operation performed on a regular table on the primary database is limited only to application connections on the standby database which hold or require a lock on that specific table at the time that the operation is being replayed on the standby.
Note, application connections on the standby database may experience a brief lock-wait condition if the table or object they are trying to access currently has a DDL or maintenance log-record replayed on it. For information on how to determine if an application on the standby database is in a prolonged lock-wait behind a DDL log-record replay operation, see section ‘Monitoring lock conflicts when replay-only window avoidance is enabled’ in Knowledge Center topic Replay-only window (and replay-only window avoidance) on the active standby database.
Additional note, even when replay-only window avoidance is enabled, the following few DDL operations will still cause a full replay-only window on the standby database:
I hope this information is useful. If you have any thoughts or questions, feel free to submit a comment below.