- David Sciaraffa, IBM IM/DB2 Accelerated Value Leader (Premium Support Manager).
A TAKEOVER HADR operation, which instructs a Standby database to take over as the new Primary database, is normally a quick operation. However,
a TAKEOVER HADR operation (when not executed with the emergency WITH FORCE option) must first rollback any uncommitted transactions on the Primary database before the Standby database can take over. This can elongate the HADR TAKEOVER operation and it may appear to hang. This blog entry describes how to identify any large or old uncommitted transactions that would require a lengthy rollback.
When large or old uncommitted transactions exist, the rollback of these transactions can take a long time, causing the TAKEOVER HADR operation to appear to hang.
For large uncommitted transactions which consume a lot of LogSpace, the rollback must replay (undo) many log-records associated with the transaction.
For old uncommitted transactions which do not consume a lot of LogSpace, the rollback must still read and parse through many log-records to eventually find those associated with the transaction to replay (undo).
Determine the Size and Span of Uncommitted Transactions
To determine if there are any uncommitted transactions that will require a lengthy rollback, you can query db2pd-transactions to list transactions with the largest log span (in bytes). This represents the approximate number of bytes of log file data that will have to be parsed through in order to process the log-records required to rollback the transaction.
$ db2pd -db [dbname] -transactions > trans.out;
$ cat trans.out | awk '{diff = $10-$9; printf "%.0f=log_span, firstLSN=%s, lastLSN=%s, AppHndl=%s \n",diff,$9,$10,$2 }' | sort -n -r | less8648404033=log_span, firstLSN=0x000055D26CD7EAB6, lastLSN=0x000055D4705416F7, AppHndl=25943
1512356109=log_span, firstLSN=0x000055D4162F5C21, lastLSN=0x000055D47054152E, AppHndl=29146
488762756=log_span, firstLSN=0x000055D4533228EB, lastLSN=0x000055D47054166F, AppHndl=25629
55009676=log_span, firstLSN=0x000055D46D0CB109, lastLSN=0x000055D470541295, AppHndl=29424
53676373=log_span, firstLSN=0x000055D46D20AC09, lastLSN=0x000055D47053B55E, AppHndl=29436
51957579=log_span, firstLSN=0x000055D46D3B3180, lastLSN=0x000055D4705400CB, AppHndl=29465
2422301=log_span, firstLSN=0x000055D4702F0366, lastLSN=0x000055D47053F983, AppHndl=29442
152148=log_span, firstLSN=0x000055D47050A4D9, lastLSN=0x000055D47052F72D, AppHndl=30172
136276=log_span, firstLSN=0x000055D47050F536, lastLSN=0x000055D47053098A, AppHndl=30485
6765=log_span, firstLSN=0x000055D47053CC52, lastLSN=0x000055D47053E6BF, AppHndl=29666
6423=log_span, firstLSN=0x000055D4705305A2, lastLSN=0x000055D470531EB9, AppHndl=30774
2561=log_span, firstLSN=0x000055D47052C938, lastLSN=0x000055D47052D339, AppHndl=25326
1779=log_span, firstLSN=0x000055D47053CFC0, lastLSN=0x000055D47053D6B3, AppHndl=25378
1275=log_span, firstLSN=0x000055D470537F7E, lastLSN=0x000055D470538479, AppHndl=25367
526=log_span, firstLSN=0x000055D4705331A7, lastLSN=0x000055D4705333B5, AppHndl=29608
289=log_span, firstLSN=0x000055D470526016, lastLSN=0x000055D470526137, AppHndl=22962
…etc…
If we see one or more transactions with a log span of 500MB or more, we would generally expect it to take some *noticeable* amount of time for the transaction to rollback. But depending on the performance of your system, this threshold could be lower.
To find out more about these particular transactions, you can find their AppHndl's in "db2pd –db [dbname] -apinfo" output.
Estimate the Progress of a Rollback while TAKEOVER HADR is running
Once the TAKEOVER operation is issued, you can see which applications are still rolling back their uncommitted transections by using db2pd-applications:
$ db2pd –db [dbname] –applications | grep "RollbackActive"
Applications:
Address AppHandl [nod-index] NumAgents CoorEDUID Status … Appid …
0x07800000012F0080 9 [000-00009] 1 14394 RollbackActive … *LOCAL.dsciaraf.150703020130
As I mentioned above, you can determine the original LogSpace and log span values for these application’s transactions using the db2pd-transactions and awk command that I provided at top.
$ db2pd –db [dbname] –transactions > trans.out
$ cat trans.out | awk '{diff = $10-$9; printf "%.0f=log_span, firstLSN=%s, lastLSN=%s, AppHndl=%s \n",diff,$9,$10,$2 }' | sort -n -r | less30000269=log_span, firstLSN=0x00000192863F6B14, lastLSN=0x00000222863F883C, AppHndl=9
0=log_span, firstLSN=0x0000000000000000, lastLSN=0x0000000000000000, AppHndl=11
0=log_span, firstLSN=0x0000000000000000, lastLSN=0x0000000000000000, AppHndl=12
0=log_span, firstLSN=0x0000000000000000, lastLSN=0x0000000000000000, AppHndl=13
0=log_span, firstLSN=0x0000000000000000, lastLSN=0x0000000000000000, AppHndl=14
…
But how about monitoring the actual progress of the Rollback operation?
Unfortunately, since the Takeover operation has forced off all existing database connections and blocked new ones, we cannot use the normal methods for determining the progress of the rollback operations (such as monitoring the ‘Total Work’ and ‘Committed Work’ values of an application snapshot).
However, we can monitor the SpaceReserved value in db2pd-transactions.
(The SpaceReserved value represents the amount of log file space that was reserved for this transaction in case a rollback operation was needed. For every log-record that is replayed (undone) while a transaction is being rolled back, a special log-record is added directly into this reserved log file space. If the rollback were to fail and be restarted again, these special log-records allow DB2 to avoid processing log-records which have already been processed.)
As the rollback operation proceeds, the SpaceReserved value diminishes. This is easily observable for large transactions where many log-records are being replayed (undone). It is not easily observable for old transactions which are not large, since many log-records are parsed but seldom are replayed (undone) as I described earlier.
Here in my example, I simply watch the SpaceReserved value go down. You can extrapolate an estimated time to completion by watching the rate at which the value diminishes (however, since even large transactions can insert log-records into the log file stream in periodic clumps, you may see pauses in the rate at which this value diminishes while you are observing it).
$ db2pd –db [dbname] –transactions | grep ABORT
Address AppHandl TranHdl … State … Firstlsn Lastlsn SpaceReserved LogSpace …
0x070000004006CF00 7 2 … ABORT 0x00000192863F6B14 0x00000222863F883C 1151342 2312191 …
$ db2pd –db [dbname] –transactions | grep ABORTAddress AppHandl TranHdl … State … Firstlsn Lastlsn SpaceReserved LogSpace …
0x070000004006CF00 7 2 ABORT 0x00000192863F6B14 0x00000222863F883C 473890 2312191 …
$ db2pd –db [dbname] –transactions | grep ABORTAddress AppHandl TranHdl … State … Firstlsn Lastlsn SpaceReserved LogSpace …
0x070000004006CF00 7 2 ABORT 0x00000192863F6B14 0x00000222863F883C 304246 2312191 …
I hope this information is useful. If you have any thoughts or questions, feel free to submit a comment below.