ThinkingDb2!: March 2019

Sunday, 24 March 2019

Db2 LOAD command – Estimating TCP/IP Ephemeral Ports in a Database Partitioned Environment

- David Sciaraffa, Software Engineering Manager – IBM Db2

The Db2 LOAD command is a commonly used operation for bulk inserting very large quantities of data into new or existing tables. The operation is multi-process and highly parallelized, and benefits from minimal logging. The LOAD operation was designed to assume significant CPU and memory resources to achieve maximum throughput. Tuning of LOAD resources is traditionally done by configuring it’s CPU_PARALLELISM, DATA_BUFFER, DISK_PARALLELISM, and ANYORDER modifiers (see Options for improving load performance).

Load Process Model in DPF Environments
In Partitioned Database (DPF) environments, LOAD operations further benefit from parallelized partitioning of data to separate database partitions. This is well described in Loading data in a partitioned database environment.

Consider the following diagram, which illustrates some of the key processes of a Load operation in a DPF environment. In this example, imagine we have a 20 database partition configuration, with db-partition #0 used solely as Catalog partition, and tables defined in a database partition group which spans database partitions #1 through #20.

Load Coordinator Agent: The Coordinator Agent spawns all other agents required to perform the Load on all database partitions, and monitors their status. The Coordinator Agent will reside on the database partition where the application is connected. This is also where the source data files reside (except when the CLIENT option is used to specify a remote client-side file). If the input source if of type CURSOR instead of a data file, then the Coordinator Agent opens the cursor and fetches the records from it, passing those records to the Partitioning Agent(s) for processing).

Pre-Partitioning Agent(s):
The Load Pre-Partitioning Agent opens the input source data file (or pipe, or user supplied file transfer command) and sends buffers containing complete records to each Partitioning Agent via a tcpip socket/port in a round-robin fashion. This Agent always runs on the coordinator database partition where the source data file resides. There is one Pre-Partitioning Agent per source data file specified on the Load command, when the ANYORDER modifier is specified (default). A Pre-Partitioning Agent is not created when the input source is of type CURSOR.

Partitioning Agent(s):
The Partitioning Agent receives data from each Pre-Partitioning Agent, extracts the partitioning columns from each data record, determines the target database partition where these records should reside, and sends the record to each Media Reader running on the database partition via tcpip sockets/ports. By default, there is one Partitioning Agent per each database partition where the database partition group for the target table is defined. The number also configurable using the PARTITIONING_DBPARTNUMS load option, and the PARTITIONING_DBPARTNUMS option. The maximum number of Partitioning Agents is currently 25 (which may change in a future release).

Load Agent(s):
The Load Agent spawns and monitors all of the Load sub-agents which constitute the Load operation on a single database partition. These sub-agents are described briefly in Introduction of db2 load edus in single partition database. (The Load Agent corresponds to the ‘db2agent’ in that document). A description of these Load sub-agents is beyond the scope and intention of this blog. By default, there is one Load Agent per each database partition where the database partition group for the target table is defined. The number is also configuration using the OUTPUT_DBPARTNUMS option.

Media Reader EDU(s):
The Media Reader EDU receives data from each Partitioning Sub Agent, and feeds this data into the Load sub-agents on the database partition for insertion into the table on the database partition. There is one Media Reader EDU per Load Agent.

Number of TCP/IP Ports used
Consider the following diagram, which illustrates the TCIP/IP sockets/ports used to transmit data between Load Pre-Partitioning Agents, Partitioning Agents, and Media Reader EDUs in a DPF environment.

The number of TCP/IP sockets/ports required are:

Let A equal the number of Pre-Partitioning Agents.
(Each Pre-Partitioning Agent will require a TCP/IP connection with each Partitioning Agent).
Let B equal the number of Partitioning Agents.
(Each Partitioning Agent will require a TCP/IP connection with each Pre-Partitioning Agent, and also require a TCP/IP connection with each Media Reader EDU).
Let C equal the number Media Reader EDUs
(Each Media Reader EDU will require a TCP/IP connection with each Partitioning Agent).

The total number of TCP/IP sockets/ports required by LOAD across all database partitions is:
Total = (B + C) + 2 x ((A x B) + (B x C))

In this example:
- We have a 20 database partition configuration, with db-partition #0 used solely as Catalog partition, and tables defined in a database partition group which spans database partitions #1 through #20.
- The Load operation is invoked on database partition #0 – thus the Coordinator Agent for this Load operation resides on this database partition.
- Two raw input data files were specified in the Load, thus we have two Pre-Partitioning Agents residing on this database partition.
- Each of the two Pre-Partitioning Agents are sending data records to 19 Partitioning Agents (on db-partitions #1 through #20).
- Each of the 19 Partitioning Agents are partitioning and sending data to the 19 Media Reader EDUs (on db-partitions #1 through #20).
- Each of the 19 Media Reader EDUs is processing data and inserting it into the database on its database-partition.

From the Example above, we have:
A (Pre-Partitioning Agents) = 2
B (Partitioning Agents) = 19
C (Media Readers) = 19
Thus, the total number of tcp/ip ports across all database partitions is (B + C) + 2 x ((A x B) + (B x C)) = (19 + 19) + 2 x ((2 x 19) + (19 x 19)) = 437.

(Note, you can use the information above to infer the number of TCP/IP ports required on a specific database-partition).

Configuring port ranges

The PORT_RANGE option of Load allows for the specification of the range of ports that Load should use on each database partition, as described in Configuration options for partitioned database environments.

At the operating system level, the range and quantity of ephemeral tcp/ip ports can be configure on AIX using the TCP_EPHEMERAL_LOW setting. on Linux using the ip_local_port_range configuration. How To check and change the TCP Anon Port range in solaris,linux,hp-ux,aix.

I hope this information is useful. If you have any thoughts or questions, feel free to submit a comment below.