Sunday 24 March 2019

Db2 LOAD command – Estimating TCP/IP Ephemeral Ports in a Database Partitioned Environment

- David Sciaraffa, Software Engineering Manager – IBM Db2



The Db2 LOAD command is a commonly used operation for bulk inserting very large quantities of data into new or existing tables. The operation is multi-process and highly parallelized, and benefits from minimal logging.  The LOAD operation was designed to assume significant CPU and memory resources to achieve maximum throughput.  Tuning of LOAD resources is traditionally done by configuring it’s CPU_PARALLELISM, DATA_BUFFER, DISK_PARALLELISM, and ANYORDER modifiers (see Options for improving load performance).

Load Process Model in DPF Environments
In Partitioned Database (DPF) environments, LOAD operations further benefit from parallelized partitioning of data to separate database partitions. This is well described in Loading data in a partitioned database environment.

Consider the following diagram, which illustrates some of the key processes of a Load operation in a DPF environment. In this example, imagine we have a 20 database partition configuration, with db-partition #0 used solely as Catalog partition, and tables defined in a database partition group which spans database partitions #1 through #20. 





Load Coordinator Agent:  The Coordinator Agent spawns all other agents required to perform the Load on all database partitions, and monitors their status. The Coordinator Agent will reside on the database partition where the application is connected.  This is also where the source data files reside (except when the  CLIENT option is used to specify a remote client-side file).  If the input source if of type CURSOR instead of a data file, then the Coordinator Agent opens the cursor and fetches the records from it, passing those records to the Partitioning Agent(s) for processing).

Pre-Partitioning Agent(s):
The Load Pre-Partitioning Agent opens the input source data file (or pipe, or user supplied file transfer command) and sends buffers containing complete records to each Partitioning Agent via a tcpip socket/port in a round-robin fashion. This Agent always runs on the coordinator database partition where the source data file resides.  There is one Pre-Partitioning Agent per source data file specified on the Load command, when the ANYORDER modifier is specified (default).  A Pre-Partitioning Agent is not created when the input source is of type CURSOR.

Partitioning Agent(s):
The Partitioning Agent receives data from each Pre-Partitioning Agent, extracts the partitioning columns from each data record, determines the target database partition where these records should reside, and sends the record to each Media Reader running on the database partition via  tcpip sockets/ports. By default, there is one Partitioning Agent per each database partition where the database partition group for the target table is defined. The number also configurable using the PARTITIONING_DBPARTNUMS load option, and the PARTITIONING_DBPARTNUMS option. The maximum number of Partitioning Agents is currently 25 (which may change in a future release).

Load Agent(s):
The Load Agent spawns and monitors all of the Load sub-agents which constitute the Load operation on a single database partition. These sub-agents are described briefly in Introduction of db2 load edus in single partition database. (The Load Agent corresponds to the ‘db2agent’ in that document).  A description of these Load sub-agents is beyond the scope and intention of this blog.  By default, there is one Load Agent per each database partition where the database partition group for the target table is defined.  The number is also configuration using the OUTPUT_DBPARTNUMS option.

Media Reader EDU(s):

The Media Reader EDU receives data from each Partitioning Sub Agent, and feeds this data into the Load sub-agents on the database partition for insertion into the table on the database partition. There is one Media Reader EDU per Load Agent.



Number of TCP/IP Ports used
Consider the following diagram, which illustrates the TCIP/IP sockets/ports used to transmit data between Load Pre-Partitioning Agents, Partitioning Agents, and Media Reader EDUs in a DPF environment.



The number of TCP/IP sockets/ports required are:

Let A equal the number of Pre-Partitioning Agents.
(Each Pre-Partitioning Agent will require a TCP/IP connection with each Partitioning Agent).

Let B equal the number of Partitioning Agents.
(Each Partitioning Agent will require a TCP/IP connection with each Pre-Partitioning Agent, and also require a TCP/IP connection with each Media Reader EDU).

Let C equal the number Media Reader EDUs
(Each Media Reader EDU will require a TCP/IP connection with each Partitioning Agent).


The total number of TCP/IP sockets/ports required by LOAD across all database partitions is:
Total = (B + C) + 2 x ((A x B) + (B x C))



In this example:
- We have a 20 database partition configuration, with db-partition #0 used solely as Catalog partition, and tables defined in a database partition group which spans database partitions #1 through #20. 
- The Load operation is invoked on database partition #0 – thus the Coordinator Agent for this Load operation resides on this database partition.
- Two raw input data files were specified in the Load, thus we have two Pre-Partitioning Agents residing on this database partition.
- Each of the two Pre-Partitioning Agents are sending data records to 19 Partitioning Agents (on db-partitions #1 through #20).
- Each of the 19 Partitioning Agents are partitioning and sending data to the 19 Media Reader EDUs (on db-partitions #1 through #20).
- Each of the 19 Media Reader EDUs is processing data and inserting it into the database on its database-partition.


From the Example above, we have:
A (Pre-Partitioning Agents) = 2
B (Partitioning Agents) = 19
C (Media Readers) = 19
Thus, the total number of tcp/ip ports across all database partitions is (B + C)  + 2 x ((A x B) + (B x C))  =  (19 + 19) + 2 x ((2 x 19) + (19 x 19))  =  437.



(Note, you can use the information above to infer the number of TCP/IP ports required on a specific database-partition).


Configuring port ranges

The PORT_RANGE option of
Load allows for the specification of the range of ports that Load should use on each database partition, as described in Configuration options for partitioned database environments.

At the operating system level, the range and quantity of ephemeral tcp/ip ports can be configure on AIX using the TCP_EPHEMERAL_LOW setting. on Linux using the ip_local_port_range configuration.
How To check and change the TCP Anon Port range in solaris,linux,hp-ux,aix.


I hope this information is useful. If you have any thoughts or questions, feel free to submit a comment below.






Sunday 10 March 2019

Db2 and Db2-on-Cloud - Technical Advocate Newsletter – March 2019

image_thumb23_thumb





    Beta -
    DB2 Early Access Program: Beta now Available for the Next Release of Db2
    Get a sneak peek at next generation database technology with the DB2 Early Access Program. Sign up now to shape the future of DB2, test your current infrastructure, and be part of our vibrant community.
    The first beta drop of the upcoming release of Db2 is now available.
    http://bit.ly/2UlwbuU


    Conference & Summits -
    International Db2 Users Group Conferences (North America and Europe)
    Early Bird Registration Is Open!
    Over the years, IDUG has become known for hosting dynamic conferences packed with the cutting-edge content you need. This year will be no exception. Be sure to make your plans to attend. With great sessions, keynote speakers, workshops, and more, you won't want to miss this event.
    North America: June 2-6, Charlotte, NC, USA: https://www.idug.org/page/na2019
    Europe: Oct 20-24, Rotterdam, Netherlandshttps://www.idug.org/p/cm/ld/fid=1634


    Tutorial - Running HA IBM DB2 on Kubernetes
    IBM Db2 is a widely used enterprise database and is part of a larger Db2 family with big data, analytics, and data-streaming use cases. IBM has released and supports a number of DB2 container images on the Docker Store, making it easier than ever to deploy Db2 on your container platform. The aim of this blog post is to demonstrate a Db2 deployment architecture on Kubernetes using Portworx persistent volumes. Following this post, you’ll understand how to run Db2 in production on Kubernetes.
    https://www.ibm.com/blogs/bluemix/2019/01/how-to-running-ha-ibm-db2-on-kubernetes/


    Conferences & Summits - Think 2019 - Watch replays on demand
    Watch Think 2019 replays and videos on demand. Revisit your favorite sessions. Catch all the moments you missed. Share insights with your network. These playlists and videos offer you the chance to relive the highlights of Think 2019.
    https://www.ibm.com/events/think/watch/


    Conference & Summits -
    Central Canada Db2 User Group, April 29-30, 2019
    In its 34th year, the Central Canada Db2 User Group continues to offer quality technical Db2 education. Our 2-day conference brings together many of the best speakers from IDUG NA and the Db2 Labs. For a fraction of the price of larger events, we bring you two days packed with educational opportunities within the professional surroundings of the Bank of Montreal Institute for Learning. If you and your colleagues are in the Toronto area, or can make the trip to join us, you should not miss this opportunity.
    For more details and to register: http://www.ccdb2.ca/home.htm


    Webinar - The Perfect Db2 Server (plus bonus: What's New Db2 V11.1.4.4)
    Klaas Brant, IBM Db2 GOLD Consultant, presents how to achieve an optimally configured Db2 server including memory, storage, instances, STMM, monitoring, and more! Bonus: Kelly Rodger from the IBM Canada Lab will also share with us What's New in Db2 V11.1.4.4!
    Watch the replay: https://www.dbisoftware.com/blog/db2nightshow.php?id=760


    Flash -
    Load Operation May Trap or Cause Data or Index Page Corruption In V11.1.4.4, when Reading Pages From 9.7 Or Prior Release.
    For databases that were created in Version 9.7 or earlier releases, which have been upgraded through newer releases and eventually to Version 11.1.4.4, a LOAD operation performed in Version 11.1.4.4 may trap/crash or cause silent data or index page corruptions, if any existing data or index pages had not previously been updated by non-LOAD operations while running on the newer releases, and then accessed and updated by Load in Version 11.1.4.4.
    http://www.ibm.com/support/docview.wss?uid=ibm10871854


    Webinar - Db2 LUW Native Encryption
    Paul Bird, IBM Senior Technical Staff Member, offers a great presentation to help you learn about Db2 Native Encryption and how to protect your databases from evil-doers! Paul's talk covers preparation, implementation, and operational considerations.
    Register now for March 29,11:00AM ET live broadcast, or replay after this date.
    https://attendee.gotowebinar.com/register/1954676983580277506?source=db2nshome


    Webinar - What's New is Db2 Warehouse 3.4.0
    Join us to hear about "What's New" in the latest release of Db2 Warehouse. Topics to be covered include: Review features of v3.4.0; Demo: IBM Data Replication for Continuous Availability using Db2 Warehouse as a source and as a target -presented by Cheung-Yuk Wu, a senior Db2 Warehouse developer. The webcast will conclude with a Q&A segment. Come with questions!:
    Register now for March 8,12:00PM ET live broadcast, or replay after this date.
    https://ibm.co/2Ur5mVY


    Survey - Lab Advocate Engagement Survey
    If you haven't already, please help us improve the lab advocate program by completing the lab advocate engagement survey.
    http://bit.ly/2019_Db2_lab_advocate_customer_survey

    Translate