A few months ago, an application team requested my help with one of their monthly batch jobs. It was running for an excessively long time, and they needed another set of eyes to see if there was any way to improve it. This article describes the analysis that was performed, the actions that were taken, and the results of the changes.
Only one step in the batch job was running very long. That step consisted of a SORT, which invoked the MAXSORT utility. MAXSORT is a SyncSort for z/OS feature that is designed to sort files that are too large for ordinary sorting techniques to process. MAXSORT breaks up the sort process into individual sorts and then performs a corresponding number of merges. All of this is contained within the execution of one batch step. This feature also uses a breakpoint dataset, which allows the step to be restarted without losing any of the previously sorted data.
In this case, the step was taking a cumulative (year-to-date) file along with the current month's data, extracting specific fields, and then processing the records to produce a year-to-date master file. By the time the job was under review, the cumulative input file consisted of 12 tapes and the current month's data consisted of two additional tapes.
Figure 1 shows a section of the old version of the JCL. Some of the SORTWKnn datasets have been eliminated for easier viewing.
//STEP030 EXEC PGM=SORT,COND=(05,LT),
// PARM='MAXSORT,MAXWKSP=1500,VSCORE=3072K,RESTART=LAST'
//SORTIN DD DSN=application.year-to-date.input.file,
// DCB=BUFNO=20,DISP=OLD
// DD DSN=application.monthly.input.file,
// DCB=BUFNO=20,DISP=OLD,UNIT=AFF=SORTIN
//SYSIN DD DSN=&CTRLCARD(member),DISP=SHR
//SORTWK01 DD UNIT=&SRTUNIT,SPACE=(4096,(90000,50000),,,ROUND)
//SORTWK02 DD UNIT=&SRTUNIT,SPACE=(4096,(90000,50000),,,ROUND)
//SORTWK03 DD UNIT=&SRTUNIT,SPACE=(4096,(90000,50000),,,ROUND)
//SORTWK04 DD UNIT=&SRTUNIT,SPACE=(4096,(90000,50000),,,ROUND)
//SORTWK05 DD UNIT=&SRTUNIT,SPACE=(4096,(90000,50000),,,ROUND)
.
.
.
//SORTWK21 DD UNIT=&SRTUNIT,SPACE=(4096,(90000,50000),,,ROUND)
//SORTWK22 DD UNIT=&SRTUNIT,SPACE=(4096,(90000,50000),,,ROUND)
//SORTWK23 DD UNIT=&SRTUNIT,SPACE=(4096,(90000,50000),,,ROUND)
//SORTWK24 DD UNIT=&SRTUNIT,SPACE=(4096,(90000,50000),,,ROUND)
//SORTWK25 DD UNIT=&SRTUNIT,SPACE=(4096,(90000,50000),,,ROUND)
//SYSOUT DD SYSOUT=&SYSOUT1
//SYSPRINT DD SYSOUT=&SYSOUT1
//SORTBKPT DD DSN=&BREAKPNT,DISP=(NEW,DELETE,KEEP),
// UNIT=SYSDA,SPACE=(CYL,(25,25),RLSE)
//SORTOU00 DD DSN=&APPLICATION.TEMPFIL0,
// DISP=(NEW,KEEP),UNIT=(&TAPE,,DEFER)
//SORTOU01 DD DSN=&APPLICATION.TEMPFIL1,
// DISP=(NEW,KEEP),UNIT=(TP80,,DEFER)
//SORTOU02 DD DSN=&APPLICATION.TEMPFIL2,
// DISP=(NEW,KEEP),UNIT=AFF=SORTIN
//SORTOUT DD DSN=application.master.year-to-date.file,
// DISP=(NEW,CATLG,DELETE),
// DCB=(RECFM=FB,LRECL=1000,BLKSIZE=10000),
// UNIT=AFF=SORTOU00,VOL=(,RETAIN,,15)
//SYSABOUT DD SYSOUT=&SYSOUT1
//SYSDBOUT DD SYSOUT=&SYSOUT1
//SYSUDUMP DD SYSOUT=&SYSOUT1
Figure 1: STEP030 with MAXSORT parm
Essentially, this is a standard SORT step with SORTIN and SORTOUT files. However, because MAXSORT works with intermediate files, the SORTWKnn files must be pre-allocated. In addition, the SORTOUnn files are allocated to tape to perform the actual intermediate sort and merge processing.
The last production job run contained the usage statistics of each of this step's files, and provided several clues that the parameters were not established in the most effective manner. Figure 2 contains an excerpt from that batch job's SYSLOG.
INIT TCB CPU .33*INIT SRB CPU .00*I/O INT CPU 31.68*EXPAND STOR PAGING 0 JOBNAMEM STEP030 07269
SYSTEM ID SYSB*STEP INIT START 02:09*ALLOCATION START 02:09*PROGRAM START 02:09*CARD IMAGE INPUT FOR STP 0
DDNAME   I/O BLOCKS  DDNAME   I/O BLOCKS  DDNAME   I/O BLOCKS  DDNAME   I/O BLOCKS  DDNAME   I/O BLOCKS
JOBLIB            0  JOBLIB            0  JOBLIB            0  SORTIN    2,969,992  SYSIN             1
SORTWK01      5,564  SORTWK02      5,615  SORTWK03      5,624  SORTWK04          0  SORTWK05          0
SORTWK06          0  SORTWK07          0  SORTWK08          0  SORTWK09          0  SORTWK10          0
SORTWK11          0  SORTWK12          0  SORTWK13          0  SORTWK14          0  SORTWK15          0
SORTWK16          0  SORTWK17          0  SORTWK18          0  SORTWK19          0  SORTWK20          0
SORTWK21          0  SORTWK22          0  SORTWK23          0  SORTWK24          0  SORTWK25          0
SYSOUT            0  SYSPRINT          0  SORTBKPT      1,700  SORTOU00    860,399  SORTOU01    686,722
SORTOU02    754,987  SORTOUT     156,321  SYSABOUT          0  SYSDBOUT          0  SYSUDUMP          0
Figure 2: STEP030 SYSLOG statistics
As you can see, there were 25 SORTWKnn datasets allocated, based on the JCL; however, only three of them received any records. That was an immediate cause for alarm, because there were 72 intermediate sorts and merges! This was well in excess of what I believed should have been occurring.
I obtained a recent version of the SyncSort for z/OS Programmer's Guide and concentrated on the section for MAXSORT. As I reviewed each of the parameters the batch job was using, it became clear that two things were out of alignment:
- First, the amount of SORT work space was controlled by the MAXWKSP parameter in the PARM field of the step's EXEC statement, not by the number of SORTWKnn datasets.
- Second, the actual amount of SORT work space was not being allocated effectively.
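The observation that only three of the 25 work datasets were used can be checked mechanically. The following is a minimal sketch, not part of the original analysis: it parses the SORTWKnn rows copied from Figure 2 and counts how many datasets recorded any I/O. The function name `sortwk_io_blocks` and the variable names are hypothetical.

```python
import re

# SORTWKnn rows copied from the Figure 2 SYSLOG excerpt.
FIGURE_2_SORTWK_ROWS = """
SORTWK01 5,564 SORTWK02 5,615 SORTWK03 5,624 SORTWK04 0 SORTWK05 0
SORTWK06 0 SORTWK07 0 SORTWK08 0 SORTWK09 0 SORTWK10 0
SORTWK11 0 SORTWK12 0 SORTWK13 0 SORTWK14 0 SORTWK15 0
SORTWK16 0 SORTWK17 0 SORTWK18 0 SORTWK19 0 SORTWK20 0
SORTWK21 0 SORTWK22 0 SORTWK23 0 SORTWK24 0 SORTWK25 0
"""


def sortwk_io_blocks(text):
    """Return {ddname: I/O block count} for every SORTWKnn pair in the text."""
    pairs = re.findall(r"(SORTWK\d{2})\s+([\d,]+)", text)
    return {ddname: int(count.replace(",", "")) for ddname, count in pairs}


counts = sortwk_io_blocks(FIGURE_2_SORTWK_ROWS)
used = sorted(name for name, blocks in counts.items() if blocks > 0)
idle = sorted(name for name, blocks in counts.items() if blocks == 0)

print(f"allocated: {len(counts)}, used: {len(used)}, idle: {len(idle)}")
print("used datasets:", ", ".join(used))
```

The same function could be pointed at the SORTWKnn rows of any later run to confirm that every work dataset is receiving records.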
A major concern in developing a solution for this problem was how the cumulative file would be managed throughout the year. In January, the contents would not take up much room, perhaps one tape, whereas by December the contents could grow to more than 20 tapes. The space allocations had to account for that.
After reviewing several options, here is the solution I developed. Each fully loaded tape equates to approximately three 3390 Mod-3 DASD volumes, or one 3390 Mod-9. Each 3390 Mod-3 volume consists of 3,339 cylinders, while a 3390 Mod-9 volume consists of 10,017 cylinders. My goal was to make sure that one entire tape would be used for each sort/merge operation. That would mean that the tape could be read in its entirety and loaded to DASD with a continuous read operation. At the same time, the intermediate files that MAXSORT uses are on virtual tape drives, and I needed to make sure that they were used in the most efficient manner.
To make the changes, I started by removing the VSCORE parameter. In this case, I was willing to allow the value to equate to the site's default value.
Next, I moved the MAXWKSP parameter to the end of the line so that the value would be easier to update in the future (if necessary). In the original version, this value was set to 1500, which provided 1,500 cylinders of SORT work space. By looking at the JCL in Figure 1, you can see that each of the SORTWKnn datasets was allocated at 500 cylinders. However, as you saw in the batch job's SYSLOG in Figure 2, even though 25 files were coded in the JCL, only the first three were actually used. This accounted for the high number of intermediate sort/merge steps. As part of this review process, I was able to eliminate 10 of the SORTWKnn datasets; you'll see how shortly.
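The 500-cylinder figure is not stated directly in Figure 1, which codes the space in 4,096-byte blocks. As a rough check, the sketch below converts the block counts to cylinders. It assumes standard 3390 geometry (12 blocks of 4,096 bytes per track and 15 tracks per cylinder); those two constants and the function name are assumptions that do not appear in the article.

```python
# Assumed 3390 geometry (standard values, not taken from the article).
BLOCKS_PER_TRACK_4K = 12    # 4,096-byte blocks that fit on one 3390 track
TRACKS_PER_CYLINDER = 15


def blocks_to_cylinders(blocks, blocks_per_track=BLOCKS_PER_TRACK_4K):
    """Convert a block count to whole cylinders, rounding up."""
    blocks_per_cylinder = blocks_per_track * TRACKS_PER_CYLINDER
    return -(-blocks // blocks_per_cylinder)  # ceiling division


# SPACE=(4096,(90000,50000),,,ROUND) from Figure 1
primary_cyl = blocks_to_cylinders(90000)
secondary_cyl = blocks_to_cylinders(50000)

# How many primary allocations fit inside the old MAXWKSP=1500 limit
datasets_within_limit = 1500 // primary_cyl

print("primary cylinders:", primary_cyl)
print("secondary cylinders:", secondary_cyl)
print("primary allocations within MAXWKSP=1500:", datasets_within_limit)
```

Under these assumptions, three primary allocations fit within the old MAXWKSP value, which is consistent with only the first three SORTWKnn datasets showing any I/O in Figure 2.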
Another factor that required correction was the block size on the SORTWKnn files themselves. It was set at 4096, which was terrible, considering the record being processed was 1000 bytes. The next change was to the allocation amount, which I changed to cylinders. The actual size would be based on the number of records to be processed.
I used the following equations to derive that amount. As stated earlier, a 3390 Mod-3 DASD volume has 3,339 cylinders. I rounded this up to 3,500 to make it easier to calculate. As you recall, I said that each tape volume contains the equivalent of approximately three 3390 Mod-3s. Well, 3,500 x 3 = 10,500, which is the value I established for MAXWKSP. This value also works very well for one 3390 Mod-9 device. I decided to retain 15 of the 25 SORTWKnn datasets. By doing the math, you can see that each one will use 700 cylinders (10,500 / 15 = 700). As I defined them, they will now use the primary and three secondary extents each. This makes the most effective use of the entire SORT work pool. It allows this step to grab a large amount of DASD, but still have it spread out within the entire pool. Because this is a monthly batch job, I made a conscious decision to keep the overall size of each SORTWKnn file small. I wanted to ensure that there was an adequate amount of flexibility in the allocation amounts.
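The arithmetic above can be laid out as a short worked example. This is only a sketch of the calculation described in the text; the variable names are hypothetical, and the primary and secondary values are the ones coded in the revised JCL, SPACE=(CYL,(100,200)).

```python
# Values taken from the text and from the revised JCL.
ROUNDED_MOD3_CYLINDERS = 3500   # 3,339 cylinders rounded up
MOD3_VOLUMES_PER_TAPE = 3       # one full tape is roughly three 3390 Mod-3s
SORTWK_DATASETS = 15            # work datasets retained
PRIMARY_CYL = 100               # SPACE=(CYL,(100,200))
SECONDARY_CYL = 200
SECONDARY_EXTENTS_USED = 3      # "the primary and three secondary extents"

maxwksp = ROUNDED_MOD3_CYLINDERS * MOD3_VOLUMES_PER_TAPE
cylinders_per_dataset = maxwksp // SORTWK_DATASETS
cylinders_from_extents = PRIMARY_CYL + SECONDARY_CYL * SECONDARY_EXTENTS_USED

# Total work space coded in the JCL, old step versus new step.
old_total = 25 * 500
new_total = SORTWK_DATASETS * cylinders_per_dataset

print("MAXWKSP:", maxwksp)
print("cylinders per SORTWKnn:", cylinders_per_dataset)
print("primary + 3 secondaries:", cylinders_from_extents)
print("old total vs new total:", old_total, new_total)
```

The per-dataset share of MAXWKSP and the sum of one primary plus three secondary extents come to the same number, which is why the 100/200 split was a convenient choice.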
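Finally, I made several additional changes, all of which can be seen by comparing Figure 1 with Figure 3:
- The BUFNO subparameter was removed from the SORTIN files, and their disposition was changed from OLD to SHR.
- The SORTOU01 and SORTOU02 files were given the same UNIT=(&TAPE,,DEFER) specification as SORTOU00.
- The block size on the SORTOUT file was changed from 10000 to 0.
- The SYSABOUT and SYSDBOUT DD statements were removed.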
Figure 3 shows the newly revised step. Again, some of the SORTWKnn datasets have been removed to fit the constraints of publication.
//STEP030 EXEC PGM=SORT,COND=(05,LT),
// PARM='MAXSORT,RESTART=LAST,MAXWKSP=10500'
//SORTIN DD DSN=application.year-to-date.input.file,
// DISP=SHR,UNIT=(&TAPE,2)
// DD DSN=application.monthly.input.file,
// DISP=SHR,UNIT=AFF=SORTIN
//SYSIN DD DSN=&CTRLCARD(member),DISP=SHR
//SORTWK01 DD UNIT=&SRTUNIT,SPACE=(CYL,(100,200))
//SORTWK02 DD UNIT=&SRTUNIT,SPACE=(CYL,(100,200))
//SORTWK03 DD UNIT=&SRTUNIT,SPACE=(CYL,(100,200))
//SORTWK04 DD UNIT=&SRTUNIT,SPACE=(CYL,(100,200))
//SORTWK05 DD UNIT=&SRTUNIT,SPACE=(CYL,(100,200))
.
.
.
//SORTWK11 DD UNIT=&SRTUNIT,SPACE=(CYL,(100,200))
//SORTWK12 DD UNIT=&SRTUNIT,SPACE=(CYL,(100,200))
//SORTWK13 DD UNIT=&SRTUNIT,SPACE=(CYL,(100,200))
//SORTWK14 DD UNIT=&SRTUNIT,SPACE=(CYL,(100,200))
//SORTWK15 DD UNIT=&SRTUNIT,SPACE=(CYL,(100,200))
//SYSOUT DD SYSOUT=&SYSOUT1
//SYSPRINT DD SYSOUT=&SYSOUT1
//SORTBKPT DD DSN=&BREAKPNT,DISP=(NEW,DELETE,KEEP),
// UNIT=SYSDA,SPACE=(CYL,(25,25),RLSE)
//SORTOU00 DD DSN=&APPLICATION.TEMPFIL0,
// DISP=(NEW,KEEP),UNIT=(&TAPE,,DEFER)
//SORTOU01 DD DSN=&APPLICATION.TEMPFIL1,
// DISP=(NEW,KEEP),UNIT=(&TAPE,,DEFER)
//SORTOU02 DD DSN=&APPLICATION.TEMPFIL2,
// DISP=(NEW,KEEP),UNIT=(&TAPE,,DEFER)
//SORTOUT DD DSN=application.master.year-to-date.file,
// DISP=(NEW,CATLG,DELETE),
// DCB=(RECFM=FB,LRECL=1000,BLKSIZE=0),
// UNIT=AFF=SORTOU00,VOL=(,RETAIN,,15)
//SYSUDUMP DD SYSOUT=&SYSOUT1
Figure 3: Revised STEP030
This set of changes also achieves the following. The batch job needs to do only one sort sequence followed by one merge sequence for each tape volume of input. (If the tape density doubles, the step will perform two sort/merge sequences per file.) The 10,500 cylinders of SORT work space are too small to hold all of the tape data in one pass (except possibly in January), so the step will continue to use the breakpoint dataset, which preserves the job's restart capability. At the same time, the space is large enough to handle the full volume of each tape more efficiently than the previous coding allowed. In fact, the entire amount of SORT work space allocated to the step is actually less than what it was before (15 x 700 = 10,500 cylinders versus 25 x 500 = 12,500 cylinders), but now it can all be used.
The other change of significance to note is the use of UNIT=(&TAPE,2) on the SORTIN file. The kind folks in the site's Tape Management group indicated that this would mount two tapes at the step's start. While the first file is read, the second is available. When the first file is no longer needed, the tape can be dismounted and the second file can be read immediately; meanwhile, the third tape is mounted. This single change will save approximately five to ten minutes per tape. In other words, the savings amount to only a few minutes at the start of the year, but almost two hours at the end of the year!
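To put the per-tape estimate in perspective, here is a small sketch that scales the five-to-ten-minute range by the number of input tapes. The function name is hypothetical, and the 12-tape figure is the size of the cumulative file mentioned earlier; actual savings will depend on how the site's tape drives behave.

```python
def mount_savings_minutes(tapes, low_per_tape=5, high_per_tape=10):
    """Return the (low, high) estimated savings in minutes for a tape count."""
    return tapes * low_per_tape, tapes * high_per_tape


for label, tapes in [("start of year", 1), ("end of year", 12)]:
    low, high = mount_savings_minutes(tapes)
    print(f"{label}: {tapes} tape(s), about {low} to {high} minutes saved")
```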
Figure 4 depicts the SYSLOG of the revised step. Notice how all of the SORTWKnn datasets are used.
INIT TCB CPU .29*INIT SRB CPU .00*I/O INT CPU 35.18*EXPAND STOR PAGING 0 JOBNAMEM STEP030 07302
SYSTEM ID SYSB*STEP INIT START 16:23*ALLOCATION START 16:23*PROGRAM START 16:23*CARD IMAGE INPUT FOR STP 0
DDNAME   I/O BLOCKS  DDNAME   I/O BLOCKS  DDNAME   I/O BLOCKS  DDNAME   I/O BLOCKS  DDNAME   I/O BLOCKS
JOBLIB            0  JOBLIB            0  JOBLIB            0  SORTIN    1,512,524  SORTIN    1,457,467
SORTIN       19,255  SORTIN      198,050  SYSIN             1  SORTWK01      1,262  SORTWK02      1,214
SORTWK03      1,201  SORTWK04      2,008  SORTWK05      1,199  SORTWK06      1,213  SORTWK07      1,201
SORTWK08      1,204  SORTWK09      1,179  SORTWK10      1,202  SORTWK11      1,199  SORTWK12      1,191
SORTWK13      1,188  SORTWK14        381  SORTWK15      1,179  SYSOUT            0  SYSPRINT          0
SORTBKPT        270  SORTOU00  1,049,622  SORTOU01  1,008,870  SORTOU02    609,042  SORTOU02    312,141
SORTOUT     398,415  SYSUDUMP          0
Figure 4: Revised STEP030 SYSLOG statistics
The previous production run of the old version of the step took 2,181 minutes (35 hours, 10 minutes) to process 2,969,992 records. The most recent production run of the new version of the step took 1,341 minutes (21 hours, 31 minutes) to process 3,187,296 records. That is an improvement of nearly 40% in processing time, despite a 7% increase in the number of records. Furthermore, the previous batch step used 72 intermediate sort/merge processes, while this recent run used only 13. Needless to say, the application team is quite pleased with the results.
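The comparison can be reproduced from the numbers quoted above and from the SORTIN counts in Figure 4. The sketch below uses the minute totals exactly as quoted; the variable names are hypothetical, and the throughput ratio is a derived figure rather than one reported by the job.

```python
# Figures quoted in the text.
old_minutes, old_records = 2181, 2_969_992
new_minutes = 1341

# The four SORTIN entries from the Figure 4 SYSLOG excerpt.
new_sortin_counts = [1_512_524, 1_457_467, 19_255, 198_050]
new_records = sum(new_sortin_counts)

time_reduction = (old_minutes - new_minutes) / old_minutes
record_growth = (new_records - old_records) / old_records

# Records handled per minute of run time, before and after.
old_rate = old_records / old_minutes
new_rate = new_records / new_minutes

print("new record count:", new_records)
print(f"run time reduction: {time_reduction:.1%}")
print(f"record growth: {record_growth:.1%}")
print(f"throughput ratio (new/old): {new_rate / old_rate:.2f}")
```

Looking at throughput as well as run time is useful here because the two runs did not process the same amount of data.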
In this article, I demonstrated how a batch job with a poorly established set of parameters can be tuned to yield higher performance by a careful review of the options and the SORT work space.
NaSPA member Larry Kahm is president of Heliotropic Systems, Inc., an IBM Business Partner located in Fort Lee, NJ. He has 20 years of experience working with systems and application programmers, vendors, and management to ensure that business applications are developed, maintained, and enhanced with the appropriate set of tools. When not training to run in the ING NYC Marathon, he's busy helping clients with their office networks and home computers.