technical: Comparing CICS/VSAM Performance With and Without RLS
We've been working with our partners, CPT Global, to help a client convert their CICS/VSAM files to RLS. This allows the files to be updated from multiple CICS regions at the same time. Our partner article talks about how we did this: investigation, preparation, and execution.
One of our largest concerns was performance: the impact that RLS would have on our applications. In this article, we talk about how we measured the VSAM performance before and after RLS from a CICS perspective.
Before starting conversions, we looked for benchmarks. The only one we could find was presented by IBM's Andre Clark and Neal Bohling at the 2015 SHARE conference. As part of this, they presented the following graph:
(Source: Getting the Most out of your VSAM Data Sets in CICS Using RLS, SHARE 2015, Clark/Bohling)
This looks really exciting: they saw some performance improvements. But of course, this could be misleading. Their transactions were not threadsafe, and had the following characteristics:
- Average of 6 file requests per transaction.
- MRO Long running mirrors.
- 69% Read, 10% Read for Update, 9% Update, 11% Add, 1% Delete.
Our mix was different. Our files with the highest I/O rates were almost exclusively read/browse, while other files had a higher proportion of updates.
Our environment had MRO long-running mirrors, FCQRONLY=NO, and LOG(ALL). Some of our transactions were threadsafe, others were not. We had long-running transactions, so a pure transaction response time doesn't tell us much. So, we normalized our file response times by the number of MQ requests.
Most of our files were journaling to a logstream for forward recovery, increasing the time taken by update operations.
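This normalization can be sketched in a few lines. The numbers and field names below are purely illustrative (not our actual measurements): divide the total file wait time in a period by the MQ request count for that period, so long-running transactions don't distort the comparison.

```python
# Sketch with hypothetical data: normalize file wait time by MQ request counts,
# so long-running transactions don't distort before/after comparisons.
# Values are illustrative only.

samples = [
    # (period, total_file_wait_ms, mq_requests)
    ("before RLS", 182_000.0, 910_000),
    ("after RLS",  195_000.0, 930_000),
]

for period, file_wait_ms, mq_requests in samples:
    per_request = file_wait_ms / mq_requests  # ms of file wait per MQ request
    print(f"{period}: {per_request:.4f} ms file wait per MQ request")
```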
What we wanted to do was compare the performance as seen from our CICS transactions with and without RLS. The easiest way we found to do this was to use CA SYSVIEW. This produces performance statistics for each file, broken down by transaction. The bad news is that these figures are averaged across all operations (not broken down by browse, add, etc.), and aren't written out to a dataset every hour: they are just a summary since the CICS region started. But they give us a guide. Here's an example of what we found:
Before we look at the results, let's note a couple of things:
- The figures are averaged for all operations: no differentiation between read and write. The usage statistics on the left (in grey) are from CICS end of day statistics, to give us an indication of the mix of operations.
- We can probably assume that the mix before and after the change is similar.
- Some of the differences, when measured as a percentage, are high. For example, one file has a 414% response time increase. But this is only 20 microseconds (0.02 milliseconds): not a big change.
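That last point is simple arithmetic, but worth making concrete. The baseline below is assumed, chosen so that a 414% increase works out to roughly the 20 microseconds mentioned above:

```python
# Sketch: a large percentage increase can still be tiny in absolute terms.
# The baseline is assumed for illustration, not taken from our measurements.

before_us = 4.8                    # assumed baseline response time, microseconds
after_us = before_us * (1 + 4.14)  # a 414% increase

increase_us = after_us - before_us
print(f"absolute increase: {increase_us:.1f} us ({increase_us / 1000:.3f} ms)")
```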
So, we found that there wasn't a lot of difference before and after our RLS conversion. Some increases, some decreases.
File Performance (Part 2)
Another way to look at file performance is from the SMF type 42 (subtype 6) records. These show statistics by file (separate records for data, index, and alternate index components). They are written when SMF type 30 interval records are cut (hourly at our site), or when the dataset is closed.
This is a z/OS-centric view, so doesn't break the information down by CICS transaction.
Each record includes the job that caused the I/O. For non-RLS files, we limited this to the CICS regions, excluding batch. However, for RLS, the job is SMSVSAM: the address space managing VSAM. There is no way to separate CICS from batch. To get around this, we looked at the online day, when the files were only accessed from CICS. Here's a summary of what we found for 19 different files converted to RLS at the same time:
A small increase of around 10% in average response time.
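One way to combine per-file averages like these into a single figure is to weight each file's average response time by its I/O count, so high-activity files dominate. This is a sketch of that idea with made-up numbers (chosen to show a roughly 10% change), not our real SMF data:

```python
# Sketch: combine per-file average response times (as in SMF 42.6-style data)
# into one weighted average, then compare before and after RLS conversion.
# All numbers below are made up for illustration.

def weighted_avg_resp(files):
    """files: list of (io_count, avg_resp_ms) per file/component."""
    total_ios = sum(ios for ios, _ in files)
    total_time = sum(ios * resp for ios, resp in files)
    return total_time / total_ios

before = [(500_000, 0.40), (120_000, 1.10), (80_000, 0.25)]
after  = [(510_000, 0.45), (118_000, 1.20), (82_000, 0.27)]

b, a = weighted_avg_resp(before), weighted_avg_resp(after)
print(f"before {b:.3f} ms, after {a:.3f} ms, change {100 * (a / b - 1):+.1f}%")
```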
The dataset-level performance was interesting, but was for all transactions. Some transactions may have been affected more than others. So, we went to the SMF110 records created at the end of each transaction (the MXG CICSTRAN dataset). This doesn't break things down by dataset, but gives the total wait for file I/O for the transaction.
One of the fields in CICSTRAN is WTFCIOTM: the time the transaction waits for file I/O. This is great for non-RLS files, but cannot be used for RLS files; the WTRLIOTM field must be used instead. So we looked at these two fields to compare VSAM file performance from a transaction point of view.
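In practice this means adding the two fields together for each transaction. The record layout below is a simplified stand-in for the real MXG CICSTRAN dataset, with invented values:

```python
# Sketch: total per-transaction file I/O wait from CICSTRAN-style records.
# Non-RLS file waits appear in WTFCIOTM; RLS file waits appear in WTRLIOTM,
# so both fields must be considered. Records here are a simplified stand-in.

records = [
    {"tran": "TRN1", "WTFCIOTM": 0.012, "WTRLIOTM": 0.000},  # non-RLS access
    {"tran": "TRN1", "WTFCIOTM": 0.000, "WTRLIOTM": 0.015},  # RLS access
]

file_wait = sum(r["WTFCIOTM"] + r["WTRLIOTM"] for r in records)
print(f"total file I/O wait: {file_wait:.3f} seconds")
```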
Here's an example for one transaction:
File I/O is only a small part of the total response time, isn't it? Most of our service time came from journal/DFHLOG overheads. So, in this case, RLS caused a small increase. And that's what we found for most of our transactions.
But there was a problem with this method. We found that for threadsafe programs, the WTRLIOTM field was not populated by CICS. This was confirmed by IBM. Now, although the transaction for the graph above was not threadsafe, many others were. So, in many cases we were prevented from using WTRLIOTM.
So, what we did was look at the overall transaction response time, comparing before and after the change.
What We Didn't Use
By now, you're probably thinking of a few other options, and wondering why we didn't use them. There were a couple of reasons why.
Many of the options are only for RLS: great when working with RLS performance, bad when comparing performance with and without RLS. This includes the SMF type 42 (subtype 16) records and the RMF Monitor III RLS statistics.
Others don't provide performance statistics. This includes CICS end-of-day statistics, and SMF type 64 records.
There are other statistics that can be used to monitor things like buffer usage, cache performance, and more. These are important, and were monitored to ensure the best RLS performance was obtained. But they don't provide dataset-level performance statistics, and can't be used to compare performance with and without RLS.
Overall, we found that converting to RLS resulted in a small increase in file service times. However, syncpoint/journal times were the biggest component of many transaction service times, so this small increase from RLS did not have a large effect on our overall service time.