technical: SMT and Other Things to Hurt Your Chargeback
Picture this. You get a taxi from your house to work, and it costs $10. The next day, it costs $14. The day after, $7. Same trip, same route, same time of day, same type of taxi. You won't be happy. As a customer, you want to pay the same (or at least something close) for the same service. You don't want the bill moving all over the place.
The same goes for anyone paying for their mainframe usage based on CPU usage, be it CPU seconds, MIPS or MSUs. You want two identical jobs to cost the same. But they won't. And it's going to get worse.
The Problem With CPU Seconds
Any CPU usage charge is based on CPU seconds – the total number of seconds a task was using the CPU. This may be converted to MIPS or MSUs, or normalised in some way. But it's still based on CPU seconds. However, there's a problem with CPU seconds: they're different for different mainframe models.
Let's say you have one EC12 and one z13. A job running on the EC12 will (all things being equal) use more CPU seconds than the same job on the z13. And this has been happening for a while. Newer IBM mainframes have been more efficient than older ones, giving more benefit (or processing power) for every CPU second. So when charging, most sites will use some sort of normalisation technique to even this out.
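To make the idea of normalisation concrete, here is a minimal sketch in Python. The capacity factors are invented purely for illustration, and the function name is hypothetical; a real site would derive its factors from a source such as the LSPR and its own measurements.

```python
# Hypothetical relative-capacity factors: how much work one CPU second does
# on each model compared with a chosen baseline (here, the EC12).
# These numbers are made up for illustration only.
RELATIVE_CAPACITY = {"EC12": 1.00, "z13": 1.10}


def normalised_cpu_seconds(cpu_seconds: float, model: str) -> float:
    """Convert measured CPU seconds into baseline-equivalent CPU seconds."""
    return cpu_seconds * RELATIVE_CAPACITY[model]


# The same job measured on two machines (illustrative figures).
ec12_equivalent = normalised_cpu_seconds(110.0, "EC12")
z13_equivalent = normalised_cpu_seconds(100.0, "z13")

print(f"EC12 run: {ec12_equivalent:.1f} baseline CPU seconds")
print(f"z13 run:  {z13_equivalent:.1f} baseline CPU seconds")
```

With these assumed factors, the two runs come out at the same normalised figure, so the job would be charged the same on either machine.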
But it's not just processor generations. The IBM LSPR (Large Systems Processing Reference) provides estimates of the processing power of IBM mainframes for different z/OS versions: past and present. If you take a look at a z114 running z/OS 2.1, the LSPR lists the total MSU rating for each model. Different models may be the same machine, but with a different number of processors. For example:
- 2818-G01 (one processor): 70 MSUs
- 2818-G02 (two processors): 127 MSUs
- 2818-G03 (three processors): 181 MSUs
Calculate the MSUs per processor, and you'll see a difference. The G01 has 70 MSUs for each processor, the G02 63.5 MSUs per processor, and the G03 60.3 MSUs per processor. The more processors, the less work each processor can do. This is because the mainframe needs more processing power to manage all the processors. So more processors means less work for every CPU second. Something to think about when you upgrade your processor.
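The arithmetic above can be reproduced with a few lines of Python. The MSU ratings are the ones quoted in this article; the variable and function names are just for illustration.

```python
# Total MSU ratings quoted above for the z114 (2818) models:
# model -> (number of processors, total MSUs)
MSU_RATINGS = {
    "2818-G01": (1, 70),
    "2818-G02": (2, 127),
    "2818-G03": (3, 181),
}


def msu_per_processor(processors: int, total_msu: float) -> float:
    """Average MSU capacity delivered by each processor."""
    return total_msu / processors


# Use the single-processor model as the reference point.
single = msu_per_processor(*MSU_RATINGS["2818-G01"])

for model, (processors, total_msu) in MSU_RATINGS.items():
    per_cpu = msu_per_processor(processors, total_msu)
    drop = (1 - per_cpu / single) * 100
    print(f"{model}: {per_cpu:.1f} MSUs per processor "
          f"({drop:.1f}% below a single processor)")
```

The last column shows how much each additional processor erodes the work done per CPU second, which is the effect a chargeback scheme would need to account for after an upgrade.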
SMT, Cache and Pipelines
With more modern processors, things get even worse. Let's start with caching. If a CPU instruction needs something in memory, it will need to go out to memory, leaving the instruction holding onto the processor until it returns. The slower this memory access, the more CPU is used waiting. Modern processors use caching to speed this memory access up, reducing CPU usage. Processors also do some nice tricks to improve the efficiency of the cache. Our article How Fast Is Your Mainframe discusses this more. What this means is that some workloads will use cache more efficiently than others. So a high CPU, low I/O workload will use cache more efficiently than a lower CPU, high I/O workload. Or in other words, that high CPU, low I/O workload will get more from every CPU second.
However, the exact cache efficiency will change from moment to moment. So if you run the exact same job twice, the chances are that each run will use a different number of CPU seconds. Such CPU variability has been increasing with each new processor design.
This is made even worse with the latest trick to improve processor efficiency: Simultaneous Multi-Threading (SMT). SMT was introduced on the mainframe with the z13, and allows two threads to run on one processor at the same time. The idea is that if one thread is waiting for a memory access (or some other resource), the second thread can use the processor. This can really mess with CPU calculations. Let's take an example of two tasks executing. With SMT, both tasks can be scheduled on a single processor by the operating system's dispatcher. One will run first, while the other waits. Let's say this takes one second. Once the first needs a resource, it will wait, and the second task will run. Suppose this takes another second. As far as the operating system is concerned, both tasks have been using the processor for two seconds. But in fact each has only been using it for one. z/OS 2.1 performs some magic to deal with this, reporting CPU usage as if each thread had the processor to itself. This is very new, so there are no statistics or reports on how accurate or repeatable this is.
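The two-task example can be sketched in Python as follows. This is only a toy model of the accounting problem, not a description of how z/OS actually measures or adjusts SMT CPU time; the data layout and function names are assumptions made for illustration.

```python
from collections import defaultdict

# (thread, start_second, end_second): spans in which each thread was
# actually executing on the shared processor, as in the example above.
EXECUTION = [("A", 0.0, 1.0), ("B", 1.0, 2.0)]


def dispatched_time(execution):
    """Naive view: charge every thread for the whole time both were dispatched."""
    start = min(s for _, s, _ in execution)
    end = max(e for _, _, e in execution)
    return {thread: end - start for thread, _, _ in execution}


def executing_time(execution):
    """Charge each thread only for the spans in which it was really executing."""
    totals = defaultdict(float)
    for thread, start, end in execution:
        totals[thread] += end - start
    return dict(totals)


print("Dispatched:", dispatched_time(EXECUTION))
print("Executing: ", executing_time(EXECUTION))
```

The gap between the two views is what any SMT-aware CPU accounting has to bridge, and how well it does so determines how repeatable the resulting charges are.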
The IBM z13 can be configured to automatically switch between two modes: one thread at a time (SMT Mode 1), or SMT with two threads (SMT Mode 2). Mode 1 is more efficient for CPU intensive batch processes, Mode 2 for I/O intensive applications. Currently SMT is only supported on zIIP processors, but the smart money says it will be available on general processors soon.
Alternatives to CPU
So what are the alternatives to CPU Usage? From z/OS 2.1, IBM has introduced a new field in the SMF Type 30 records with a count of instructions. Not yet available for other SMF records, this could allow batch jobs to be charged on the number of instructions executed, not the CPU time. This would be a more stable metric, but is further removed from the comfortable CPU second that has been the backbone of mainframe chargeback for many years.
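As a rough sketch of why an instruction count could be a more stable basis for charging, consider the following Python example. The rates, the figures, and the `JobRun` structure are all hypothetical, and it assumes that two runs of the same job execute the same number of instructions.

```python
from dataclasses import dataclass


@dataclass
class JobRun:
    """One execution of a batch job (hypothetical structure and figures)."""
    cpu_seconds: float
    instructions: int


# Illustrative charging rates, not real prices.
RATE_PER_CPU_SECOND = 0.05
RATE_PER_BILLION_INSTRUCTIONS = 0.02


def charge_by_cpu(run: JobRun) -> float:
    return run.cpu_seconds * RATE_PER_CPU_SECOND


def charge_by_instructions(run: JobRun) -> float:
    return run.instructions / 1e9 * RATE_PER_BILLION_INSTRUCTIONS


# Two runs of the same job: same instruction count, different CPU seconds.
runs = [
    JobRun(cpu_seconds=100.0, instructions=250_000_000_000),
    JobRun(cpu_seconds=114.0, instructions=250_000_000_000),
]

for number, run in enumerate(runs, start=1):
    print(f"Run {number}: CPU-based ${charge_by_cpu(run):.2f}, "
          f"instruction-based ${charge_by_instructions(run):.2f}")
```

Under these assumptions the instruction-based charge is identical for both runs, while the CPU-based charge moves with the CPU seconds.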
Another option could be to use Relative Nest Intensity (RNI) as a normalising factor for CPU usage. A measurement of cache efficiency, RNI can be recorded in IBM SMF records over time. However this is not a straightforward solution with today's SMF processing technology. It also doesn't solve the issues raised by SMT.
Today a CPU second isn't necessarily a CPU second. The amount of work you can get from your CPU second will change from moment to moment, and is determined by many things, including workload mix and processor configuration. This means that your users' CPU bills may differ from day to day for the same workload. Anyone working with chargeback would be wise to educate their users about why that is.