technical: Working With Large Quantities of SMF Records
In the last edition, I described how I would start a CPU reduction project by producing a table of the highest CPU-consuming
programs during a peak 1-2 hour period. This table is created by processing SMF Type 30 interval records, which introduces a problem. One SMF Type 30
record is produced by every address space every recording interval (usually 15-30 minutes). So there are a lot of these records to work through. What's
worse, these records are amongst the millions of SMF records produced constantly by z/OS. So this processing will take time and CPU seconds. Lots of both.
This is a very common problem when processing any SMF records, from producing security-related reports to DASD performance or DB2 capacity planning.
When you work with SMF records, you will almost always need to manage and process very large quantities of SMF records. So let's look at some tricks to
make our lives easier.
Processing Raw SMF Records
z/OS continually writes SMF records to in-memory SMF buffers. These buffers are then regularly written to either VSAM datasets or log streams by z/OS
(depending on how the systems programmer has set it up). Every site will have regular jobs that empty these VSAM datasets/log streams, archiving the SMF
data to disk or (more likely) tape. So it sounds easy, doesn't it? To get our SMF records, we just scan through these archived SMF records.
And products like Merrill's MXG, IBM's TDSz and CA-MICS help us out. They provide tools and facilities to map each SMF record, and extract the information
we need. So if I'm at a site with MXG, I can extract the SMF Type 30 records I need using a job like:
//SAS EXEC SAS,
//SASLOG DD SYSOUT=*
//SASLIST DD SYSOUT=*
//WORK DD LRECL=27648,BLKSIZE=27648,SPACE=(CYL,(150,100))
//SOURCLIB DD DSN=MXG.SOURCLIB,DISP=SHR
//LIBRARY DD DSN=MXG.FORMATS,DISP=SHR
//SMF DD DISP=SHR,DSN=SMF.ARCHIVE
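//SYSIN DD *
%INCLUDE SOURCLIB(TYPE30);
DATA JOBS1;
* Assumed input: TYPE30_V, the MXG dataset of Type 30 interval records;
SET TYPE30_V;
WHERE (TYPETASK='JOB' OR TYPETASK='STC') AND CPUTM > 1 AND
HOUR(SMFTIME) > 9 AND HOUR(SMFTIME) < 11;
RUN;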
Those familiar with SAS will see that I'm working through an SMF archive (the dataset SMF.ARCHIVE - either on tape or disk), and putting a list of all
SMF Type 30 records produced between 10am and 11am (the records whose SMFTIME hour is 10) into the JOBS1 SAS file.
When working with SMF records, the secret is to work with as few as possible. And this is what I'm doing in this job. If you look more closely, you can
see that I'm only listing jobs and started tasks (no USS processes or APPC address spaces), and skipping every SMF Type 30 record with less than one second
of CPU time (there will be many records with 0 CPU if the task has not been dispatched, and I'm not interested in those).
However I'm doing this after the %INCLUDE SOURCLIB(TYPE30). This means that MXG has already moved SMF Type 30 records into SAS files before my JOBS1
data step starts. To limit the records that MXG will process, you can use the MXG macro facilities.
Now, most sites I've seen archive their SMF records into daily tape datasets. So if you were to run this job against the SMF archive for one day,
you'll be working through every SMF record for that day. Or in other words, you'll be waiting a couple of hours, with a corresponding CPU usage bill.
There are a few ways to reduce the CPU overhead, but it's never going to be cheap, which is an excellent reason to run these jobs outside peak periods.
If this is an ad-hoc job, then this is no problem. However if this job is to run regularly, then it's going to be expensive. A classic example of this
is your regular IBM SCRT job that processes SMF Type 70 and 89 records.
Limiting Raw SMF Records
If a site is going to regularly process specific SMF records, it makes sense to create a separate SMF archive holding only those SMF record types. This
way any job only has to work through the SMF record types it needs, rather than scanning through them all. This is also important if you need to archive
some SMF record types for long periods for compliance (like RACF Type 80). IBM helps us out here. The IBM utilities that archive SMF records, IFASMFDP
(for SMF VSAM datasets) and IFASMFDL (for log streams), can be set up to do just this. For example, look at the following JCL:
//STEP1 EXEC PGM=IFASMFDP
//SYSPRINT DD SYSOUT=*
//DUMPIN DD DISP=SHR,DSN=SYS1.MAN1
//SMFOUT DD DSN=DUMP.SMF.ALL,DISP=MOD
//SCRT DD DSN=DUMP.SMF.SCRT,DISP=MOD
//SEC DD DSN=DUMP.SMF.SECURITY,DISP=MOD
//SYSIN DD *
  OUTDD(SMFOUT,TYPE(0:255))
  OUTDD(SCRT,TYPE(70,89))
  OUTDD(SEC,TYPE(80))
This job dumps all SMF records into the dataset DUMP.SMF.ALL. Additionally, SMF records needed for the IBM SCRT (Type 70 and 89) go into DUMP.SMF.SCRT,
and RACF records (Type 80) go into DUMP.SMF.SECURITY. The good news is the additional work writing to additional SMF dump datasets doesn't incur a large
CPU overhead. There are also a few ways to further reduce CPU overhead of SMF dump jobs.
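To make the idea of routing by record type concrete, here is a minimal Python sketch of the same split done on a workstation copy of the data. It is not how IFASMFDP works internally. It assumes the SMF data was transferred in binary with each record's 4-byte RDW intact and with no spanned segments, and it relies on the record type being the single byte at offset 5 of the standard SMF header (counting the RDW). The names `read_records`, `route` and `fake_record` are made up for this example, and the input is synthetic.

```python
import io
import struct
from collections import defaultdict


def read_records(stream):
    """Yield (record_type, raw_record) for each RDW-prefixed record."""
    while True:
        rdw = stream.read(4)
        if len(rdw) < 4:
            return  # end of data
        # The first two bytes of the RDW hold the record length, RDW included.
        (length,) = struct.unpack(">H", rdw[:2])
        if length < 6:
            return  # too short to hold a record type; stop rather than guess
        record = rdw + stream.read(length - 4)
        if len(record) < length:
            return  # truncated final record
        # Offset 5 (counting the RDW) is the one-byte SMF record type.
        yield record[5], record


def route(stream, routes):
    """Copy every record to 'ALL', and also to each route listing its type."""
    out = defaultdict(list)
    for rtype, record in read_records(stream):
        out["ALL"].append(record)
        for name, types in routes.items():
            if rtype in types:
                out[name].append(record)
    return out


def fake_record(rtype, payload=b""):
    """Build a synthetic record: RDW, flag byte, type byte, then payload."""
    length = 6 + len(payload)
    return struct.pack(">HHBB", length, 0, 0, rtype) + payload


# Synthetic input: five records of types 30, 70, 80, 89 and 30.
data = b"".join(fake_record(t, b"x" * 10) for t in (30, 70, 80, 89, 30))
result = route(io.BytesIO(data), {"SCRT": {70, 89}, "SEC": {80}})
counts = {name: len(records) for name, records in result.items()}
print(counts)
```

Every record lands in the "ALL" list, while the SCRT and SEC lists hold only the types named for them, mirroring the three output datasets in the JCL.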
You can go further and split up SMF records by application. So CICS1 data would go into one SMF archive; CICSB into another. However this takes some
assembler programming skills to create exits for IFASMFDP/IFASMFDL.
Another option is to avoid accessing the raw SMF records altogether. This can be done by creating databases holding summarised data. For example,
summarising SMF Type 70 records by hour, or by day. So instead of working through every record, you can just access the summarised results. Most sites
take this approach when handling performance and capacity planning records, such as SMF Type 70/72, 110 (CICS) and 101 (DB2). These databases can be
stored in DB2 (for TDSz), SAS files (for MXG and CA-MICS), or any other database system you choose. TDSz, MXG and CA-MICS all include features to produce
these summarised databases, which are often called Performance Databases, or PDBs.
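As a rough illustration of what "summarising by hour" means, the Python sketch below collapses interval samples into one row per system per hour. It is not how TDSz, MXG or CA-MICS build their PDBs. The sample values, the `cpu_busy` figure and the function name `summarise_by_hour` are all invented for the example, and the plain average assumes every interval is the same length.

```python
from collections import defaultdict
from datetime import datetime

# Invented samples: (system, interval start, CPU busy %), one per 15 minutes.
samples = [
    ("SYSA", "2024-03-04 09:00", 62.0),
    ("SYSA", "2024-03-04 09:15", 71.5),
    ("SYSA", "2024-03-04 09:30", 80.0),
    ("SYSA", "2024-03-04 09:45", 66.5),
    ("SYSA", "2024-03-04 10:00", 90.0),
    ("SYSA", "2024-03-04 10:15", 94.0),
]


def summarise_by_hour(rows):
    """Return one summary row per (system, hour)."""
    buckets = defaultdict(list)
    for system, stamp, cpu_busy in rows:
        # Truncate the interval start to the hour it falls in.
        hour = datetime.strptime(stamp, "%Y-%m-%d %H:%M").replace(minute=0)
        buckets[(system, hour)].append(cpu_busy)
    return {
        key: {
            "intervals": len(values),
            "avg_busy": sum(values) / len(values),
            "max_busy": max(values),
        }
        for key, values in sorted(buckets.items())
    }


summary = summarise_by_hour(samples)
for (system, hour), stats in summary.items():
    print(system, hour.strftime("%Y-%m-%d %H:00"), stats)
```

Six interval samples become two summary rows, which is the whole point: later reports read the two rows, not the six.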
These PDBs are great, and usually satisfy most needs. However I find that I'm regularly going back to the raw SMF records. Usually this is because I
need information from a non-summarised record (such as Type 30 for CPU usage, Type 80 for security, or Type 14/15 for a dataset open), or I need more detail.
Processing SMF On a Workstation
Even after getting my Type 30 records, the chances are that I'll have tens of thousands of them to process. In the old days, I would sit down and code
SAS statements to produce the tables and graphs that I want. And this is still an option. But tools like Microsoft Excel produce results much faster. The
trick is to get the data to them.
Let's take my Type 30 extract job above. What I tend to do is create a CSV file with the Type 30 records. I can do this in SAS by adding this code to
the bottom of the job:
ODS _ALL_ CLOSE;
ODS CSV FILE=CSV RS=NONE;
PROC PRINT DATA=JOBS1 NOOBS;
VAR JOB DATE SYSTEM
CPUSRBTM CPUTCBTM CPUTM PROGRAM CPUIFETM CPUZIETM;
RUN;
ODS CSV CLOSE;
This will take the SMF records from the SAS file and output the figures I want to a CSV file. Transfer this to my PC, open with Excel, and I'm away.
For smaller sites, this works great.
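The table I'm after, the highest CPU-consuming programs, can also be sketched in a few lines of Python working from the same CSV. This is only an illustration: it assumes the CSV has a header row using the variable names from the VAR statement, and that CPUTM arrives as plain seconds (if it arrives formatted as h:mm:ss, it would need converting first). The sample rows and the function name `top_programs` are made up.

```python
import csv
import io
from collections import Counter

# Made-up rows in the column order of the VAR statement.
sample_csv = (
    "JOB,DATE,SYSTEM,CPUSRBTM,CPUTCBTM,CPUTM,PROGRAM,CPUIFETM,CPUZIETM\n"
    "PAYROLL1,04MAR2024,SYSA,0.5,40.0,40.5,PAYCALC,0,0\n"
    "PAYROLL2,04MAR2024,SYSA,0.5,19.0,19.5,PAYCALC,0,0\n"
    "CICSPROD,04MAR2024,SYSA,5.0,115.0,120.0,DFHSIP,0,0\n"
    "DB2PDBM1,04MAR2024,SYSA,3.25,72.0,75.25,DSNYASCP,0,0\n"
    "BACKUP01,04MAR2024,SYSA,1.0,11.0,12.0,ADRDSSU,0,0\n"
)


def top_programs(csv_file, limit=3):
    """Total CPUTM per PROGRAM and return the biggest consumers first."""
    totals = Counter()
    for row in csv.DictReader(csv_file):
        totals[row["PROGRAM"]] += float(row["CPUTM"])
    return totals.most_common(limit)


top = top_programs(io.StringIO(sample_csv))
for program, cpu_seconds in top:
    print(f"{program:<10} {cpu_seconds:10.2f}")
```

With a real extract, `io.StringIO(sample_csv)` would be replaced by an open file handle on the transferred CSV.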
However I find for larger sites, there is still too much data to process with Excel. In these cases, I fire up the free MySQL database I have installed
on my laptop. I'll import the CSV created above to MySQL using SQL like:
LOAD DATA LOCAL INFILE 'C:/SMF30V.csv' INTO TABLE SMF30 FIELDS
TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n'
(job, datetime, system, cpu_srb, cpu_tcb, cpu_total, program,
I can then create subsets of the data using SQL statements, and import them into Excel for processing.
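As a sketch of what such a subset query might look like, the Python below uses the built-in sqlite3 module as a stand-in for MySQL so that it runs without a database server; the SELECT itself is plain SQL. The table uses only the column names visible in the LOAD DATA statement above, the rows are invented, and the choice of subset (total CPU by program for one system) is just an example.

```python
import csv
import io
import sqlite3

# Invented rows: job, datetime, system, cpu_srb, cpu_tcb, cpu_total, program.
rows = [
    ("CICSPROD", "2024-03-04 10:00", "SYSA", 5.0, 115.0, 120.0, "DFHSIP"),
    ("CICSPROD", "2024-03-04 10:15", "SYSA", 4.0, 96.0, 100.0, "DFHSIP"),
    ("PAYROLL1", "2024-03-04 10:00", "SYSA", 0.5, 40.0, 40.5, "PAYCALC"),
    ("PAYROLL9", "2024-03-04 10:00", "SYSB", 0.5, 30.0, 30.5, "PAYCALC"),
]

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the MySQL table
conn.execute(
    "CREATE TABLE smf30 (job TEXT, datetime TEXT, system TEXT, "
    "cpu_srb REAL, cpu_tcb REAL, cpu_total REAL, program TEXT)"
)
conn.executemany("INSERT INTO smf30 VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# The subset: total CPU seconds per program on one system, largest first.
subset = conn.execute(
    "SELECT program, SUM(cpu_total) AS cpu_seconds "
    "FROM smf30 WHERE system = ? "
    "GROUP BY program ORDER BY cpu_seconds DESC",
    ("SYSA",),
).fetchall()

# Write the subset back out as CSV, ready to open in Excel.
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["program", "cpu_seconds"])
writer.writerows(subset)
print(out.getvalue())
conn.close()
```

The four detail rows shrink to two summary rows for SYSA, which is a far more comfortable size to chart.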
If this all seems too hard, then the Australian-based Black Hill Software can help. Their software product Easy SMF for z/OS does all of this for you.
Download the raw SMF, and Easy SMF will process it and produce graphs or tables. You can also import this data into Microsoft Excel for your own
processing. The disadvantage is that Easy SMF only supports a subset of SMF records.
If you start working with SMF records, the chances are that you will be handling large quantities of records. The good news is that there are ways to
ease the pain of processing these records.