STAT.NLM: A Tool for Measuring NetWare v3.11 Server Resource Utilization


RON LEE
Senior Consultant
Systems Research Department

01 Mar 1992

This AppNote is a guide to operations and a technical reference for a new NetWare Loadable Module (NLM) called STAT.NLM. STAT.NLM is a server application that records v3.11 server resource utilization statistics and exports those statistics for characterization and analysis by other programs such as databases and spreadsheets.

The Most Widely Used Server Statistic
Guide to Operations
Technical Reference
Availability

The Most Widely Used Server Statistic

In 1981, the architects of NetWare developed a simple server console screen, called MONITOR, to help Comdex attendees see what the file server was doing. At the time, monitoring server activity was a fairly new concept. One of the items displayed on the screen was "Utilization (%)" (see Figure 1). This percentage, updated once a second by the server, indicated how much of the CPU's time was spent handling network requests.

Figure 1: In early versions of NetWare, the MONITOR screen displayed a metric describing server utilization.

-- SFT NetWare 286 II+TTS V2.1 - Utilization (%) = 25 -- Disk I/O Pending =4 --
| Stn 1: Search Next        | Stn 2: Read File        | Stn 3: Get File Size   |
| ------                    | ------                  | ------                 |
| File                 Stat | File               Stat | File              Stat |
| ----                 ---- | ----               ---- | ----              ---- |
| ?.MSG              2 PRPW | F:CHANGES.86     2 PRPW | O:INDEXPO.86    4 PRPW |
| ?.MST              2 PRPW | W:KEN}.BV1       2 PRPW | F:RLP}.BV2      4 PRPW |
| ?.MSC              2 PRPW | W:KEN}.TV1       2 PRPW | O:RLP}.TV2      4 PRPW |
| ?.MSR              2 PRPW | Y:WP.EXE         2  RP  | O:RLP}.BV1      4 PRPW |
| ?.$T070006         2 PRPW |                         | Y:WP.EXE        4  RP  |
|                           |                         |                        |
 ---------------------------+-------------------------+------------------------
| Stn 4:                    | Stn 5:                  | Stn 6:                 |
| ------                    | ------                  | ------                 |
| File                 Stat | File               Stat | File              Stat |
| ----                 ---- | ----               ---- | ----              ---- |
|                           |                         |                        |
|                           |                         |                        |
|                           |                         |                        |
|                           |                         |                        |
|                           |                         |                        |
|                           |                         |                        |
|                           |                         |                        |
 ------------------------------------------------------------------------------

Since that time, server utilization (CPU %Utilization) has become perhaps the most widely used server statistic - but there is a definite weakness. Anyone who has tried watching the %Utilization figure on a NetWare server console will recognize this pattern of numbers:

2, 21, 2, 27, 2, 1, 0, 0, 0, 11, 10, 9, 27, 33, 4, 3, 2, 1, 1, 1, 1...

Following the gyrations of the %Utilization figure is like watching a bouncing ball - and trying to record the height of each bounce, every second...for hours. It's hopeless and frustrating.

This AppNote provides the solution. STAT is a new software tool that records NetWare v3.11 server workload statistics, including the %Utilization figure, and exports those statistics for charting and analysis by other programs such as databases and spreadsheets.

But It's Only Half of the Story

The CPU %Utilization figure is one of the most useful metrics produced by the server. Yet, in many cases, CPU %Utilization tells us little about what's really happening to server resources. This is because new technologies, such as bus mastering, and other improvements in the LAN and disk channels are moving work off the CPU, so the CPU no longer dominates server performance the way it once did.

With these new technologies, and with CPU %Utilization as the only available metric, the server becomes a "black box" of sorts. Requests go in and responses come out, but there is no way to measure their impact. The queuing model in Figure 2 represents the limited perspective of servers the industry has relied upon for years.

Figure 2: This simple queuing model provides a limited perspective of the server.

Thinking of the server this way provides little help when you're trying to identify bottlenecks or measure the effectiveness of your tuning modifications and upgrades.

In contrast, STAT records the utilization statistics of each of the server's major resources once each second. These major resources are:

The CPU
The LAN communications channel
The disk channel
The router
Active connections

These metrics allow you to chart daily activity and identify trends within your production environment. Using this information, you can tune and upgrade the server, and even plan capacity, while measuring both the need for and the effectiveness of each change.

Guide to Operations

STAT.NLM is a management utility you can load and unload from server memory while the server is running. Following NLM conventions, you can load STAT in one of two ways:

STAT.NLM may be loaded, used, then unloaded, all at the server console.
Or, you may place the "LOAD STAT.NLM" console command in the AUTOEXEC.NCF file to be executed every time the server is booted. (In this case, however, STAT.NLM must be run manually; STAT has no command line capabilities.)

DUMPSTAT.EXE is a conversion utility designed to create spreadsheet- or database-compatible text files from the binary trace files created by STAT. DUMPSTAT processing is intentionally designed to be performed on a client workstation rather than on the server. (See "Using DUMPSTAT to Convert STAT Output" and "Minimizing STAT's Effect on Server Resources" for more details.)

Installing and Loading STAT.NLM

Using the VOLINFO utility, make certain you have enough free disk space in the SYS:SYSTEM directory for STAT's output files. The resulting trace files can have a maximum size of 2.6MB for a twenty-four hour test. Since the SYS:SYSTEM directory is also used for print spooling, make sure enough space is available for both STAT output and print spooling, with room to spare.
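The 2.6MB figure can be sanity-checked against the record layout described under "Trace File Dimensions" later in this AppNote. The Python sketch below is only an estimate: it assumes each long integer occupies 4 bytes and that MB means 2^20 bytes, neither of which is stated in this AppNote. The function name is hypothetical.

```python
# Rough trace-file size estimate.
# ASSUMPTION: a long integer is 4 bytes; the AppNote does not state the width.
LONG_BYTES = 4       # assumed width of one long integer
HEADER_LONGS = 2     # start and stop time stamps
RECORD_LONGS = 8     # one long integer per metric tracked by STAT


def trace_file_bytes(seconds: int) -> int:
    """Return the estimated trace file size for a test lasting `seconds`."""
    header = HEADER_LONGS * LONG_BYTES
    records = seconds * RECORD_LONGS * LONG_BYTES   # one record per second
    return header + records


if __name__ == "__main__":
    day = trace_file_bytes(24 * 60 * 60)
    print(f"24-hour test: {day} bytes ({day / 2**20:.2f} MB)")
```

Under these assumptions a twenty-four hour test comes to a little over 2.6MB, which agrees with the maximum quoted above.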

Copy the STAT.NLM loadable module file to the SYS:SYSTEM directory. NetWare's NLM search path defaults to the SYS:SYSTEM directory. If you have specified another directory for loadable modules with the SEARCH command, STAT.NLM may alternatively be placed in that location.

To load STAT.NLM manually, use the "LOAD STAT" console command. STAT will automatically load all related NLMs not currently resident.

The STAT console screen will then appear, as shown in Figure 3. When STAT.NLM loads, its state defaults to "Ready," which means that STAT is currently idle.

Running Tests with STAT

Using the console screen provided by STAT, you can start gathering statistics manually or by setting triggers for timed tests.

Manual Tests. Manual operation allows you to start and stop the capture of statistics at will. This feature is well suited to troubleshooting, benchmarking, and general investigation of server utilization on the fly.

Figure 3: You control and monitor STAT.NLM with the menu options on the STAT console screen.

S T A T  v2.0


CONFIGURATION                                             15:31:00
Status:            Running...
Next file name:    USR10619.002
Current file name: USR10619.001
Current file size: 802456
Start date/time:   Feb 05 1992 07:30:00
End date/time:


TRIGGERS
Start trigger:     07:30:00
Stop trigger:      17:30:00


MENU
1 - Start
2 - Stop
3 - Rename next file
4 - Set triggers
5 - Exit


Enter choice:

Select "1 - Start" from the STAT menu. STAT will immediately change its status to "Running..." and begin to capture server workload statistics. These statistics will be saved in the file listed next to "Current file name." (For more information, see "Renaming the Trace File" below).

STAT will continue to collect statistics until the test is stopped manually.
To stop a manual test, select "2 - Stop" from the STAT menu. STAT will stop gathering data, flush its data buffer to disk, and end the current test. The extension of the "Current file name" will also be incremented by one and displayed in the "Next file name" field. This allows you to run subsequent tests without overwriting data from your prior tests. It also frees you from the need to name each test and subsequent trace files manually.

Timed Tests. Triggers allow you to use STAT for network management purposes. Using triggers, you can set up STAT to start and stop capturing statistics unattended, on a daily basis. If you choose to automate the charting process, triggered tests can provide you with a daily picture of resource utilization within your server.

Select "4 - Set triggers" from the STAT menu.
STAT will prompt you for a start trigger based on military time. Type the start time (including colons) as shown below, and press <Enter>.
```
Enter start trigger time    07:30:00    <Enter>
```
After you've entered the start trigger information, STAT will ask for a stop trigger (again, you must type the colons).
```
Enter stop trigger time    17:30:00    <Enter>
```

After you enter each trigger, the new triggers will appear on the screen under the TRIGGERS heading. STAT will then change its status to "Waiting for start trigger..." and go to sleep until the trigger time arrives.

To keep STAT as unobtrusive as possible, the STAT trigger mechanism is simple and therefore has several weaknesses.

Trigger times may not be set within plus or minus one minute of the time 00:00:00.
If the current time lies between a set of newly defined triggers, STAT will not start until the start trigger is encountered the following day.
You must type the colons within the military time format when entering trigger information.
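The three restrictions above can be pictured with a short Python sketch. This is not STAT's actual code: the function names are hypothetical, and the exact boundary handling (whether a trigger exactly one minute from midnight is rejected) is an assumption.

```python
from datetime import time


def parse_trigger(text: str) -> time:
    """Parse a trigger typed as HH:MM:SS; the colons are required."""
    parts = text.split(":")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        raise ValueError("trigger must be typed as HH:MM:SS, including colons")
    hh, mm, ss = (int(p) for p in parts)
    trigger = time(hh, mm, ss)   # raises ValueError if a field is out of range
    seconds = hh * 3600 + mm * 60 + ss
    # ASSUMPTION: the one-minute window around 00:00:00 is inclusive.
    if seconds <= 60 or seconds >= 86400 - 60:
        raise ValueError("trigger may not fall within one minute of 00:00:00")
    return trigger


def starts_today(now: time, start: time) -> bool:
    """A test starts today only if the start trigger has not yet passed."""
    return now <= start


if __name__ == "__main__":
    start = parse_trigger("07:30:00")
    # Triggers defined at noon: the start time has passed, so wait a day.
    print(starts_today(time(12, 0, 0), start))
```

In this sketch, triggers defined at noon for a 07:30:00 start do not fire until the following morning, which matches the second restriction in the list.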

You can abort triggered STAT tests by selecting "2 - Stop" from the STAT menu. Trace files from aborted tests will be closed normally. Otherwise, STAT will end the test automatically when the stop trigger is encountered. The extension of the "Current file name" will also be incremented by one and displayed in the "Next file name" field. This allows a new test to be started unattended the next day without overwriting your previous test's data.

Renaming the Trace File

The default trace file name is STAT.000. The extension of this file name is automatically incremented with each test to protect trace file data from prior tests. However, I suggest that you use a meaningful naming scheme that includes a reference to the server being characterized. For example, I use USR10910 to represent a test started September 10th, on server PRV-USER1.

To rename the next trace file used by STAT, select "3 - Rename next file" from the STAT menu. STAT will then prompt you for a DOS file name of up to 8 characters with no extension (the extension will always default to 000 and increment for each subsequent test).
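The naming rules can be summarized in a few lines of Python. This is an illustrative sketch only; the function names are hypothetical, and the behavior after extension 999 is not described in this AppNote, so the sketch simply raises an error at that point.

```python
def rename_next_file(base: str) -> str:
    """Build the next trace file name from a base name of up to 8 characters."""
    if not 1 <= len(base) <= 8 or "." in base:
        raise ValueError("base name must be 1-8 characters with no extension")
    return f"{base}.000"          # the extension always restarts at 000


def next_trace_name(current: str) -> str:
    """Increment the numeric extension, for example .001 becomes .002."""
    base, ext = current.rsplit(".", 1)
    number = int(ext) + 1
    if number > 999:
        # ASSUMPTION: roll-over behavior is undocumented, so stop here.
        raise ValueError("extension would exceed 999")
    return f"{base}.{number:03d}"


if __name__ == "__main__":
    name = rename_next_file("USR10910")
    print(name)
    print(next_trace_name(name))
```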

Exiting STAT

To exit from STAT and remove the STAT loadable module from server memory, select "5 - Exit" from the STAT menu. During the exit process, STAT will record the current trace file name so that later uses of STAT, in its default mode, do not overwrite trace files from previous tests. When STAT exits, it returns all allocated resources to the NetWare operating system.

Using DUMPSTAT to Convert STAT Output

STAT automatically places its output files in the SYS:SYSTEM directory. To save disk space, these output files are written in binary format. DUMPSTAT.EXE is a conversion utility designed to create spreadsheet- or database-compatible text files from the binary trace files output from STAT. The command format for DUMPSTAT is:

DUMPSTAT inputPath [outputPath] [-L | -X | -S | -N num | -M[num]]

Defaults. If you don't specify a filename for outputPath, the default is standard output to the screen (-S format). If you do specify a filename, the format defaults to -L (Lotus). The outputPath parameter will not accept a file extension.

Input Path. The input path is the path, including file name, for the trace file that you want converted.

Output Filenames. DUMPSTAT creates output files using the base filename specified in the outputPath parameter. It distinguishes between the various output files by incrementing the extension:

outputPath.DS0 outputPath.DS1 ... outputPath.DSF outputPath.SUM

The last file, with a .SUM extension, is a summary file. An example is shown below.

--------- Test Summary ---------
Start Time: Sat Feb 08 14:22:30 1992
End Time: Sat Feb 08 14:27:07 1992
Elapsed Time: 00:04:37
Records Read: 277
Records Written: 277
Minimum Polling Loops: 4969
Maximum Polling Loops: 50693

Output File Formats. DUMPSTAT can save output files in any one of three formats:

-L	Lotus format (comma delimited)
-X	Excel format (tab delimited)
-S	Screen format (column justified)

Output File Size. The size of the output file is determined by the "-N num" parameter, where num is the number of records per output file. The default is 7200 (a separate file for each two hours of statistics). This default assumes you'll be using a spreadsheet for analysis.

The size option is useful when you use a database as the target analysis program. For example, if your STAT test runs for more than two hours, this option can be used to tell DUMPSTAT to place all of the records in one file.
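To see how the "-N num" setting interacts with the output file names, consider the following Python sketch. It assumes the last character of the extension counts in hexadecimal from 0 to F, as the DS0 ... DSF sequence above suggests; that reading, and the function name, are assumptions rather than documented DUMPSTAT behavior.

```python
import math


def output_file_names(base: str, records: int, per_file: int = 7200) -> list[str]:
    """List the files expected for a trace of `records` one-second records."""
    count = math.ceil(records / per_file)
    if count > 16:
        # ASSUMPTION: DS0..DSF allows at most 16 data files.
        raise ValueError("more than 16 output files; raise the -N value")
    names = [f"{base}.DS{i:X}" for i in range(count)]
    names.append(f"{base}.SUM")   # the summary file always comes last
    return names


if __name__ == "__main__":
    # A 24-hour trace split at the default of 7200 records per file.
    for name in output_file_names("USR10910", 24 * 60 * 60):
        print(name)
```

Under these assumptions, a twenty-four hour trace at the default setting produces twelve data files plus the summary file, while a larger "-N num" value collapses them into a single data file.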

Data Reduction. In the "-M[num]" parameter, num is the number of records to be reduced to a single record by calculating the statistical median. The default is 60 (one record per minute). This option is useful for network management reporting when you want to plot an entire day's workload on one graph.
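The reduction can be sketched in Python as follows. The sketch assumes the median is taken separately for each metric (column) and that a trailing group shorter than num is still reduced; this AppNote does not spell out either detail, and the function name is hypothetical.

```python
from statistics import median


def reduce_records(records, num=60):
    """Collapse each run of `num` records into one record of column medians."""
    reduced = []
    for i in range(0, len(records), num):
        group = records[i:i + num]
        # ASSUMPTION: the median is computed independently per metric.
        reduced.append(tuple(median(column) for column in zip(*group)))
    return reduced


if __name__ == "__main__":
    # Ten hours of one-second samples with a single metric per record.
    ten_hours = [(second % 100,) for second in range(36000)]
    print(len(reduce_records(ten_hours)))
```

With the default of 60, ten hours of one-second records reduce to 600 records, one per minute.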

Charting Examples

Placed in the right format, STAT results can be excellent persuasion tools for management, even non-technical management. The following three figures show examples of STAT results graphed using Borland's Quattro Pro spreadsheet program.

Figure 4: Graph of CPU %Utilization for two hours (7200 seconds).

Figure 5 shows a graph of ten hours' (36,000 seconds) worth of data that has been reduced with the DUMPSTAT -M option to one point per minute (600 points).

Figure 5: Graph of CPU %Utilization for ten hours (36,000 seconds of data reduced with the DUMPSTAT -M option).

Figure 6 shows a graph of CPU %Utilization along with LAN communications channel utilization and the number of active connections. Overlaid graphs like this one help you see possible cause-and-effect relationships for utilization peaks.

Figure 6: Graph of CPU %Utilization overlaid with LAN communications channel usage and active connections.

Technical Reference

The development of the STAT tool involved several intentional design decisions. These decisions and the technical specifications for STAT are discussed below.

Design Decisions

Primary Purpose. After considerable research into the characteristics of workloads processed by NetWare servers, I began to focus on the impact of those characteristics on specific resources inside the server. I quickly became frustrated watching the CPU %Utilization figure flash on the screen. Not only was there no way to accurately record the metric, but it didn't even represent all the resources I was interested in measuring.

The primary purpose of the STAT toolkit is to provide the measurements needed to identify the impact of workload - resource utilization - on the server. These statistics, along with some knowledgeable analysis, provide a more accurate, composite view of the server. Ultimately, STAT will allow us to identify whether individual resources are under-utilized, appropriately utilized, or bottlenecking under production workloads.

Data Collection Frequency. The rate at which STAT records statistics within the OS is once per second. This rate cannot be changed.

Minimizing STAT's Effect on Server Resources. STAT is designed to be as unobtrusive on the server as possible.

Perhaps you have heard of the Heisenberg Uncertainty Principle, which states that the measurement of a phenomenon can be altered by the measurement process itself, thereby producing an inaccurate measurement. We wanted to reduce, as much as possible, the effect of this principle on the server being measured. So we made a design decision that separated processing absolutely required on the server from processing that could be done elsewhere, at a later time.

Due to this decision, STAT does very little more than collect the needed information from which the statistics are derived. No processing of the data occurs on the server.

Even the write process within STAT was affected by this decision. Rather than write a record to disk each second, STAT stores the data in memory for 60 seconds, thereby performing only one write per minute.
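The buffering idea can be illustrated with a small Python sketch. It is not STAT's implementation; the class and attribute names are hypothetical, and the 32-byte record in the demonstration assumes eight 4-byte long integers.

```python
import io


class BufferedTraceWriter:
    """Hold one-second records in memory and write them in batches."""

    def __init__(self, stream, flush_every=60):
        self.stream = stream
        self.flush_every = flush_every   # records held before each write
        self.buffer = []
        self.writes = 0                  # number of writes actually issued

    def add(self, record: bytes) -> None:
        """Queue one record; write only when the buffer is full."""
        self.buffer.append(record)
        if len(self.buffer) >= self.flush_every:
            self.flush()

    def flush(self) -> None:
        """Write any buffered records, as happens when a test is stopped."""
        if self.buffer:
            self.stream.write(b"".join(self.buffer))
            self.writes += 1
            self.buffer.clear()


if __name__ == "__main__":
    writer = BufferedTraceWriter(io.BytesIO())
    for _ in range(150):                 # 150 seconds of activity
        writer.add(bytes(32))            # ASSUMPTION: 32-byte record
    writer.flush()
    print(writer.writes)
```

Batching this way trades a small amount of server memory for far fewer disk writes during a test.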

By using the NetWare v3.11 MONITOR utility with the /P parameter, you can observe the success of these design decisions by following the process utilization figures for the STAT processes.

Resource Statistics Tracked by STAT

The resource utilization statistics tracked by STAT are taken from a set of undocumented NetWare variables. These variables can be exported for use by other NLMs via the SS.NLM documented in "NetWare v3.x Operating System Statistics Exposed!" (NetWare Application Notes, July 1991).

The variables read and recorded by STAT include:

Number of Polling Loops	The number of times the Polling Process has been executed.
Maximum Number of Polling Loops	The maximum number of times the Polling Process has been executed.
Bytes Received	The number of bytes read from the LAN communications channel. This number is provided by the LSL services and therefore includes data from packets using the server as a router.
Bytes Transmitted	The number of bytes written to the LAN communications channel. This number is provided by the LSL services and therefore includes data from packets using the server as a router.
Bytes Read	The number of bytes read from the disk channel (hardware I/O).
Bytes Written	The number of bytes written through cache to the disk channel.
Packets Routed	The number of packets using the router services of the server only. This number is provided by the server's IPX protocol stack.
Number of Connections	The number of active connections to the server.

The CPU %Utilization Calculation

CPU %Utilization is not calculated by the operating system; rather, it is derived by utilities (such as MONITOR) from two variables kept by the operating system - Number of Polling Loops and Maximum Number of Polling Loops. DUMPSTAT uses the same algorithm as MONITOR to calculate CPU %Utilization:

%Utilization = 100 - (100 x Number of Polling Loops / Max. Number of Polling Loops)

To keep STAT's processing on the server to a minimum, as discussed above, STAT records only the variables used to calculate CPU %Utilization. Later, during the binary-to-ASCII conversion performed by DUMPSTAT, the polling statistics are discarded and replaced by the utilization figure.
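The formula translates directly into code. The Python sketch below uses floating-point arithmetic for clarity; whether MONITOR and DUMPSTAT round or use integer arithmetic is not stated here, so treat that as an assumption. The function name is hypothetical, and the sample inputs are the two polling figures from the summary file shown earlier, used purely for illustration.

```python
def cpu_utilization(polling_loops: int, max_polling_loops: int) -> float:
    """Derive CPU %Utilization from the two polling-loop variables."""
    if max_polling_loops <= 0:
        raise ValueError("maximum polling loops must be positive")
    return 100 - (100 * polling_loops / max_polling_loops)


if __name__ == "__main__":
    # An idle second: the polling loop runs as often as it ever has.
    print(cpu_utilization(50693, 50693))
    # A busy second: far fewer polling loops were completed.
    print(round(cpu_utilization(4969, 50693), 1))
```

The fewer polling loops completed in a second relative to the maximum, the busier the CPU was with other work.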

Trace File Dimensions

The trace file is made up of an initial time stamp record followed by a statistics record for each second of STAT activity. The time stamp record contains two long integers - one for the start time stamp and one for the stop time stamp. Each statistics record contains 8 long integers - one for each metric listed in "Resource Statistics Tracked by STAT" above.
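For readers who want to inspect a trace file without DUMPSTAT, the layout above might be read along the following lines. This Python sketch rests on several assumptions that this AppNote does not confirm: 4-byte unsigned long integers, little-endian byte order, and metrics stored in the order listed in "Resource Statistics Tracked by STAT." The field names are hypothetical labels, and the time stamps are left as raw integers because their encoding is not described.

```python
import struct

# ASSUMPTIONS: little-endian, 4-byte unsigned longs, fields in listed order.
HEADER = struct.Struct("<2L")    # start and stop time stamps
RECORD = struct.Struct("<8L")    # one statistics record per second
FIELDS = (
    "polling_loops", "max_polling_loops",
    "bytes_received", "bytes_transmitted",
    "bytes_read", "bytes_written",
    "packets_routed", "connections",
)


def read_trace(data: bytes):
    """Split raw trace bytes into (start, stop, list of record dicts)."""
    start, stop = HEADER.unpack_from(data, 0)
    records = []
    last = len(data) - RECORD.size
    for offset in range(HEADER.size, last + 1, RECORD.size):
        values = RECORD.unpack_from(data, offset)
        records.append(dict(zip(FIELDS, values)))
    return start, stop, records


if __name__ == "__main__":
    # Build a tiny synthetic trace: a header plus three identical records.
    fake = HEADER.pack(1, 2) + RECORD.pack(*range(8)) * 3
    start, stop, records = read_trace(fake)
    print(start, stop, len(records), records[0]["connections"])
```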

STAT.CFG File Description

The STAT configuration file (STAT.CFG) is used to store the "Next file name" when exiting from STAT. This file is written to the SYS:SYSTEM directory during the exit process.

Each time STAT is loaded, it looks for this configuration file. If found, the enclosed file name is used as the new "Next file name." This precautionary measure helps prevent new trace files from overwriting trace files from previous tests.

If a configuration file does not exist, STAT uses the default file name STAT.000.
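The load-time behavior amounts to a simple fallback, sketched below in Python. The on-disk format of STAT.CFG is not documented in this AppNote; the sketch assumes a plain text file holding just the file name, and the function names are hypothetical.

```python
import tempfile
from pathlib import Path

DEFAULT_NAME = "STAT.000"


def load_next_file_name(cfg_path: Path) -> str:
    """Return the saved "Next file name", or the default if none is saved."""
    if cfg_path.exists():
        # ASSUMPTION: the file holds the name as plain text.
        name = cfg_path.read_text().strip()
        if name:
            return name
    return DEFAULT_NAME


def save_next_file_name(cfg_path: Path, name: str) -> None:
    """Record the "Next file name" during the exit process."""
    cfg_path.write_text(name + "\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        cfg = Path(tmp) / "STAT.CFG"
        print(load_next_file_name(cfg))          # no file yet: default name
        save_next_file_name(cfg, "USR10619.002")
        print(load_next_file_name(cfg))          # name saved at exit
```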

Availability

STAT.NLM and DUMPSTAT.EXE can be obtained here.

Disclaimer

The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.