A Test Workload Analysis of LANQuest Labs' Application Benchmark (LAB) Test Suite
Senior Consultant
Systems Research Department
01 Jul 1993
This AppNote provides a workload characterization of LANQuest's Application Benchmark (LAB) test suite for network server and network operating system (NOS) benchmarking. The results of this analysis allow you to compare the characteristics of this test workload to those of your own production network before using LANQuest's test results to make inferences regarding your production system.
Previous AppNotes in This Series:
  Mar 93  "Using Production Workload Characteristics to Validate LAN Performance Evaluation Studies"
  May 92  "Identifying Test Workloads in LAN Performance Evaluation Studies"
  May 91  "An Introduction to Workload Characterization"
Introduction
A test workload analysis (TWA) is a detailed measurement and description of a test workload - in this case, an industry benchmark produced by LANQuest Labs. This analysis includes an outline of the test's design and operation, a detailed look into the workload the test produces, and a comparison of the test's workload characteristics to those of production environments.
In this AppNote, we try to determine exactly what LANQuest's tests are measuring. This analysis is helpful for anyone trying to decide whether LANQuest's test results are meaningful and useful for a given production system. We identify valid uses of the test as well as other uses that can only be called abuses.
This TWA is the first in a series of AppNotes covering all test workloads that are available for use in the network computing industry.
Introduction to the Test Workload
LANQuest uses a benchmark called the LAN Application Benchmark (LAB) Test Suite to test network systems and components, including:
Network servers
Network operating systems
LANQuest documentation also refers to this suite as the LANQuest LAN Performance Benchmark Test Suite.
LAB employs three off-the-shelf applications: Lotus 1-2-3 v2.01, Microsoft Word v3.10, and Borland dBASE III Plus LAN Pack v1.1. Each LAB application is scripted and runs for approximately 15 seconds.
LANQuest Labs recently used LAB to compare the performance of three network operating systems for IBM - Novell NetWare, Microsoft LAN Manager, and IBM LAN Server. IBM distributes these LANQuest Labs test results to its customers during competitive sales situations. LAB test results have also been used by trade publications, including PC Week and LAN Magazine, and in the evaluation of LAN operating systems that appeared in the August 1988 issue of Personal Computing.
Test Specifications
The chart in Figure 1 lists information and specifications about the LANQuest Application Benchmark (LAB) tests.
Figure 1: LANQuest Application Benchmark test specifications.
Test Workload Name:     LANQuest Application Benchmark (LAB) Test Suite

Author:                 LANQuest Labs
                        1251 Parkmoor Avenue
                        San Jose, CA 95126
                        (408) 283-8900

Version:                June 1992

Workload Type:          Scripted DOS applications using Borland's Superkey
                        (Lotus 1-2-3 v2.01, Microsoft Word v3.10, and
                        Borland dBASE III Plus LAN Pack v1.1)

System Under Test:      DOS network with up to 21 DOS-based personal computers

Metrics:                Client test duration measured with Norton's
                        TIMEMARK.EXE (accuracy to the nearest second)

Workload Duration:      Lotus script duration = 13 sec.
(approx)                dBASE script duration = 17 sec.
                        Word script duration = 14 sec.

Documentation:          User's Guide, 10 pages

Availability:           Licensed distribution

Cost:                   $6,000 license fee
Design Goals
In the LAB User's Guide, LANQuest describes their design goals. The following is an outline of those goals.
1. Produce a system test workload rather than a component test workload.
LANQuest calls the LAB test suite a "system-level test." This classification is based on their belief that the tests "do not attempt to isolate and measure individual variables." They go on to say that if a test does single out individual variables, then the test's results often mean little to the end user because they don't measure the "interaction between variables."
2. Use a limited number of workstations to emulate a real-world network with many users.
Each test is designed with no pauses to generate a "maximum load per workstation." In doing so, they estimate that each LAB client generates the equivalent of 6 active users running similar applications in a typical office environment. Based on experience with both end-users and test networks, LANQuest says results from a 21-workstation test can be used to emulate a "substantially larger network."
LANQuest uses "common application programs" to emulate real-world network environments.
3. Measure the relative performance of networks.
LANQuest states that the LAB tests are a "well-constructed and consistent set of independent tests which measure relative network performance" (italics added).
Installation and Operation
The LAB test suite is shipped on seven 5.25-inch diskettes. The LAB installation process requires:
XCOPY of seven 5.25-inch master diskettes.
Creation of unique client names STATN1..STATN21.
Assignment of full access privileges for each client.
Configuration of drive assignments and mappings.
Creation of boot diskettes for drive A: of each client.
LAB employs Borland's Superkey keyboard macro program to drive the three applications. On each personal computer, a combination of DOS batch files and Superkey macros (hereafter referred to as scripts) executes each application's commands, calls Norton's TIMEMARK.EXE to calculate the duration of the test, records the test duration on the A: drive, and performs light housekeeping (such as deleting the previous test's temporary files).
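For illustration, the control flow of one client's harness can be sketched as follows. This is our own reconstruction in Python, not LANQuest's actual batch files; the housekeeping and application-driver routines are hypothetical stand-ins for the DOS batch commands and Superkey macros described above.

    import time

    def cleanup_previous_temp_files():
        # Stand-in for the batch commands that delete the previous
        # test's temporary files.
        pass

    def run_scripted_application(app_script):
        # Stand-in for the Superkey macro that drives the application.
        time.sleep(0.1)

    def run_lab_client(app_script, results_path="results.txt"):
        # TIMEMARK.EXE is called at the start and end of each script;
        # the difference is the client's reported test duration.
        cleanup_previous_temp_files()
        start = time.time()
        run_scripted_application(app_script)
        elapsed = time.time() - start
        with open(results_path, "a") as f:
            f.write("%s: %.0f sec\n" % (app_script, elapsed))

    run_lab_client("WORD")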
Each of the applications represents one test. Each of the LAB tests is run separately on the system under test with an increasing number of test clients, from 1 to 21.
To start the test, you must type the name of the test's batch file on each personal computer. For instance, running the Microsoft Word test consists of starting each participating client's Word script by hand, one every second or so. The test then runs to completion.
Test results are displayed on each client's monitor and also written to a DOS text file on each client's boot diskette. Results must be gathered manually for further analysis.
Test Procedures and Run Rules
The LAB test procedures revolve around multiple runs (at least three) at a variety of workload levels (presumably 1, 3, 6, 9, 12, 15, 18, 21).
Two informal run rules are included in the documentation.
LANQuest recommends testing the highest load level first (the greatest number of workstations). Subsequent runs are performed, in turn, with three fewer clients; the removed clients are logged out of the network and powered down.
To ensure consistency and reproducibility, LANQuest suggests that the tester calculate the mean test duration over three runs on the same system under test. They also suggest that the difference between means should be less than 4 percent.
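Applied mechanically, the consistency rule looks something like the sketch below (our own illustration with hypothetical durations; LANQuest supplies no such tool, and the base of the 4 percent comparison is our assumption).

    def mean(durations):
        return sum(durations) / len(durations)

    def consistent(mean_a, mean_b, tolerance=0.04):
        # True if two mean test durations differ by less than 4 percent
        # (taken here relative to the smaller mean).
        return abs(mean_a - mean_b) / min(mean_a, mean_b) < tolerance

    run_set_a = [14.0, 15.0, 14.0]   # three runs on the system under test
    run_set_b = [15.0, 15.0, 14.0]   # three more runs on the same system

    print(consistent(mean(run_set_a), mean(run_set_b)))   # True: means ~2.3% apart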
Measurement Methods
The LAB scripts use Norton's TIMEMARK.EXE to measure test duration at each test client. TIMEMARK.EXE is executed at the beginning and end of each script.
Measuring Server Disk Channel Performance. LANQuest uses the Lotus 1-2-3 and Word tests to measure "the server's ability to handle disk I/O." The 1-2-3 script is designed to write 20 files totalling 1,325KB to the disk. The Word script writes 50 files to the disk.
The relative performance of the disk channel is measured by using up to 21 test clients. During a 21-client test, the combined workload serviced by the server disk channel equals 525 Lotus spreadsheets (33MB) and 1050 Word documents (2MB).
Measuring Server CPU and RAM Performance. LANQuest says the dBASE test "is good for isolating processor speed and the efficiency with which RAM is accessed by the server." This, they say, is because the dBASE test has "fewer disk writes than the other two tests" and because "the entire database file is moved between server and each client three times." Due to dBASE III's use of 512-byte blocks, "a large number of cache memory reads are necessary to transfer the entire file."
Measuring Network Operating System Performance. LANQuest compares the results of the 1-2-3 and Word tests run on separate NOSs to provide "information on how effectively an operating system implements small versus large block transfers, an aspect of higher level protocols." LANQuest asserts that NOSs tuned for small block transfers will perform better with small files than with large files, and vice versa.
LANQuest uses the dBASE III test to provide an "indication of the operating system's caching capabilities." They highlight the dBASE test's file sort function as being "cache-intensive" and say the test "minimizes I/O transfer."
Workload Characterization
The following workload characterization was performed in the Novell Research Performance Lab in Provo, Utah. For each script, we include descriptions and measurements in three categories:
Functional Layer
Logical Layer
Physical Layer, Quantity, and Pattern
The functional layer is a description of the operations or "functions" requested by the person using the application that is generating the workload. In this test workload analysis, the scripts used to drive the LANQuest tests provide us with that description.
The logical layer is a description of the stream of operations requested by the application to satisfy the requests made by the application user.
The physical layer is a description of the network traffic serviced by the network server. This description includes measurements for both major classifications of workload characteristics: quantity and pattern.
For more information on workload characterization, see "An Introduction to Workload Characterization" and "Workload Characterization of NetWare Clients and Servers Using a Protocol Analyzer" in the May 1991 and July 1991 NetWare Application Notes, respectively.
The Word Script
The Word script reads a 2176-byte document, renames it, saves it, reads in the renamed document, and then renames it again. This process is repeated 50 times. Since the Word test is driven by an automated script, the functional layer is easily described with the pseudo-code listing in Figure 2.
Figure 2: Pseudo-code for LANQuest Labs' Microsoft Word test.
Retrieve the 2KB test document
Do 50 Times:
    Rename the test document
    Save the test document
    Clear the current document from application memory
    Retrieve the newly named document
Figure 3 is a listing of the Word test's logical layer - the DOS function calls issued by Word during the test. This particular listing begins with an OPEN of document #9 and ends with the closure of what later becomes document #10.
Figure 3: Sample logical layer listing of LANQuest Labs' Microsoft Word test.
The function call detail shows how Word opens and reads a 2176-byte document (WRDTST9.DOC), writes it to a temporary file (MW172723.TMP), renames it (WRDTST10.DOC), then repeats the process. The steps in between these events are housekeeping tasks, such as the access of NORMAL.STY.
The single client charted in Figure 4 generates an initial peak of 30% while loading the Word application. Utilization then stays steady at about 8% for the test's duration.
Figure 4: Ethernet line utilization for one Word client.
As test clients are added to the system, utilization climbs to an 80% peak under a 20-client workload. The test then averages 38% utilization until it tapers off at the end (see Figure 5).
Figure 5: Ethernet line utilization for twenty Word clients.
The information in Figure 6 can be used to compare LANQuest Labs' Word test workload to production workloads.
Figure 6: Summary of LANQuest Labs' Word test workload.
The Lotus 1-2-3 Script
The Lotus 1-2-3 script is similar to the Word script. The Lotus 1-2-3 script reads a 54KB spreadsheet, renames it, saves it, reads in the renamed spreadsheet, and then renames it again. This process is repeated 20 times. During a 15-second test, a single client writes 1MB of spreadsheet data to the server disk channel (see Figure 7).
Figure 7: Pseudo-code for the LANQuest Labs Lotus 1-2-3 test.
Retrieve the 54KB test spreadsheet
Do 20 Times:
    Rename the test spreadsheet
    Save the test spreadsheet
    Clear the current worksheet in application memory
    Retrieve the newly named test spreadsheet
A listing of the Lotus 1-2-3 function calls was not available for this AppNote.
From Figure 8, you can see that one client produces a workload with an average Ethernet line utilization of 13% that lasts for approximately 13 seconds.
Figure 8: Ethernet line utilization for one Lotus 1-2-3 test client.
Although a 13% rate isn't uncommon for spreadsheet users, a sustained 13-second duration is. This heavy workload, multiplied across 20 clients, creates the utilization plotted in Figure 9.
Figure 9: Ethernet line utilization for 20 Lotus 1-2-3 test clients.
With 20 LAB clients running the Lotus 1-2-3 scripts, the LAN channel utilization shown in Figure 9 averages 55%, with peaks ranging in the 70s to near 80%. These utilization figures are outside the recommended range for Ethernet production environments.
The initial jump in utilization is the loading of the Lotus application. Although the delay created by this event is not included in the formal measurement, many of the test's clients will begin measuring while other clients are still loading the application.
The information in Figure 10 can be used to compare LANQuest Labs' 1-2-3 test workload to production workloads.
Figure 10: Summary of LANQuest Labs' Lotus 1-2-3 test workload.
The dBASE III Script
The dBASE script does the following: (1) reads the entire database and displays the results of a single-key search; (2) sorts the database on one key and writes the results to a temporary file; then (3) repeats the initial search-and-display exercise on the newly created temporary file. This procedure is outlined in Figure 11.
Figure 11: Pseudo-code for LANQuest Labs' dBASE III test.
Display date, company, eps_rank, price_rec for eps_rank > 70
Sort to temporary file on date, company, eps_rank
Use the new temporary file
Display date, company, eps_rank, price_rec for eps_rank < 70
The dBASE III script initially opens the database file COMPANY.DBF and the index file COMPNAME.NDX. Figure 12 gives the logical layer listing.
Figure 12: Sample logical layer listing of LANQuest Labs' dBASE III test.
In this sample listing of DOS function calls, you can readily see one of dBASE III's inefficiencies. As dBASE III reads the index file, it requests the same 512-byte block nine times in a row in order to retrieve the nine records in that block. This same pattern of repetitive function calls is repeated numerous times as the application reads the entire index and database files.
Fortunately, this repeated access of each subsequent block is not characteristic of the script's network workload. In NetWare, the repeated requests for the same block are serviced by the local shell or DOS requester. Only the initial request for each 512-byte block traverses the network.
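A back-of-the-envelope simulation shows how much traffic the shell's block cache absorbs. The figures below are assumptions for illustration (nine records per 512-byte block, as in the listing), not measurements from the LAB trace.

    def network_reads(record_count, records_per_block=9, client_caches_block=True):
        # dBASE III re-requests the current 512-byte block once per
        # record it extracts; a client-side cache (the NetWare shell
        # or DOS requester) services the repeated requests locally.
        blocks = -(-record_count // records_per_block)   # ceiling division
        if client_caches_block:
            return blocks         # only the first request per block hits the wire
        return record_count       # every repeated request crosses the network

    print(network_reads(900))                             # 100 network reads
    print(network_reads(900, client_caches_block=False))  # 900 network reads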
The single client utilization displayed in Figure 13 represents a very heavy workload averaging 20%.
Figure 13: Ethernet line utilization for one dBASE III test client.
The combination of 20 dBASE III clients creates a saturation condition on the Ethernet, as shown in Figure 14.
Figure 14: Ethernet line utilization for twenty dBASE III test clients.
The information in Figure 15 can be used to compare LANQuest Labs' dBASE III test workload to production workloads.
Figure 15: Summary of LANQuest Labs' dBASE III test workload.
Analysis
Working from the details up to the design, we analyze the workload characteristics of LAB's test workloads and identify exactly what LANQuest is measuring. This analysis helps determine how well LANQuest met their design goals. It can also help us determine the usefulness of their results.
Workload Characteristics
Functional Layer. At this layer of analysis, you have to ask yourself if your users are going to do the kind of work emulated by each of the LANQuest Labs tests. This is a key question because test results can vary widely and the LANQuest results are only valid for production environments with similar peak workloads.
The Word and 1-2-3 scripts' concentration on the file creation process creates a workload that is possibly the furthest from reality of any script using real applications. Their very design totally eliminates the use of server cache, whereas production environments - especially those of word processing and spreadsheet users - rely on server cache almost completely (90% of the time) to provide high-performance file access. Breaking server cache is not the right way to test the disk channel. It only tells you how well your server will run the test workload when it has no effective cache.
The Word script creates 50 new files without any user delay. Our research into file-oriented application usage patterns contradicts this design: a study of Novell customers demonstrated that user delay is the most significant factor. We can't even imagine a word processing function that would require the creation of 50 files at one sitting! With their Word test, LANQuest ignores the wide variety of word processing features and functionality that should be included in a word processor-based benchmark, including:
Think time
Directory searches for existing files
Browsing of existing files
Formatting
Printing
Automated back-up
Editing
Spell checker and thesaurus usage
The Word script's concentration on a 2K document is also short-sighted. Our research turned up a wide variety of document sizes ranging from 4K to 100K. These don't even begin to approach the file sizes used for product manuals and other large document environments.
The 1-2-3 script's weaknesses are similar to those of the Word test: a lack of realism and a complete absence of the spreadsheet functionality that creates the wide variety of workload patterns generated by spreadsheet users.
The dBASE script scores better than the Word and 1-2-3 scripts on realism. The dBASE test includes two queries and a sort - a fairly realistic script for an individual network client in a record-oriented environment.
The weakness of the dBASE script is in LANQuest's trying to represent more than one user with it. Database environments where all users are performing back-to-back queries and sorts are the exception. Again, LANQuest ignores the all-important think time that occurs when users do work in between network requests. The script also concentrates on two database features and ignores the variety of database functions that we see in workgroups and departments that rely heavily on databases, such as accounting, human resources, customer service, and manufacturing.
Although the dBASE script is pseudo-realistic (without delay) for some users, mostly those who perform database maintenance and research, it neglects the majority of database users, including those performing data entry, batch entry, browsing, and update.
The combined weaknesses of the LAB scripts' design are enough to overwhelm any of their strengths. These weaknesses include:
No realistic delay
No realistic application usage patterns
Word and 1-2-3 concentrate solely on file creation
Word and 1-2-3 circumvent any natural use of server cache
dBASE performs only sequential database access
dBASE concentrates solely on report queries and sorts
These combined weaknesses prevent LAB from testing, measuring, exercising, stressing, or saturating network servers the way real production workloads do.
Logical Layer. The logical layer is provided for you to observe the strengths and weaknesses of the application as it translates user requests into system requests. Standalone applications aren't always tuned very well to the network's services. Sometimes they're just plain bad.
A good example is dBASE III. Although dBASE III still represents a good portion of the installed base, its use of network resources is poor at best. Its main weakness in this case (as shown in Figure 12) is its unfailing desire to request the same 512-byte block repeatedly while it reads the multiple records enclosed in the block. During the dBASE test, this weakness manifests itself when every database and index block is read nine times in a row.
There is also a tremendous amount of sequential I/O involved in the dBASE III test that isn't at all representative of mainstream database access patterns. Whether that is a function of dBASE, the small file size, or the design of the queries, we don't know. But the test's focus on sequential I/O, whether intentional or unintentional, makes the test a waste of time for real database users. Realistic DOS-based database usage patterns most commonly read or write an average 256-byte record, are almost entirely random (except during infrequent maintenance routines), and don't benefit from protocols that support sliding window or multi-packet transmissions.
Microsoft Word does a better job of network integration. If we had to pick something about Word to look into, it would be the static 128-byte read size. With technologies like Novell's packet burst, files less than 64K in size can be sent to the client in one burst. Larger files would also benefit from the 64K burst window size. When the entire file is requested, Word, by using 128-byte requests, isn't even taking advantage of the maximum packet size, let alone the benefits of packet burst.
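The round-trip arithmetic makes the point. Using the figures above - a 2176-byte document read in static 128-byte requests versus a single packet burst read (files under 64KB fit in one burst) - a short calculation, our own illustration rather than a protocol trace, gives:

    DOC_SIZE = 2176            # bytes; the Word test document
    READ_SIZE = 128            # Word's static read request size
    BURST_WINDOW = 64 * 1024   # packet burst returns up to 64KB per request

    small_reads = -(-DOC_SIZE // READ_SIZE)      # ceiling division
    burst_reads = -(-DOC_SIZE // BURST_WINDOW)

    print(small_reads)   # 17 request/response round trips
    print(burst_reads)   # 1 round trip under packet burst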
A sample listing of Lotus 1-2-3 function calls for a logical level analysis was not available for this AppNote.
Physical Layer, Quantity, and Pattern. At this layer, we can compare the workload characteristics of the LANQuest benchmarks with those of real production workloads. We'll start with LAB's strengths.
One benefit gained by LANQuest's use of real applications is something LANQuest had no control over. This benefit is an accurate packet size distribution produced by all three LAB tests.
The packet size distribution is controlled by the application's request size and the underlying network protocols' sizing of those requests. If you use Word, or 1-2-3, or dBASE III, your packet size distribution will be the same as the distribution produced by the LAB test scripts if you perform the same functions with the same underlying protocols. This is good because a test's packet size distribution alone can make the difference between a valid test and an invalid test that produces useless results.
Another potential benefit gained by LANQuest's use of real applications is the record- or file-orientation of the application's workload. LANQuest didn't have any control over this benefit either, except in their choice of applications. As we've shown earlier, the dBASE III script doesn't produce what we would currently call a random access pattern. However, both 1-2-3 and Word look like real file-oriented applications doing file-oriented kinds of work. They're just doing a decidedly non-word-processor, non-spreadsheet type of work.
LAB's weaknesses at the physical level are largely due to the designer's scripts. These scripts were under LANQuest's complete control.
The Word script doesn't look anything like a word processor user on the wire. In fact, its machine-gun style and concentration on the creation of 50 2K files bears more similarity to a high-speed scanner in an imaging environment than to a word processor user. All it needs is a larger file size and a periodic delay.
Generally, file-oriented workloads like those generated by word processors can be characterized by sequential access of small files (from 1KB to 100KB), low read-to-write ratios (from 4:1 to 8:1), and large amounts of delay (mostly think time) that are better measured in minutes rather than seconds or milliseconds.
The Word workload's read-to-write ratio is right where it ought to be. But that's due less to the script's design and more to Word's small request size and its access of auxiliary files necessary for document display and support.
The result of this Word workload, when it is multiplied up to 21 times, is an extremely heavy workload unlike any we've ever seen in a small-document word processing environment. Realistic or not, throwing 1050 files at the server without any delay is a test, but a test of what? If your users do this, then the LAB Word test is exactly what you're looking for.
In retrospect, LANQuest should know that an XCOPY script would have been easier to write, just as simplistic, and equally unrealistic.
Our synopsis of the Lotus 1-2-3 script echoes that of the Word script. The 1-2-3 workload's machine-gun style, mixed with its concentration on the creation of 20 spreadsheet files, makes it anything but a spreadsheet test workload.
The 1-2-3 workload's 1:1 read-to-write ratio only exacerbates this problem. This workload looks more like an XCOPY of a spreadsheet directory than anyone actually using a spreadsheet to do any real work. If you're looking for a test to see how many users can do XCOPYs of their spreadsheet directories, all at the same time, why not use XCOPY? It's a lot less expensive and much easier to use.
The dBASE workload is compromised from the very beginning by dBASE III's ignorance of network services and sloppy record access scheme. This test shouldn't be used by anyone unless they're using dBASE III in production.
Although the dBASE workload approaches realism for a small minority of users in a department setting, it is an unrealistic representation of the work done by the majority. The thought that 21 of these workloads represents a production database environment of any kind is absurd. We don't know of any customer site where continuous sorts are a major part of each user's tasks, or even remotely representative of a peak workload in database environments.
Measurement
We now turn our attention to the more difficult task of determining what the LAB tests are actually measuring. We begin with the measurement itself. Then we move to the larger issue created by the unrealistic nature of the scripts and their workloads. Finally, we take a look at a glaring problem with the Word and 1-2-3 tests in the pre-NetWare 4.x environments.
Once you understand what a test is measuring, you can make an important distinction between the test results' usefulness or uselessness.
The Measurement Mechanism. Measurement in the LAB tests is performed by Norton's TIMEMARK.EXE. The output is each client's test duration, reported to the nearest second. The problem with LANQuest's timing mechanism isn't really the mechanism itself, but the poor resolution of the report. In a 15-second test, nine-tenths of a second amounts to a 6 percent difference, yet whole-second reporting can hide it entirely. Tenths of a second would be extremely helpful in the LAB tests, since some tests will vacillate between two different results.
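The size of the problem is easy to quantify. Rounding a duration to the nearest second introduces up to half a second of error; as a fraction of a 13- to 17-second test, that error is substantial (the durations below are the approximate LAB script durations from Figure 1).

    def worst_case_rounding_error(true_seconds):
        # Largest relative error when a duration is reported to the
        # nearest whole second (+/- 0.5 sec).
        return 0.5 / true_seconds

    for t in (13, 15, 17):   # approximate LAB script durations in seconds
        print("%d sec test: up to %.1f%% error" % (t, 100 * worst_case_rounding_error(t)))

    # 13 sec test: up to 3.8% error
    # 15 sec test: up to 3.3% error
    # 17 sec test: up to 2.9% error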
What Is LAB Measuring? The Word and 1-2-3 tests measure the speed at which a NOS can service back-to-back file creations as well as throughput to those newly created files in a poorly designed server (no server cache).
These measurements might have been useful, but LANQuest throws up serious barriers. First, there are very few LAN installations where back-to-back file creation is a frequent network event. These test workloads look more like those of imaging than word processing or spreadsheet users. To this narrow focus, LANQuest adds a high license fee, an unwieldy installation, and manual operation. If that weren't enough, the design of the workloads completely bypasses server cache - a critical ingredient in word processing and spreadsheet environments.
The dBASE test measures a system's performance during back-to-back queries and sorts, with small files that are always accessed sequentially.
Again, although this measurement may be exactly what someone is looking for, serious barriers exist to measuring anything useful with this test. If your production system will be handling 42 queries and 21 sorts, all at the same time, all using sequential access, then LAB's dBASE test may work for you.
Component or System Measurements? LANQuest points out early in their documentation that the problem with using component tests in the place of system tests is that their results, in their words "often mean little to the end user" because they don't measure the "interaction between variables." In other words, they're great for developers who want to hammer a NIC, for instance, but they won't tell you anything about how that NIC will perform in a production system. In fact, they can often be misleading. This is why we were disappointed when we took a closer look at the LAB test workloads.
All of the LAB tests break our first rule of system testing: "if you're saturating the hardware, you're not testing the software (or the system)" thus making each a component test.
LANQuest's use of these workloads to represent more than one client in a system test requires a real stretch of the imagination. As the client count is increased in each of the tests, the workload looks less and less like a production workload created by any number of users.
To test this assertion, run any one of the LAB tests with from one to five clients, measuring the Ethernet or Token-Ring utilization during each test. Compare your results with accepted load guidelines and you'll see that just a few LAB clients not only fall outside any acceptable range of utilization but can also completely saturate a single segment or ring.
For example, Ethernet performs well if the average utilization is kept below 25% with peaks no higher than 70%. Due to LAB's total lack of delay, the LAB tests approach these thresholds unusually fast.
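Checking a measured trace against those guidelines is trivial. The samples below are rough approximations of Figures 4 and 5, included only to illustrate the rule of thumb.

    def within_ethernet_guidelines(samples, avg_limit=0.25, peak_limit=0.70):
        # Rule of thumb: average utilization below 25%, peaks no higher than 70%.
        average = sum(samples) / len(samples)
        return average < avg_limit and max(samples) <= peak_limit

    one_word_client     = [0.30] + [0.08] * 12   # Figure 4, approximately
    twenty_word_clients = [0.80] + [0.38] * 12   # Figure 5, approximately

    print(within_ethernet_guidelines(one_word_client))       # True
    print(within_ethernet_guidelines(twenty_word_clients))   # False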
In order to get around the LAN saturation caused by LAB's overwhelming request rate - and preferably test the server - the server configuration would require at least four LAN segments or rings. Four Ethernet segments, each with five test clients, might do the trick. But the problem with this approach is LANQuest's misrepresentation of LAB as a system-level test in the press. Readers think they need four LAN adapters in all of their servers. To make matters worse, LAB demonstrates this supposed necessity of four NICs with fewer than 21 clients.
Problems with the NetWare Shell. LANQuest's test workload design has inadvertently isolated an unusual weakness in NetWare that has little to do with performance in production systems.
During each of the 20 Lotus script iterations (50 in the MS Word script), DOS issues a Create File request. The NetWare Shell, shipped with all NetWare versions prior to NetWare 4.0, assumes the file access mode to be shared and therefore does not cache the writes at the workstation. Microsoft LAN Manager and IBM LAN Server both assume the file to be nonshareable and cache the 52KB of writes. This translates into 50 network write requests for LAN Server and LAN Manager, and 500 network write requests for the same file in a NetWare environment - a disadvantage of nearly 10:1 to accomplish the same task.
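The effect of the two policies can be sketched with a simple count of write requests per created file. The write and coalescing sizes below are illustrative assumptions, not figures from LANQuest's documentation; only the direction of the result (roughly an order of magnitude) mirrors the behavior described above.

    def network_write_requests(file_size, app_write_size, coalesce_size=None):
        # Without client write caching (the pre-4.0 NetWare shell's
        # shared-mode assumption), every application write crosses the
        # wire. With caching (LAN Manager, LAN Server, and the NetWare
        # 4.0 DOS Requester), small writes are coalesced first.
        chunk = coalesce_size if coalesce_size else app_write_size
        return -(-file_size // chunk)   # ceiling division

    FILE_SIZE = 52 * 1024   # the ~52KB of writes cited above

    print(network_write_requests(FILE_SIZE, app_write_size=512))       # 104 uncached writes
    print(network_write_requests(FILE_SIZE, 512, coalesce_size=4096))  # 13 coalesced writes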
Considering the low frequency of Create File requests in production environments, we don't believe that these test results point out an important weakness in NetWare. In any event, the v1.01 DOS Requester shipping with NetWare 4.0, and also compatible with 3.x environments, now assumes nonshareable access on created files and processes this benchmark situation much more efficiently.
Due to this 10:1 disadvantage, LAB tests performed with the NetWare Shell are really measuring NetWare's ability to write to newly created files. Tests run with Novell's DOS Requester return NetWare to the lead position in LAB test results with a one to two second margin over its competitors.
IBM's use of the previous results, in which LAN Server was shown to be 50% faster than NetWare, led customers to believe that LAN Server is generally 50% faster than NetWare. It's too bad they didn't have the rest of the story. Neither NetWare's performance nor its margin of performance over LAN Server has changed, except when a newly created file is written sequentially. This 50% difference is one no one will notice - except, of course, those IBM customers who bought their NOS to run LANQuest's LAB tests 24 hours a day.
Installation and Operation
During our own installation and use of LAB we became frustrated with the required floppy swapping and LAB's use of each client's A: drive. So we condensed the diskettes to one 3.5-inch master diskette using PKZIP from PKWARE, Inc. We also altered the batch files to write the test results to a common network directory.
The manual startup routine also proved frustrating. In a 20-client test, the staggered start and finish of the test can represent 50% of the total test duration. Regardless of test duration, this staggered start - and staggered conclusion - goes against sensible testing practices. Since we were running on a network, we were able to develop a "quick and dirty" start routine that nearly eliminated the problem. The manual collection of data can be remedied with a little executable (choose your own weapon) to collect and analyze the data.
With these modifications, a full series of LAB tests can be run in an afternoon.
Design Goals
With the bulk of our analysis complete, we revisit LANQuest Labs' design goals.
1. Produce a system test workload rather than a component test workload.
This first goal was an important one for LANQuest to pursue. They know that component tests are valuable for engineering design and test environments, but fall short of any real usefulness in production system design and capacity planning. System tests, on the other hand, measure the performance of network systems under realistic conditions.
Sadly, LAB doesn't meet the requirements of a system test because it breaks all three of our laws of system benchmarking. LAB saturates the hardware (Law 1). In doing so, it identifies different bottlenecks than a production workload will encounter (Law 2), and it lacks realism (Law 3). So, although LANQuest began with good intentions, LAB falls short in substance.
This important differentiation between component and system test workloads is detailed in "Identifying Test Workloads in LAN Performance Evaluation Studies" in the May 1992 NetWare Application Notes.
2. Use a limited number of workstations to emulate a real-world network with many users.
Limiting the amount of resources required to run a test is important, but shouldn't be an overriding concern. Although it would be nice to represent 250 network clients with 25, we would rather see a test workload be accurate and produce useful results. If we had to choose between accuracy and ease of use, we'd choose accuracy. LANQuest, however, made the mistake of choosing ease of use over accuracy at some point in their design of the LAB scripts and methodology.
Our guess is that LANQuest began with a fairly realistic workload definition. But following several trial runs they realized that their test workload didn't produce a strong enough, or interesting enough, performance degradation curve. So they immediately rewrote their tests with the singular goal of degrading the system's performance.
Although LANQuest's initial intentions may have been good, they have created a test workload that is neither a system test nor a good component test. We visited the issue of system tests above. LAB doesn't make a good component test because it is inflexible, expensive, and tedious to install and operate.
LAB scripts run full-speed and therefore they use limited resources to saturate the system under test. However, LANQuest gave up any form of realism in their application scripts to achieve their goal of limiting resources. So, although real applications are used in the test suite, a closer analysis of the scripts reveals little resemblance to application usage patterns in production environments, either at light or peak levels.
3. Measure the relative performance of networks.
The term "relative" is often used in statements that compare the performance of one or more network systems or components with another. For instance, "the first system was relatively faster than the other systems we tested." This goal is often a cop-out. It means that the results of the test won't mean anything to you or me, but we can compare one result to another and try to infer something if we like.
If LANQuest's test workloads included more realism, or produced a workload that even remotely resembled the kind of workload real servers are servicing, then the term relative would have more meaning here.
As is, the LANQuest test results are really only telling you how well the LANQuest tests run on a specific system. If your users are going to run LANQuest tests 24 hours a day, then LANQuest's test results will help you find the right system. If your production workload isn't anything like the LANQuest test workload then you had better look for another test.
Recommended Uses and Abuses
In LANQuest's case, the LAB test suite does not take network performance measurements that are useful to the general public, and serious abuses have occurred.
These abuses have resulted from LANQuest's continuing to portray the LAB test suite as system-level tests even though they failed to meet their own design criteria. Their use of off-the-shelf applications has allowed LANQuest to masquerade LAB as a system test and mislead many readers to believe that LAB's test results had a direct correlation to production environments.
LANQuest misleadingly recommends several uses of their LAB tests:
Measuring server disk channel performance
Measuring server CPU and RAM performance
Measuring network operating system performance
Using LAB for these purposes is invalid because it is not a system test workload.
In fact, the only thing LAB does very well is saturate the LAN channel. Its strength here is produced by LAB's design around real applications, because the packet size distributions (PSDs) are accurate for those applications. Accurate PSDs are often missing from published LAN test results.
But LAB's weaknesses as a LAN component test - its high cost and its inflexible file sizes, read-to-write ratios, and test durations - make it a non-contender.
If you choose to use LAB for LAN channel component testing, remember that LAB is prematurely and unnaturally saturating one or more LAN channel components. LAB has a voracious appetite for LAN channels. Four may not be enough. Exercising the LAN channel this way is valuable for NIC and NIC driver developers, but its results are meaningless to all those who are designing production systems.
We recommend that Novell customers ignore LANQuest's LAB test results because of LAB's over-simplification of LAN workloads.
We also emphasize that procurement, design, optimization, and capacity planning decisions should never be made based on the results of a single test. For more information concerning LAN testing methodology, refer to the following NetWare Application Notes:
"Identifying Test Workloads in LAN Performance Evaluation Studies" (May 92)
"Using Production Workload Characteristics to Validate LAN Performance Evaluation Studies" (Mar 93)
* Originally published in Novell AppNotes
Disclaimer
The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.