NetWare 4.0 Performance Tuning and Optimization: Part 1
Articles and Tips: article
Systems Research Department
01 May 1993
Performance tuning and optimization of a network operating system such as NetWare is a delicate subject. The design philosophy behind NetWare has always been to do most of the tuning internally, thus greatly reducing any manual effort required on the customer's part. This AppNote discusses the performance enhancements that have been made in NetWare 4.0. Part 2 will detail the operating system parameters that can be used to tune NetWare 4.0 servers.
Related AppNotes:
May 92: "Identifying Test Workloads in LAN Performance Evaluation Studies"
Mar 93: "Using Production Workload Characteristics to Validate LAN Performance Evaluation Studies"
The Care and Feeding of a Special-Purpose OS
NetWare 4.0 is the latest in a long line of special-purpose network operating systems from Novell. Beginning with S-Net and NetWare 68 in the early 1980s, each successive release of NetWare has included significant additions of functionality, along with associated improvements in performance. The special-purpose nature of NetWare allows Novell engineers to fine tune the operating system specifically for the duties of a network server, while adding new features. Most of these new features are both configurable and removable, based on the needs of the target production environment.
NetWare has been able to embrace new technologies sooner than general-purpose operating systems such as OS/2 and UNIX. A good example is NetWare's ability to run 32-bit applications with the release of NetWare v3.0 in 1989. Another example is the file retention system that retains deleted files on unused disk space until that disk space is needed. Many of our customers have benefited from this life-saving feature, while others remove it either because they don't need it or because they place other factors (such as improved performance) at a higher priority.
The fact that NetWare is a special-purpose operating system creates some unique opportunities for Novell's partners and customers. They can tailor network computing environments from the smallest office environment to the largest global networks, and for the simplest file services to the emerging fields of imaging, telephony, and multimedia. But in the process, a kind of balancing act is necessary to successfully improve system performance over the outstanding off-the-shelf performance of NetWare.
Balancing Performance with Reliability and Security
In any computing environment, even in a special-purpose operating system like NetWare, resources are limited. The speed and capacity of the hosting hardware platform determine in large part how much the operating system can do and when. Because of this inherent limitation, a great deal of balance is needed when making decisions about performance tuning and optimization.
For many people, performance is not the most important aspect of a network service. When forced to prioritize the different functions of a network server, most of Novell's customers rank reliability and security as the two most important, with performance a close third. Of course, each of these areas is important, and no one likes to prioritize them or wants to sacrifice one for another. But it is important that tuning for performance's sake not be allowed to compromise reliability and security.
For instance, some NetWare features, such as read-after-write verification, can be performed either at the software level or at the hardware level within the server. Moving this verification feature to the level that makes the most sense is wise. Performing the verification at both levels, or removing the feature entirely for performance's sake, could be a costly mistake.
Auto-Tuning vs. Manual Tuning
Novell's performance tuning philosophy has always been to simplify the tuning process for our customers. With the exception of several parameters related to capacity, NetWare 3.x performs best with the settings right out of the box. NetWare 4.x continues this long tradition.
Auto-tuning is NetWare's ability to dynamically alter its configuration to accommodate changing or increasing workloads. Auto-tuning impacts four different OS parameters:
Physical Packet Receive Buffers
Cache Block Sizing
Manual tuning is difficult due to the lack of proper tools. It is very difficult to measure the impact of your manual tuning decisions without a repeatable and representative test workload to run against your old and new configurations. So Novell's products come already tuned for production environments and perform several auto-tuning processes during daily operation to maintain optimal performance levels.
The Problem with Benchmarks
Software vendors and customers alike tend to rely heavily on industry benchmarks when making decisions about tuning and optimization. The problem with current benchmarks is that the majority of them are actually component tests rather than system tests. Their results reflect the performance of the tested component under unrealistic conditions. They reveal nothing about the component's performance as a member of a system servicing a production workload.
For example, network interface cards (NICs) are often tested with their maximum packet size for sustained periods of time. The results from these kinds of tests show how close the NIC comes to its theoretical maximum throughput. But in real LAN environments, maximum packet sizes are rarely requested, and when they are, it is sporadic and only for brief durations. While there are notable exceptions, the point is that these kinds of test results don't represent the performance of the NIC in a production environment. What's more, testing at those sustained levels exaggerates differences between NICs that often are not important to the mainstream production environment.
If server operating systems were professional basketball players, some testers would be happy just to look at the player's free throw statistics and make their selections based on who could successfully make 50 shots in a row. But we know better. Free-throw statistics alone do not represent all of the ingredients that make great basketball players.
In using component tests and their results, vendors mistakenly tune their systems to the test's unrealistic traffic patterns. What's worse, customers believe that the exaggerated differences between tested components truly represent the way the winning component will operate in their production environment and base their buying decisions on that.
The solution to these problems lies in the creation of system test workloads that correctly emulate workload patterns found in production environments. Novell's Ghardenstone methodology provides a better means of gauging the performance of network servers and operating systems. Two other organizations, the Business Applications Performance Corporation (BAPCo) and the Standard Performance Evaluation Corporation (SPEC), are also working toward more realistic test workloads.
We recommend that benchmarking be done carefully, and that more than one test be used before conclusions are drawn. Quite often, testers at all levels of our industry have thought they were testing one thing, when they were actually testing another. We can't stress enough the importance of knowing what you're testing and determining how that corresponds to your own production environment. These are critical first steps before you apply any test results to expensive procurement and capacity planning projects.
NetWare 4.0 Performance Enhancements
Performance has always been of utmost concern in the design of NetWare. The NetWare 4.0 operating system features numerous enhancements aimed at providing performance that is equal to or better than its predecessors. These include:
Optimized code paths
Disk block default size
Prioritization of disk requests
These enhancements are explained in the following sections.
Optimized Code Paths
With the leaps and bounds in microprocessor performance, it is becoming more difficult to measure software performance improvements. In some cases, better processors just make inefficient code look better. In the case of NetWare 4.0, we have continued to streamline and tune important core code paths because they are a foundation piece of the operating system.
NetWare's core code paths are the routines that service the most frequently requested network operations. Virtually every version of NetWare has been faster than its predecessor in these core code paths. Since more efficient code paths reduce the CPU's workload, we measured the improvements made from 3.11 to 4.0 by running an identical test workload against both operating systems and measuring the utilization of the CPU.
Figure 1 compares the results for NetWare 3.11 and 4.0. Note that in both cases, CPU utilization increases with the incremental addition of clients.
Figure 1: Comparison of NetWare 3.11 and 4.0 CPU utilization statistics.
It is interesting to note NetWare 4.0's initial increase in CPU consumption. This is due to the overhead of NetWare 4.0's additional services running in the background. Under light loads, NetWare 4.0 spends more time servicing these additional background tasks. As the workload increases, the overhead from these services diminishes due to their lower prioritization, and the improvements in the core code paths become apparent. Eventually, NetWare 4.0 outpaces 3.11 in terms of overall throughput at reduced levels of CPU utilization.
Figure 2 shows the same types of results for LAN channel throughput. These results clearly demonstrate an improved rate of throughput in NetWare 4.0, along with a decreased rate of CPU utilization under heavier workloads.
Figure 2: Comparison of NetWare 3.11 and 4.0 throughput statistics.
These significant improvements to NetWare's core code paths make it possible for NetWare to continue to support important value-added network services as well as emerging technologies, while maintaining NetWare's reputation for high performance. This is a primary benefit of a special-purpose operating system compared to general-purpose operating systems.
Disk Block Size
In all versions of NetWare prior to NetWare 3.x, the disk block allocation unit was fixed at 4KB. In NetWare 3.x, larger allocation units were possible, but they resulted in a lot of unused disk space at the end of files. For example, if the last part of a file took up 1KB of a 16KB block, the remaining 15KB was wasted.
In an office automation environment where small files make up the majority of data, this wasted disk space was a potential problem. In database environments where larger files are predominant, larger disk allocation units were less of a problem and even helped in some cases.
With the release of NetWare 4.0, the long tradition of 4KB default disk allocation units comes to an end. In most cases, NetWare 4.0 defaults to a block size other than 4KB during the creation of each volume. Default allocation sizes during volume creation are not based on performance criteria, but on volume size to conserve server cache memory. The new defaults are listed in Figure 3.
Figure 3: NetWare 4.0's default disk block sizes are based on volume size, not on performance criteria.
Volume Size          Default Block Size
Less than 32 MB      4KB
32 to 150 MB         8KB
150 to 500 MB        16KB
500 to 2000 MB       32KB
2000 MB and up       64KB
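As a sketch, this tiered selection amounts to a simple threshold lookup. The block-size values used here are the commonly documented NetWare 4.0 defaults (4KB doubling up to 64KB across the volume-size tiers); treat the exact figures as illustrative assumptions rather than a definitive implementation:

```python
def default_block_size_kb(volume_mb):
    """Default disk block size (in KB) chosen at volume creation,
    based purely on volume size, not on performance criteria."""
    if volume_mb < 32:
        return 4
    elif volume_mb < 150:
        return 8
    elif volume_mb < 500:
        return 16
    elif volume_mb < 2000:
        return 32
    return 64

# A 100MB volume defaults to 8KB blocks; a 4GB volume to 64KB.
assert default_block_size_kb(100) == 8
assert default_block_size_kb(4000) == 64
```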
Based on our performance testing, we recommend a 64KB block size for all volumes. The larger 64KB allocation unit allows NetWare to use the disk channel more efficiently by reading and writing more data at once. This results in faster access to mass storage devices and improved response times for network users.
Suballocation is implemented in NetWare 4.0 to overcome the problem of wasted disk space due to under-allocated disk blocks (as described above). Suballocation allows multiple file endings to share a disk block. The unit of allocation within a suballocated block is a single sector (512 bytes). That means that as many as 128 file ends can occupy one 64KB block. Using suballocation, the maximum loss of data space per file is 511 bytes. This would occur when a file had one more byte than could be allocated to a full 512-byte sector. Hence, suballocation nearly eliminates the penalty of using larger disk allocation units and allows much larger disk channel transactions.
From a performance standpoint, suballocation enhances the performance of write operations within the OS by allowing the ends of multiple files to be consolidated within a single write operation. Of course, this minor improvement will often be counterbalanced by the increased overhead of managing the suballocation process. The major win is the optimization of the disk channel and cache around the 64KB disk allocation unit.
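The arithmetic behind these figures is easy to verify. The following is a minimal sketch (the helper name is ours, not NetWare's):

```python
SECTOR = 512        # suballocation unit: one 512-byte sector
BLOCK = 64 * 1024   # 64KB disk allocation block

def tail_waste(file_size, alloc_unit):
    """Unused bytes in a file's final allocation unit."""
    remainder = file_size % alloc_unit
    return 0 if remainder == 0 else alloc_unit - remainder

# Worst case without suballocation: a file one byte past a block
# boundary strands nearly an entire 64KB block.
assert tail_waste(BLOCK + 1, BLOCK) == BLOCK - 1      # 65,535 bytes

# With 512-byte suballocation units, the worst case is 511 bytes:
# one byte more than fills a full sector.
assert tail_waste(SECTOR + 1, SECTOR) == SECTOR - 1   # 511 bytes

# As many as 128 file tails can share one 64KB block.
assert BLOCK // SECTOR == 128
```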
As imaging, multimedia, and other workloads involving streaming data become more prevalent, the 64KB block size will become invaluable. We recommend that everyone use the 64KB disk block size for greater efficiency, elimination of wasted space, and to take advantage of read-ahead (explained next).
Read-ahead is made possible, in part, by NetWare 4.0's larger disk block allocation units. NetWare includes heuristics that determine whether a file is being accessed randomly or sequentially. If the file is being accessed randomly, read-ahead is not engaged for two reasons: (1) the likelihood of a sequential access is remote; and (2) it could create unwanted overhead. But if the file is being accessed sequentially, NetWare assumes the sequential mode of access will continue and begins to stage additional data (move it into cache) for the user.
With larger disk allocation units, these reads can often be processed during a single rotation of the disk. This is a significant improvement when you consider that previous versions of NetWare would issue 16 individual 4KB disk requests for the same amount of data.
Here's how read-ahead works. With each file open request, NetWare begins tracking the file access patterns. Once NetWare determines that the requester is accessing the file sequentially, the "read-ahead" process is engaged. As soon as the requester accesses data at the current block's midpoint, NetWare issues a request for the next block (see Figure 4).
Figure 4: For files read sequentially, read-ahead kicks in as soon as data is accessed at the current block's midpoint.
If the requester's mode of access changes from sequential to random at any time, read-ahead is turned off automatically.
Read-ahead requests are placed in the disk queue at a lower priority than normal reads and writes. Because read-ahead requests are processed in the background, NetWare can provide the benefits of read-ahead without a significant toll on foreground reads and writes. The ratio of processed read-ahead requests to normal reads and writes is lowered under peak conditions and raised as server resources become available.
If the server is servicing a heavy workload, the read-ahead may not occur immediately. But in production environments where the workload is highly random, there is a high probability that the data will be cached before the client actually requests it. If a read-ahead request has not yet been processed at its lower priority when the client requests that data, the request is automatically bumped up to the higher priority of normal reads.
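The behavior described in this section can be sketched as a small per-file state machine. This is a hypothetical reconstruction for illustration, not NetWare source code; the names, structure, and exact midpoint test are assumptions:

```python
class ReadAheadTracker:
    """Tracks one file's access pattern: engages read-ahead once
    access looks sequential, disengages on any random access."""

    def __init__(self, block_size=64 * 1024):
        self.block_size = block_size
        self.next_offset = 0      # offset expected if access is sequential
        self.staged_blocks = []   # block numbers queued for read-ahead
        self.sequential = False

    def on_read(self, offset, length):
        if offset == self.next_offset:
            self.sequential = True
        else:
            # Random access: read-ahead is turned off automatically.
            self.sequential = False
            self.staged_blocks.clear()
        self.next_offset = offset + length

        if self.sequential:
            block = offset // self.block_size
            midpoint = block * self.block_size + self.block_size // 2
            # Past the current block's midpoint: stage the next block
            # as a lower-priority, background disk request.
            if offset + length > midpoint and block + 1 not in self.staged_blocks:
                self.staged_blocks.append(block + 1)

t = ReadAheadTracker()
for off in range(0, 64 * 1024, 8 * 1024):
    t.on_read(off, 8 * 1024)         # sequential 8KB reads
assert t.staged_blocks == [1]        # next 64KB block was staged

t.on_read(10 * 64 * 1024, 8 * 1024)  # random jump
assert not t.staged_blocks           # read-ahead turned off
```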
Prioritization of Disk Requests
In past versions of NetWare, we saw a few cases where a server servicing an abnormally heavy write condition would seem to ignore transactions involving read requests. This happened when a large number of cached writes hit a threshold at which the server needed to switch writes from background to foreground in order to flush cache and recover from the abnormal load. But when the server switched priorities, it switched completely to foreground writes, without any consideration for the higher-priority read requests. We term this an "abnormal" condition because it is only seen in benchmark tests with absurdly low read-to-write ratios; it is seen very seldom, if ever, in production environments.
NetWare 4.0 includes a tiered prioritization of the disk elevator that reduces the possibility of ignored reads. It also supports lower priorities for read-ahead requests. There are four bins, prioritized as follows:
Critical events (such as file commits and TTS log file writes)
Reads
Writes
Read-ahead requests
Critical events are typically guaranteed events, so these are always processed with greater priority. Reads are almost always generated by client foreground tasks and make up the majority of work processed by any server. Most writes can occur in the background as a write-behind process. Read-ahead requests are prioritized so as not to preclude the processing of any higher priority events.
Instead of using the normal first-in, first-out (FIFO) build sequence for the disk elevators, NetWare 4.0 takes a percentage of requests from each priority bin. The higher the priority, the more requests get placed on the current elevator. In this way, none of the levels gets locked out due to an overabundance of requests in one of the levels.
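This bin-draining scheme can be sketched as follows. The percentage shares are assumptions for illustration; the article states only that higher-priority bins contribute more requests per elevator:

```python
from collections import deque

# Four priority bins, highest priority first. The shares are
# illustrative assumptions, not documented NetWare values.
SHARES = {"critical": 40, "read": 30, "write": 20, "read_ahead": 10}

def build_elevator(bins, sweep_size=20):
    """Fill one elevator sweep by drawing a fixed share of requests
    from every bin, so no priority level is ever locked out.
    (Seek-ordering of the finished sweep is omitted for brevity.)"""
    sweep = []
    for name, pct in SHARES.items():
        quota = max(1, sweep_size * pct // 100)
        q = bins[name]
        while quota and q:
            sweep.append(q.popleft())
            quota -= 1
    return sweep

bins = {name: deque(f"{name}-{i}" for i in range(50)) for name in SHARES}
sweep = build_elevator(bins)

# Every bin contributed, with more slots for higher priorities.
assert sum(r.startswith("critical") for r in sweep) == 8
assert sum(r.startswith("read_ahead") for r in sweep) == 2
```

Contrast this with a pure FIFO build, where a flood of writes could fill the entire elevator and starve reads, the exact pathology described above.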
Conclusion
In this AppNote, we have described NetWare 4.0's performance enhancements and made a few recommendations for optimizing the server in light of these enhancements. Part 2 of this AppNote will detail the operating system parameters that can be used to tune NetWare 4.0 servers.
* Originally published in Novell AppNotes
The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.