Optimizing IntranetWare Server Memory


RON LEE
Senior Research Engineer
Novell Developer Information

01 Mar 1997

Thanks to Craig Teerlink, Rick Johnson, and Ron Alder for their help with this AppNote.

The lack of adequate memory resources can significantly affect the performance of your NetWare 4 or IntranetWare server. This AppNote walks you through the process of calculating server memory requirements and tuning the cache to optimize your server for the workload your users are actually generating.

Introduction
The IntranetWare Server Memory Architecture
Mythical Memory Calculation Rules of Thumb
Estimating IntranetWare Memory Requirements
Tuning the Memory Estimate for Your Production Environment
Strategies for Tuning File System Cache
Monitoring IntranetWare Memory Allocation
Conclusion

Introduction

The accuracy of server memory calculations has become more important in the last few years. Growing disk capacities and user communities, along with sophisticated server applications and the platforms that host them, are placing more and more emphasis on server memory resources. To help you keep pace in the midst of these changes, Novell has created a new memory optimization procedure to help you identify the best memory configuration for any IntranetWare server.

This AppNote begins with a conceptual section to help you understand the memory architecture of IntranetWare and its subsystems, including a complete discussion on memory fragmentation. The memory optimization process itself is divided into the following sections:

A method to estimate the amount of memory required for an IntranetWare server
Ways to monitor IntranetWare's allocation of memory resources
Several strategies for tuning directory and file cache once the server is operating in a production environment

This AppNote primarily covers the NetWare 4.11 operating system, the platform upon which IntranetWare 1 is built. For the purposes of this AppNote, we'll refer to NetWare 2, NetWare 3, NetWare 4.1, NetWare 4.11 and IntranetWare 1 when specific references to those versions are necessary. Otherwise, we'll use the term IntranetWare to denote NetWare 4.11, the server component of IntranetWare 1.

The information in this AppNote supersedes all previously published server memory calculation and cache tuning guidelines.

The IntranetWare Server Memory Architecture

If your responsibilities include the recommendation or management of IntranetWare server configurations, you need to have a basic understanding of the server's memory architecture. This information will not only help you design high-performance servers, but it will also help you interpret MONITOR statistics and IntranetWare error messages.

For IntranetWare, Novell redesigned the server memory architecture to enhance performance and memory protection, and to reduce memory fragmentation. Three major changes were made from the NetWare 3 memory architecture:

A memory organization that eliminates some of the NetWare 3 memory pools (Permanent, Semi-Permanent, and Alloc Short-Term) to reduce memory fragmentation
A memory map that allows IntranetWare to write-protect operating system and NetWare Loadable Module (NLM) code space
A new memory allocation scheme designed to be more in tune with production server behaviors

Several more recent improvements were made as part of IntranetWare:

New garbage collection algorithms that perform two orders of magnitude faster than in previous systems
Changes to the memory management system to handle large memory configurations up to 3.12GB of RAM

Memory Organization

IntranetWare uses a memory organization that divides the server's memory, from address 0 to the end of physical memory, into usable units called "pages." In Intel-based systems, pages measure 4 KB in size; on other hardware platforms, page sizes can range from 512 bytes to 16 KB. In each case, IntranetWare uses the host CPU's native page size.

Note: IntranetWare's paging organization should not be confused with virtual memory, in which unused memory pages are swapped to disk.

After the operating system is initialized, all remaining memory is given to the caching subsystem. Then, as NLMs are loaded and request memory resources, IntranetWare takes the requested memory from cache in page-sized increments. When NLMs exit or discard unneeded memory, those memory pages are returned to cache (see Figure 1).

Figure 1: IntranetWare allocates memory in pages.

This new organization overcomes several shortcomings in previous versions of NetWare. For instance, in NetWare 3 some memory pools permanently allocated memory that could never be returned for reuse by other NLMs. In IntranetWare, all memory allocated by NLMs is completely returnable and reusable, which greatly reduces memory fragmentation. Because the file system cache uses all memory not allocated by the OS and related NLMs, none of the server's memory goes unused.

In NetWare 3, memory pages allocated by NLMs were split up and shared by multiple NLMs. This interleaved organization made it difficult to reuse returned memory pages until all NLMs were finished with each shared page. To avoid this problem in IntranetWare, pages allocated by an NLM are not shared with other NLMs. This new organization of allocated memory allows IntranetWare's garbage collector to return all of an NLM's pages when the NLM is unloaded.
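The page-level behavior described above can be modeled in a few lines. In this Python sketch, cache starts out owning every page, NLMs take whole pages, each page belongs to exactly one NLM, and unloading an NLM hands all of its pages back. The names `PageCache`, `alloc_pages`, and `unload` are hypothetical and only illustrate the bookkeeping; they are not IntranetWare APIs.

```python
PAGE_SIZE = 4096  # Intel page size in bytes, as described in the text


class PageCache:
    """Toy model: free pages belong to cache; allocated pages belong to one NLM."""

    def __init__(self, total_pages: int) -> None:
        self.free_pages = set(range(total_pages))  # all memory starts in cache
        self.owner: dict[int, str] = {}            # page number -> owning NLM

    def alloc_pages(self, nlm: str, nbytes: int) -> list[int]:
        # Requests are satisfied in whole pages, rounding the byte count up.
        count = -(-nbytes // PAGE_SIZE)
        if count > len(self.free_pages):
            raise MemoryError("not enough cache pages")
        pages = sorted(self.free_pages)[:count]
        for p in pages:
            self.free_pages.remove(p)
            self.owner[p] = nlm  # a page is never shared between NLMs
        return pages

    def unload(self, nlm: str) -> int:
        # Because no page is shared, every page the NLM owns can return to cache.
        mine = [p for p, o in self.owner.items() if o == nlm]
        for p in mine:
            del self.owner[p]
            self.free_pages.add(p)
        return len(mine)


cache = PageCache(total_pages=16)
cache.alloc_pages("A.NLM", 5000)  # needs 2 pages
cache.alloc_pages("B.NLM", 4096)  # needs 1 page
print(cache.unload("A.NLM"), len(cache.free_pages))
```

In a shared-page design, `unload` could not simply release A's pages while B still had data on them; keeping ownership per NLM is what makes the loop in `unload` safe.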

The IntranetWare Memory Map

NetWare uses a CPU facility called mapping that allows physical memory to be reorganized into a more efficient order. This mapping process results in two or more views of memory:

The actual memory, called physical memory
The new view(s) of memory, called logical memory

The data structure that provides translation between these two views is called a memory map. IntranetWare uses memory maps to reduce memory fragmentation by assembling non-contiguous blocks of physical memory into contiguous blocks of logical memory.
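A memory map can be pictured as a lookup table from logical page numbers to physical page frames. The sketch below assumes 4 KB pages and uses made-up frame numbers to show how three scattered physical frames can appear as one contiguous logical range. It illustrates the idea only; the real translation is done by the CPU's paging hardware.

```python
PAGE_SIZE = 4096

# Hypothetical map: logical pages 100-102 are backed by scattered physical frames.
memory_map = {100: 7, 101: 42, 102: 3}


def translate(logical_addr: int) -> int:
    """Translate a logical address to a physical address through the page map."""
    page, offset = divmod(logical_addr, PAGE_SIZE)
    frame = memory_map[page]  # a missing entry is where a real CPU would raise a page fault
    return frame * PAGE_SIZE + offset


# A 12 KB buffer that is contiguous in logical memory maps to three separate frames.
start = 100 * PAGE_SIZE
for addr in (start, start + PAGE_SIZE, start + 2 * PAGE_SIZE + 10):
    print(hex(addr), "->", hex(translate(addr)))
```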

Figure 2 is a visual representation of how IntranetWare organizes server memory.

Figure 2: IntranetWare uses memory mapping to use physical memory more efficiently.

IntranetWare loads its operating system code high in logical memory. All NLM code is also loaded into logical memory, while the NLMs' data regions are mapped down in the physical memory area. With all of the code mapped high, IntranetWare takes advantage of memory write-protection services provided by the CPU to protect the OS and NLM code pools from being overwritten. Write requests into this protected memory space generate page faults so that undisciplined NLMs can be identified and removed from the server.
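The write-protection idea can be sketched by giving each mapped page a read-only flag and rejecting writes to flagged pages. This is a simplified model under our own assumptions: on real hardware the CPU's paging unit raises the fault, and the `PageFault` class, `write` function, and page numbers below are hypothetical.

```python
PAGE_SIZE = 4096


class PageFault(Exception):
    """Stands in for the CPU exception raised on a write to a protected page."""


# page number -> (physical frame, read_only); code pages are read-only, data pages are not
page_table = {
    0x800: (12, True),   # hypothetical OS code page mapped high
    0x801: (13, True),   # hypothetical NLM code page mapped high
    0x010: (40, False),  # hypothetical NLM data page
}


def write(logical_addr: int, nlm: str) -> None:
    page = logical_addr // PAGE_SIZE
    _, read_only = page_table[page]
    if read_only:
        # Reporting the offending module is what lets an administrator remove it.
        raise PageFault(f"{nlm} attempted to write protected page {page:#x}")
    # A real write would now store data at frame * PAGE_SIZE + offset.


write(0x010 * PAGE_SIZE, "GOOD.NLM")  # allowed: data page
try:
    write(0x801 * PAGE_SIZE + 8, "BAD.NLM")  # rejected: code page
except PageFault as fault:
    print(fault)
```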

Memory Allocation

IntranetWare's Alloc subsystem manages free memory made up of memory fragments left over from previous allocation processes and memory that has been deallocated by NLMs. This process begins when memory is first extracted from the cache subsystem to satisfy an NLM's Alloc request. Depending on the size of the NLM's request, this memory is divided up into blocks as small as 16 bytes or very large blocks that span many 4KB pages. Memory left over from these requests is placed on a free memory list.

Note: In discussions of IntranetWare memory internals and allocated units of memory, the term node is often used instead of block. Although both terms are appropriate for different discussions, we use "block" in this AppNote because it has direct reference to statistics displayed in the MONITOR utility.

NetWare 3 managed free memory with one linked list made up of memory blocks in sizes ranging from very small (8 bytes) to very large (greater than 4 KB). After a period of time, this free list could become quite large, with hundreds of small pieces of memory included at the beginning of the list. Each time an NLM requested a block of memory, the Alloc subsystem would traverse this free list looking for the first fit (the first block large enough to satisfy the request). Ideally, this allocation process should take fewer than 100 instructions, but because of the size of the free list it often required thousands of instructions to find a suitable memory block. This same process was repeated each time memory was allocated from or returned to the free list.

To overcome the inherent latency involved in traversing large linked lists, IntranetWare manages 77 separate free lists for each NLM. These free lists are based on memory block size, and include:

An array of 64 free lists in 16-byte increments for memory blocks up to 1 KB in size
An array of 12 free lists in 256-byte increments for memory blocks from 1 KB to 4 KB in size
A single free list for memory blocks larger than 4 KB in size
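The three tiers above add up to 77 lists (64 + 12 + 1). One way to map a block size to its list index is sketched below. The exact boundary handling (for example, whether a block of exactly 1 KB lands in the last small list or the first medium one) is our assumption, since the text does not specify it.

```python
def free_list_index(size: int) -> int:
    """Map a block size in bytes to one of 77 per-NLM free lists (indices 0-76).

    Lists 0-63:  16-byte size classes for blocks up to 1 KB
    Lists 64-75: 256-byte size classes for blocks from 1 KB to 4 KB
    List 76:     a single list for everything larger than 4 KB
    """
    if size <= 0:
        raise ValueError("size must be positive")
    if size <= 1024:
        return (size - 1) // 16  # 1-16 -> 0, ..., 1009-1024 -> 63
    if size <= 4096:
        return 64 + (size - 1025) // 256  # 1025-1280 -> 64, ..., 3841-4096 -> 75
    return 76


for s in (16, 17, 1024, 1025, 4096, 4097):
    print(s, free_list_index(s))
```

Because the index is computed arithmetically, choosing the right list costs a few instructions regardless of how many free blocks exist, which is the contrast with the single first-fit list used by NetWare 3.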

This private allocation scheme can have a significant impact on memory performance in several ways:

First, the OS is now able to anticipate memory requests and prepare memory in advance. When an NLM allocates memory, the OS rounds the request up to the nearest 4 KB increment and then doubles that amount before taking the requested memory resources from cache. This chunk of cache memory is then split up to satisfy the NLM's request. The remaining memory produced from doubling the request size is placed in one of the NLM's free lists in anticipation of similar requests of the same size. These extra memory blocks can have a significant impact on performance because requests serviced from the NLM's free lists are much less expensive in terms of CPU time than those serviced from the server's cache subsystem.
Second, this design is supported by our research findings that individual NLMs favor specific sizes of memory blocks. For example, one NLM may request memory in 32-byte and 2 KB blocks, while another may request only 4 KB blocks. This private allocation scheme provides each NLM with its own collection of right-sized blocks.

Together, these two benefits create a high-performance environment for NLM memory operations.
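The pre-allocation rule in the first point can be worked through in a few lines. This sketch assumes 4 KB pages and tracks the leftover only as a byte count; how the OS actually carves the leftover into blocks on the NLM's free lists is not described here, so it is left out. The function name is hypothetical.

```python
PAGE = 4096


def cache_chunk_for_request(request: int) -> tuple[int, int]:
    """Return (bytes taken from cache, bytes left over for the NLM's free lists).

    Rule from the text: round the request up to the next 4 KB multiple, then double it.
    """
    rounded = -(-request // PAGE) * PAGE  # ceiling to a 4 KB multiple
    chunk = 2 * rounded
    return chunk, chunk - request


print(cache_chunk_for_request(100))
print(cache_chunk_for_request(5000))
```

For a 100-byte request, 8 KB comes out of cache and the remaining 8,092 bytes can serve later requests of similar size without another trip to the cache subsystem.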

Understanding Memory Fragmentation

When new volumes or services can't allocate enough memory to load or operate, it often means that the server is simply not configured with enough RAM to accommodate the new services. However, when you know your server has enough RAM for a new service and it is still unable to allocate the needed resources, memory fragmentation could be the cause. This section presents some background information that will help you understand the underlying causes and several solutions for memory fragmentation in NetWare 3, NetWare 4, and IntranetWare.

All operating systems struggle with memory fragmentation issues, and each has its own methods of dealing with the problem. Fragmentation naturally occurs when available memory is divided into smaller pieces and portioned out to multiple OS subsystems and applications. The impact of this resource sharing is the waste that occurs when some memory becomes unusable. This waste has two classical definitions, termed external and internal fragmentation.

External Fragmentation. The NetWare 2 and 3 operating systems worked solely out of physical memory and didn't use memory mapping. This meant they had no way to reorganize noncontiguous blocks of physical memory into contiguous blocks of logical memory. To illustrate, consider the scenario shown in Figure 3.

Figure 3: External fragmentation in physical memory.

In Figure 3a, a contiguous block of memory is divided up and allocated by processes A, B and C. When process B frees its memory (Figure 3b), the memory becomes available for others to use. But when C requests a single memory block equal in size to the two blocks of memory released by B combined (Figure 3c), the request can't be satisfied because the two free blocks are noncontiguous. This is called external fragmentation because memory blocks external to the request create gaps that can't be collapsed and used as contiguous blocks of memory.
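External fragmentation is easy to demonstrate by modeling unmapped physical memory as a row of equal-sized units. In this sketch the layout is hypothetical, loosely following Figure 3: the total free space is enough for C's request, but no single contiguous run is.

```python
# Physical memory as 8 equal units; each entry names the owner, or None if free.
# Hypothetical layout after B frees its two units, which are separated by C's unit.
memory = ["A", None, "C", None, "A", "C", "C", "A"]


def largest_free_run(mem: list) -> int:
    """Length of the longest run of contiguous free units."""
    best = run = 0
    for unit in mem:
        run = run + 1 if unit is None else 0
        best = max(best, run)
    return best


request = 2  # C asks for one block the size of both freed units combined
print("free units:", memory.count(None), "largest contiguous:", largest_free_run(memory))
print("request satisfiable without mapping:", largest_free_run(memory) >= request)
```

With a memory map, the two free units could be mapped to adjacent logical pages, which is why the mapped architecture of NetWare 4 and IntranetWare avoids most of this problem.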

In NetWare 2 and 3, memory problems were often produced by external fragmentation. The server had plenty of memory but, over time, external fragmentation reduced the possibility of satisfying requests for large contiguous blocks of memory.

Internal Fragmentation. NetWare 4 and IntranetWare use a mapped memory architecture, which has removed many of the limitations of physical memory--including some related to external fragmentation. However, even a mapped memory architecture produces fragmentation as a byproduct of normal allocation processes, as illustrated in Figure 4.

Figure 4: Internal fragmentation in mapped memory.

In this example, when process A requests 5 KB, the OS uses two 4 KB blocks to satisfy the request. The 3 KB of leftover memory won't be used unless process A requests a memory block less than or equal to the size of the leftover memory block. This wasted memory is said to be caused by internal fragmentation because the wasted memory is internal to the memory allocated by the requesting process. However, when process A frees the 5 KB memory block, the entire 8 KB is returned to cache. Thus internal fragmentation is much more manageable than the irreversible external fragmentation described earlier.
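The arithmetic in this example generalizes to any request: the waste is the difference between the whole pages allocated and the amount requested. A short sketch, assuming 4 KB pages and sizes expressed in KB:

```python
import math

PAGE_KB = 4


def internal_fragmentation(request_kb: float) -> tuple[int, float]:
    """Return (pages allocated, KB wasted inside those pages) for one request."""
    pages = math.ceil(request_kb / PAGE_KB)
    return pages, pages * PAGE_KB - request_kb


print(internal_fragmentation(5))  # (2, 3): two 4 KB pages, 3 KB unused
```

Unlike the external case, the waste here is bounded by less than one page per allocation and disappears as soon as the block is freed.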

Memory Fragmentation in NetWare 3

Because NetWare 3 does not use any memory mapping, the OS, its subsystems, and all NLMs contribute to external memory fragmentation. In most NetWare 3 server configurations, this isn't a noticeable problem. But in some configurations that include large volumes, removable volumes, and NLMs that rely heavily on NetWare 3's Alloc subsystem, external fragmentation can pose a serious problem. In some cases, the server must be reinitialized to eliminate the effects of fragmentation.

If this is the case with your NetWare 3 servers, we suggest that you upgrade to IntranetWare as soon as possible because the causes and effects of external fragmentation have been greatly minimized in IntranetWare.

Memory Fragmentation in NetWare 4 and IntranetWare

Due to changes in the NetWare 4 and IntranetWare operating system (specifically, the mapped memory architecture), much of the fragmentation that occurred in NetWare 2 and 3 servers has been eliminated. The IntranetWare operating system code and data pools are all mapped into logical memory, which eliminates most external fragmentation scenarios. In an upcoming release of IntranetWare, all of the server's software components will use logical memory to avoid the effects of external memory fragmentation entirely.

In NetWare 4.0x, NetWare 4.1, and IntranetWare 1, there are two exceptions where logical memory is not used:

DMA-based NLMs that do not handle physical-to-logical memory addressing
The Moveable Memory subsystem that is confined to physical memory

DMA-based NLMs. Data for DMA-based NLMs is still placed in physical memory and must deal with the effects of external fragmentation. Most NLM data could now be loaded high, but some I/O-related drivers are slowing down the migration to logical memory. Older adapters, many of which were designed for the AT bus, have a 24-bit address path, which means they can't address anything above 16MB. That by itself doesn't keep IntranetWare from loading the adapter's shared memory high. The problem is that the driver isn't designed to handle physical-to-logical address translations during DMA operations.
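The 16 MB ceiling follows from the width of the address path: 24 address bits can express 2^24 distinct byte addresses. The check below is a hypothetical helper of the kind a driver might use to decide whether a physical buffer is reachable by such an adapter; it is not part of any NetWare API. As the paragraph notes, the real obstacle is that the driver must program the adapter with translated physical addresses, not that the limit exists.

```python
ISA_DMA_LIMIT = 1 << 24  # 2**24 bytes = 16 MB addressable over a 24-bit path


def buffer_reachable_by_24bit_dma(phys_addr: int, length: int) -> bool:
    """True if every byte of [phys_addr, phys_addr + length) lies below 16 MB."""
    return phys_addr >= 0 and length > 0 and phys_addr + length <= ISA_DMA_LIMIT


print(ISA_DMA_LIMIT // (1024 * 1024))                         # 16
print(buffer_reachable_by_24bit_dma(0x00F0_0000, 64 * 1024))  # True
print(buffer_reachable_by_24bit_dma(0x0100_0000, 4096))       # False
```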

In IntranetWare 3, these last vestiges of external memory fragmentation will be eliminated by switching all NLM data to logical memory.

Moveable Memory Subsystem. This subsystem allocates memory for a variety of OS tables, including file allocation tables (FATs), directory hash tables, and connection tables. Up to and including IntranetWare 1, the moveable memory subsystem was confined to the high end of physical memory. Residency in physical memory and the related effects of external fragmentation have been a frustration for users with large volumes, especially removable volumes. After mounting and unmounting such a volume several times, the resulting fragmentation of physical memory made mounting the volume virtually impossible.

To solve this problem, Novell ships HIMOVE.NLM with NetWare 4.1 and IntranetWare 1. It is located on the Master CD in the following directories:

NetWare 4.1: \NW410\BOOT\NWOS2\
IntranetWare 1: \PRODUCTS\NW411\__\411OS2\BOOTOS2

When HIMOVE is loaded, it allows the moveable memory subsystem to use logical memory and avoid external fragmentation.

For the present, HIMOVE shouldn't be used with device drivers that use DMA or with those that don't do proper physical-to-logical address translations, as described above. If you're not sure whether your disk driver falls into this category, you can use the following test. Using a blank hard disk, perform these steps at the server console:

1. Load HIMOVE.NLM.
2. Load the disk driver.
3. Create a NetWare partition and volume.
4. Mount the volume.
5. Dismount the volume.
6. Mount the volume again.

If the driver isn't properly handling physical-to-logical addressing, the volume's FAT will be corrupted during Step 5 and the volume won't mount in Step 6.

In subsequent versions of IntranetWare, HIMOVE's functionality will be integrated into the OS. The Moveable Memory subsystem will then use logical memory and no longer deal with the effects of external memory fragmentation.

Other Memory Limitations Not Caused by Fragmentation

Some memory limitations aren't caused by fragmentation at all. Here are three cases:

The most obvious non-fragmentation problem is simply not having enough RAM in the server. This problem can be resolved by estimating the server's memory requirements with the IntranetWare Memory Calculation Worksheet included with this AppNote, and then tuning the cache as described later in the AppNote.
Another problem was confined to NetWare 4.0x. Theoretically, NetWare is capable of using the full address space supplied by the CPU--4 GB in the case of the 386, 486, and Pentium CPUs. But NetWare 4.0x allocated memory for a cache control structure that was only large enough to handle 480 MB. Errors seen with more than 480 MB weren't caused by memory fragmentation, but by an insufficiently sized OS data structure. The NetWare 4.1 and IntranetWare cache control structures are capable of handling up to 3.12GB of memory in the server.
Another memory problem is created by certain EISA-based servers that don't automatically register memory above 16 MB. If you have more than 16 MB in one of these machines and run into memory errors, see How to Register Memory in NetWare 3.x and 4.x in the Novell Support Connection for step-by-step instructions. (The Novell Support Connection is available on CD-ROM or on the World Wide Web at http://support.novell.com.)

Garbage Collection

IntranetWare has a "garbage collector" that periodically collects memory no longer needed by NLMs and returns the memory to cache. The garbage collector usually runs in the background, but you can manually initiate the process if the need arises.

Background Operation. The garbage collection process is placed on IntranetWare's run queue as a lightweight (work-to-do) thread when either of the following conditions is met:

Condition 1. If the "Garbage Collection Interval" has arrived, IntranetWare checks the "Minimum Free Memory For Garbage Collection" setting before queuing the garbage collector. The interval involved in this condition acts as a regular reminder to check for sufficient memory to make garbage collection necessary.
Condition 2. If the number of free operations is greater than the "Number of Frees For Garbage Collection," IntranetWare checks the "Minimum Free Memory For Garbage Collection" setting before queuing the garbage collector. A counter that tracks free operations acts as a trigger in this condition when significant memory operations may necessitate garbage collection before the Garbage Collection Interval has arrived.

Under either of these conditions, the garbage collector is queued to run as a background process without any interface to the user (see Figure 5).

Figure 5: Background garbage collection.

The syntax for each of the IntranetWare SET parameters mentioned above is as follows:

SET Parameter (with Default Setting)	Range
SET Garbage Collection Interval=15	1 to 60 (minutes)
SET Minimum Free Memory For Garbage Collection=8000	1,000 to 1,000,000 (bytes)
SET Number of Frees For Garbage Collection=5000	100 to 100,000 (operations)
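The two trigger conditions can be expressed as a small decision function. This is a sketch under assumptions: we read "checks the Minimum Free Memory For Garbage Collection setting" as "queue the collector only if at least that many bytes are waiting to be reclaimed," and the names `GCSettings` and `should_queue_gc` are hypothetical. The defaults come from the SET parameter table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GCSettings:
    interval_minutes: int = 15   # SET Garbage Collection Interval
    min_free_bytes: int = 8000   # SET Minimum Free Memory For Garbage Collection
    frees_threshold: int = 5000  # SET Number of Frees For Garbage Collection


def should_queue_gc(minutes_since_last: float, frees_since_last: int,
                    reclaimable_bytes: int, s: GCSettings = GCSettings()) -> bool:
    """Decide whether to put the garbage collector on the run queue."""
    interval_due = minutes_since_last >= s.interval_minutes  # Condition 1
    frees_due = frees_since_last > s.frees_threshold         # Condition 2
    # Either trigger leads to a run only if enough memory is waiting to be reclaimed.
    return (interval_due or frees_due) and reclaimable_bytes >= s.min_free_bytes


print(should_queue_gc(15, 10, 20_000))   # interval reached, enough to reclaim
print(should_queue_gc(3, 6000, 20_000))  # free-count trigger fires early
print(should_queue_gc(20, 0, 1_000))     # too little memory to bother
```

The frozen dataclass makes the shared default settings object safe to reuse across calls.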

Immediate Operation. The garbage collector can also be manually spawned using MONITOR.NLM. To do this in IntranetWare, load MONITOR and select the Memory Utilization option from the main menu. At the bottom of the screen is an "F3=Free system memory" option to perform the garbage collection process on the entire system (see Figure 6).

Figure 6: MONITOR.NLM's system-wide garbage collection option screen.

When you select an NLM on this screen, an "F3=Free module memory" option appears that allows you to perform a garbage collection process on the selected module (see Figure 7).

Figure 7: MONITOR.NLM's module garbage collection option screen.

Note: In NetWare 4.1, both garbage collection options (F3 for free module memory and F5 for free system memory) appear after you have selected a system module.

The garbage collector runs synchronously in the foreground when you initiate the process via MONITOR. The garbage collection process runs immediately and the results of the memory reallocation, if any, are reported as updates to the free memory displayed on the same MONITOR screen.

This immediate garbage collection option may be of most interest to developers who want to monitor the amount of memory freed from their NLM at any given time. However, a server administrator may also use this option to measure the benefits of running the garbage collector more often. For instance, if the SET Garbage Collection Interval parameter is set to its default of 15 minutes, the administrator could manually initiate garbage collection every 5 minutes and then gauge the impact by watching MONITOR's Memory Utilization statistics.

Garbage Collection Performance Considerations. Over time, as the average size of server memory has increased, the garbage collection algorithms used by the NetWare operating system have required some fine tuning. IntranetWare uses a new series of algorithms that scale better in large memory configurations. As a result, the garbage collection process is faster and more efficient. For example, in an IntranetWare server with 16MB of memory, garbage collection was tested at 20 times the speed of prior systems. In a server with 148MB of memory, test results showed performance gains of up to two orders of magnitude over previous versions of the algorithms.

Memory Architecture Summary

The IntranetWare memory architecture has matured into a collection of high-performance memory subsystems that are both scalable and tailored to the unique requirements of resident NLMs. This architecture and its benefits are largely due to IntranetWare being a special-purpose OS designed to support special-purpose network applications and services.

Mythical Memory Calculation Rules of Thumb

Over the years we've watched several inaccurate memory calculation methods come and go. Somehow, two erroneous "rules of thumb" have persisted through the ongoing evolution of server technology. We expose these two "mythical" rules here in an effort to eliminate them from the network consulting craft, particularly for IntranetWare servers.

Myth: MONITOR's Cache Buffer Percentage Should Be At Least 50%

This mythical rule suggests that the Cache Buffers percentage (found in MONITOR by selecting the Resource Utilization option and looking in the Server Memory Statistics window) should be at least 50% in order for the server to have enough cache to operate successfully.

Although this recommendation may be correct for some servers, it is not useful in determining whether an IntranetWare server is properly configured. For every server that this recommendation fits, we can point to a server for which 50% cache would be too high or too low. This is because servers with few users function well with less cache, whereas servers with large numbers of users require a disproportionately greater amount of cache to provide high performance services that rely on cache. In the real world, one server at 80% and another at 20% may both have an appropriate amount of cache.

This mythical rule of thumb should be replaced by the cache tuning strategies based on the LRU Sitting Time statistic, as described later in this AppNote.

Myth: Allocate One MB of RAM per 16MB of Storage

In the days of NetWare 2 and 3, when servers were limited to 16MB of RAM, memory calculations published by Novell combined the disk and cache requirements for the server into one calculation. When the 16MB limitation was removed and the densities on magnetic media began to increase, Novell's server memory calculations fell behind. It's not surprising that this 1MB-per-16MB-of-disk rule of thumb saw its beginnings during this period of time.

This mythical rule suffers from the same weakness as the Cache Buffers percentage myth. It may work for some servers, but it is woefully inadequate for most IntranetWare servers.

This rule should be replaced by the calculations outlined in the IntranetWare Memory Worksheet, in which you calculate the exact memory requirements for the server's disk subsystem and then estimate the amount of cache separately. This method has proven to be much more accurate for the wide variety of server configurations currently being implemented.

Estimating IntranetWare Memory Requirements

The first step in determining the correct amount of memory for a new or existing IntranetWare server is to estimate its memory requirements. This estimation gives you a fairly accurate ballpark figure to work from. Later, after the server is placed in a production environment and has adjusted itself to the production workload, you can tune the server to the exact requirements the actual workload is placing on the server, as explained in this AppNote.

Note: In estimating memory requirements for NetWare 4.x and IntranetWare servers, avoid relying on past calculation methods and "rules of thumb" that may give inaccurate results. (See the sidebar "Mythical Memory Calculation Rules of Thumb" for more information.)

The IntranetWare Memory Requirements Worksheet

The worksheet included with this AppNote requires you to do some homework before getting started with the calculations. Specifically, you'll need to gather the following information about your server:

Total disk capacity
Total useable disk capacity
Volume block size
Total number of clients
Estimated total number of files

Total Disk Capacity. This first variable (V1) is the total number of megabytes attached to your server (use 1024 MB for each gigabyte). This number makes allowances for the Media Manager subsystem that manages the server's physical storage devices.

Total Useable Disk Capacity. If your disk storage subsystem will be duplexed or mirrored, this second variable (V2) is half of the total disk capacity above. Otherwise, the two numbers are equal. You'll use this number to calculate the exact amount of memory required by IntranetWare's low-level file system, including file allocation tables (FATs) and directory hash tables.

Volume Block Size. This variable (V3) is the block size used during the installation of your NetWare volumes. The accuracy of this variable is important because volumes with 4KB blocks require 16 times the amount of memory required by volumes with 64KB blocks. It will also be used in calculating the number of disk blocks per megabyte (V4) and the total number of disk blocks (V5).

Total Number of Clients. This variable (V6) is the total number of end-users or connections that will be simultaneously using the server. This number determines the amount of file system cache the server will need to cache end-user requests for repeated use.

Estimated Total Number of Files. This variable (V7) is your estimate of the total number of files that will reside on the server. A ballpark estimate will suffice because the directory tables only require 6 bytes of memory per file. If you're using block suballocation, this requirement increases to 11 bytes per file.

Determining the exact number of files a server will ultimately store is an impossible task. But there is a way to come up with a fairly accurate estimate based on your current store of data and your average file size. You can calculate your average file size if you know the total number of bytes stored on your server and the total number of files. (Many server backup tools conclude their tasks by reporting the total number of files backed up, as well as the total number of bytes.) From this information, you can derive your average file size and then the total maximum number of files by performing this calculation:

a. Enter total current bytes stored on your server
b. Enter total current number of files on your server
c. Calculate average file size (a ÷ b)
d. Enter Total Useable Disk Capacity in MB (V2 on the worksheet)
e. Convert d to bytes (d × 1,048,576)
f. Calculate total maximum number of files (e ÷ c). Enter this number for V7 on the worksheet.

By using this method to estimate the maximum number of files for your server, you can be more confident that your final memory calculation really will be representative of a server in full production.
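Steps a through f can be scripted directly. The function name is our own, and the figures in the usage line are made-up sample values, not measurements; substitute the totals reported by your own backup software.

```python
def estimate_max_files(total_bytes: int, total_files: int, useable_disk_mb: float) -> int:
    """Estimate V7, the maximum number of files the server will hold (steps a-f)."""
    avg_file_size = total_bytes / total_files    # step c: a ÷ b
    useable_bytes = useable_disk_mb * 1_048_576  # step e: d × 1,048,576
    return round(useable_bytes / avg_file_size)  # step f: e ÷ c


# Hypothetical sample: 1.5 GB currently stored in 30,000 files, 4 GB of useable disk.
print(estimate_max_files(1_610_612_736, 30_000, 4096))
```

The estimate simply assumes that the average file size stays constant as the volume fills, which is the same assumption the manual calculation makes.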

Entering Information on the Worksheet

Once you have gathered all of the above information, use it to calculate the server variables as indicated on the worksheet. Then run through the worksheet's simple ten-line calculation to arrive at your server's total memory requirement.

IntranetWare Server Memory Calculation Worksheet(From Novell AppNotes, March 1997 - photocopy as needed)
STEP 1: Calculate the following variables.
V1. Enter the total number of megabytes of disk connected to the server(Enter 1 for each MB; enter 1024 for each GB)	MB
V2. Calculate the number of megabytes of useable disk space connected to the server(If you are mirroring or duplexing, multiply V1 - 0.5; otherwise copy V1)	MB
V3. Enter the server's volume block size (4, 8, 16, 32, or 64)	KB
V4. Calculate the number of disk blocks per MB (divide 1024 . V3)	Blocks/MB
V5. Calculate the total number of disk blocks (multiply V2 - V4)	Blocks
V6. Enter the maximum number of clients (end-users) attached to the server(For example: enter 24 for 24 end-users)	Clients
V7. Enter the maximum number of files that will reside on the server	Files
STEP 2: Calculate individual memory requirements.
Line 1. Enter the base memory requirement for the core OS and NDS(Enter 6144 for IntranetWare; 11,264 for SFT; 12,288 for SMP)	KB
Line 2. Calculate the memory requirement for the Media Manager (multiply V1 - 0.1)	KB
Line 3. Calculate the memory requirement for directory tables(multiply V7 - .006, or if suballocation is enabled multiply V7 - .011)	KB
Line 4. Calculate the memory requirement for additional Name SpacesMultiply (V7 - .006) - number of additional Name Spaces loaded on the server	KB
Line 5. Calculate the memory required to cache the FAT (multiply V5 - .008)	KB
Line 6. Calculate the memory requirement for file cache using the following instructions.This calculation uses a 0.4MB file cache per client memory requirement. The decrease as the user community size increases is based on assumptions regarding increased repetitive use of shared data (temporal and spacial locality) within cache.Less than 100 clients; V6 - 400 Between 100 and 250 clients 40,000 + ((V6 100) - 200) Between 250 and 500 clients 70,000 + ((V6 250) - 100) Between 500 and 1000 clients 95,000 + ((V6 500) - 50)	KB
Line 7. Enter the total memory (KB) required for support NLMs.Recommended amount is 2,000KB (700 for BTRIEVE, 500 for CLIB, 600 for INSTALL, and 200 for PSERVER)	KB
Line 8. Enter the total memory (KB) required for other services.Other services include GroupWise, ManageWise, NetWare for Macintosh, NetWare for SAA, databases, and so on.	KB
STEP 3: Calculate the server's total memory requirement.
Line 9: Add lines 1 through 8 for your total memory requirement (in KB)	KB
Line 10: Divide Line 9 by 1024 for a result in MB. Round this result up to the nearest memory configuration the server supports. IntranetWare will enhance server performance by using all leftover memory for additional file cache.	MB
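The worksheet arithmetic can be sketched in Python as follows. This is a minimal, unofficial sketch: it assumes V1 is the total number of megabytes of disk connected to the server (as entered at the start of the worksheet), and the list of memory configurations passed to the hypothetical round_up_to_config helper is illustrative only, not a list of real server options.

```python
# Worksheet Line 1: base KB for the core OS and NDS, by server type.
BASE_OS_KB = {"intranetware": 6144, "sft": 11264, "smp": 12288}


def file_cache_kb(clients: int) -> int:
    """Line 6: file cache that tapers from 400 KB per client down to 50 KB."""
    if clients < 0:
        raise ValueError("client count cannot be negative")
    if clients < 100:
        return clients * 400
    if clients <= 250:
        return 40_000 + (clients - 100) * 200
    if clients <= 500:
        return 70_000 + (clients - 250) * 100
    if clients <= 1000:
        return 95_000 + (clients - 500) * 50
    raise ValueError("the worksheet only covers up to 1000 clients")


def estimate_server_memory_kb(disk_mb, mirrored, block_kb, clients, files,
                              server_type="intranetware", suballocation=False,
                              extra_name_spaces=0, support_nlm_kb=2000,
                              other_services_kb=0):
    """Return Line 9 (total KB); disk_mb is V1, the raw disk capacity."""
    usable_mb = disk_mb * 0.5 if mirrored else disk_mb        # V2
    blocks_per_mb = 1024 / block_kb                           # V4
    total_blocks = usable_mb * blocks_per_mb                  # V5
    lines = [
        BASE_OS_KB[server_type],                              # Line 1
        disk_mb * 0.1,                                        # Line 2
        files * (0.011 if suballocation else 0.006),          # Line 3
        files * 0.006 * extra_name_spaces,                    # Line 4
        total_blocks * 0.008,                                 # Line 5
        file_cache_kb(clients),                               # Line 6
        support_nlm_kb,                                       # Line 7
        other_services_kb,                                    # Line 8
    ]
    return sum(lines)                                         # Line 9


def round_up_to_config(total_kb, configs_mb):
    """Line 10: convert KB to MB, then round up to the next offered size."""
    needed_mb = total_kb / 1024
    for size_mb in sorted(configs_mb):
        if size_mb >= needed_mb:
            return size_mb
    raise ValueError(f"{needed_mb:.1f} MB exceeds every configuration offered")


# Hypothetical server: 4,000 MB unmirrored, 64 KB blocks, 150 clients.
total_kb = estimate_server_memory_kb(disk_mb=4000, mirrored=False, block_kb=64,
                                     clients=150, files=100_000)
print(round(total_kb), round_up_to_config(total_kb, [32, 48, 64, 96, 128]))
```

With these hypothetical inputs (4,000 MB of unmirrored disk, 64 KB blocks, 150 clients, 100,000 files), the sketch prints 59656 64: about 58.3 MB, rounded up to a 64 MB configuration.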

Tuning the Memory Estimate for Your Production Environment

The worksheet above separates what was once a single calculation into two: one for the allocation of file system data structures and another for file cache. We did this because the two calculations are completely unrelated: one is quantitative and the other qualitative.

The calculation for file system data structures is a known quantity based on volume size, volume block size, and number of files stored on the volume. This file system overhead is used to cache the File Allocation Tables, Directory Entry Tables, and Media Manager tables that describe the allocation of data on the volume. The memory calculation for these data structures is straightforward because the variables are either known or can be estimated with some accuracy. However, the calculation for file cache falls squarely in the realm of qualitative analysis, making it much more difficult to estimate. This is due to the number of variables that can impact the way cache is used:

The type and characteristics of the applications used to access server resources
The characteristics of read and write requests including random or sequential patterns, the size of the requests, and the frequency of requests
The total size of the data set the server is providing access to
The number of clients

To simplify matters for the worksheet, we developed a composite client based on observations of cache usage patterns of more than 2000 clients spread over ten different IntranetWare servers. The numbers in the worksheet were based on two fundamental observations:

In tuned servers where there were fewer than 50 clients and very little common data was shared, the average amount of cache per client was 0.4MB.
In tuned servers where there were more than 150 clients and there was a high degree of shared data access, the average amount of cache per client diminished as the number of clients or the degree of sharing increased.

The worksheet calculation for Line 6 is derived from these two observations. It is weighted more toward the number of clients than the degree of shared data access. Knowing this, you can adjust the Line 6 calculation up or down according to the following guidelines:

If the server will act as a store for hundreds of users who will use productivity applications to access private data in private directories, use the 0.4 MB, 0.2 MB, 0.1 MB, and 0.05 MB values in the calculation. You could even consider rounding them up to suit your needs.
If the target server's data set is completely shared and you expect a high degree of cache re-use by your user community, you may be able to round down the values on Line 6 to meet your needs.
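One way to apply these adjustments is to treat each band's per-client rate as a parameter. The sketch below, under that assumption, accumulates KB band by band; with the default rates it reproduces the Line 6 formulas, and the private_tiers rates are a hypothetical example of rounding the values up, not a Novell recommendation.

```python
# Worksheet Line 6 rates: (upper client bound of the band, KB per client).
WORKSHEET_TIERS = [(100, 400), (250, 200), (500, 100), (1000, 50)]


def tiered_file_cache_kb(clients, tiers=WORKSHEET_TIERS):
    """Accumulate file cache KB band by band so each rate can be tuned."""
    if clients < 0 or clients > tiers[-1][0]:
        raise ValueError("client count is outside the tiers provided")
    total, lower = 0, 0
    for upper, kb_per_client in tiers:
        if clients <= lower:
            break
        # Clients falling inside this band are charged this band's rate.
        total += (min(clients, upper) - lower) * kb_per_client
        lower = upper
    return total


# Hypothetical "round the values up" case: every client keeps 400 KB.
private_tiers = [(100, 400), (250, 400), (500, 400), (1000, 400)]
print(tiered_file_cache_kb(300))                 # worksheet rates
print(tiered_file_cache_kb(300, private_tiers))  # mostly private data
```

For 300 clients this gives 75,000 KB with the worksheet rates (the same as 70,000 + (50 × 100)) and 120,000 KB if every client keeps the full 0.4 MB.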

After you estimate your server's memory requirements, you need to round the estimate up to the server's next largest memory configuration. But remember, this is just an initial estimate. You won't know exactly how much memory your server will require until you finish the optimization process by tuning your server's cache. Your server may require a little less or a little more memory, depending on the characteristics of the server's workload.

Tip: If your organization's purchasing process doesn't allow you the flexibility to buy and install the server before finishing the tuning process, you may want to add 5 to 10% more memory to your estimate, just in case your original estimate is low.

Strategies for Tuning File System Cache

Once you've installed the server you're almost ready to tune the cache subsystem. This tuning process is critical if you want your IntranetWare server to operate at peak performance levels. The IntranetWare Memory Worksheet helps you estimate the amount of total memory required for your server's unique configuration, but that estimate is not enough to guarantee peak performance. The tuning process described in this section allows you to know for certain whether your server's memory resources are configured properly.

Before tuning the cache, you need to install the server in its production environment and allow your users to come up to speed on the server. This process can take days or weeks as network users incorporate the server's applications and services into their daily work patterns. During this time, the server goes through its own settling-in process, in which its auto-tuning processes adjust the memory allocated to their internal data structures.

Once the settling-in process has had time to complete, the server will follow a fairly predictable daily utilization pattern. You are now ready to take the last step in the memory optimization process: tuning the cache subsystem. The tuning process focuses on the read path used by each user's read requests, since this is the critical path that determines the user's response times. Two separate caches speed up the read path: the directory cache, which stores recently used disk directory entries, and the file cache, which stores recently used file data.

Tuning Directory Cache

As a user reads and operates on directory entries, IntranetWare caches them to make repeated use of an entry more efficient. In a default configuration, IntranetWare allocates twenty 4 KB directory cache buffers, each holding 32 directory entries. This default is appropriate only for a small number of users. To accommodate larger user communities, IntranetWare has an auto-tuning process that allocates more directory cache buffers based on several heuristics and two SET parameters, minimum directory cache buffers and maximum directory cache buffers. The tuning strategies in this section will help you tune the directory cache and the auto-tuning process for your network's unique requirements.

Additional Name Spaces will also necessitate tuning for directory cache. If you're installing additional Name Spaces on one or more of the server's volumes, follow the strategies in this section to size your directory cache appropriately.

Directory Cache Theory of Operations. Each Name Space installed on an IntranetWare server's volumes, besides the default DOS Name Space, requires support modules (NLMs) and a modification to the server's directory entry tables (DETs). The support modules require minimal memory for code. However, the modifications to each volume's DET will require additional server memory for directory caching if the server's directory usage patterns are heavy and match several criteria.

With or without Name Spaces, all file and directory operations are handled through a single directory cache allocated and managed by IntranetWare. The purpose of the directory cache is to hold onto DET blocks recently read from the disk in anticipation of repeated use.

In NetWare 2, the entire DET was cached, but as disk capacities grew, fully cached directories became unrealistic for most servers. For example, a DOS file system on a server containing 500,000 files requires 65MB just to cache the DET. To avoid caching the entire DET, NetWare 3, NetWare 4, and IntranetWare use a most-recently-used (MRU) cache policy to manage their directory caches. The MRU policy keeps only the most recently used DET blocks in cache, discarding least-recently-used (LRU) blocks when new DET blocks are requested. This policy makes efficient use of a much smaller cache to provide access to a very large data structure.

DET Ratios. When a Name Space besides the default DOS Name Space is installed on a volume, the volume's DET is extended to include an additional directory entry for each file. For instance, on a volume supporting DOS, NFS, and LONG (HPFS) Name Spaces, IntranetWare manages three directory entries for each file: one entry for each installed Name Space, including DOS.

During file creation and other directory-related file operations, a file's directory entries for the different Name Spaces are kept contiguous, in the same DET block on disk and in cache. This avoids the undesirable case in which the entries are scattered, which would force IntranetWare to read multiple DET blocks to reach all the entries for the same file.

Tip: When a Name Space is added to an existing volume, the DET is not reorganized with each new Name Space directory entry placed immediately adjacent to each DOS entry. It is wise to install additional Name Spaces before the volume is used.

Under IntranetWare's native DOS support, each block read into cache contains 32 entries that provide information linked to 32 files. This means that a user has access to the directory information for 32 files without having to read another DET block from the disk. This ratio of files represented per DET block is important because additional Name Spaces alter it significantly.

The Effect of Name Spaces on Directory Cache Buffer Efficiency. The issue of directory cache tuning enters the picture when you begin to cache the DET after adding one or more Name Spaces to one or more volumes on the server. When IntranetWare clients access a volume with additional Name Spaces loaded, their access can be slowed because information stored in one directory cache buffer no longer represents 32 files. It represents 16, 10, or 8 files, depending on the number of Name Spaces configured on that volume.

For example, if you load the Macintosh (MAC) Name Space on top of the native DOS Name Space, DOS and Macintosh clients have to traverse ten directory blocks to perform the same work that before only required the traversal of five. The addition of the MAC Name Space doesn't change the directory entry block's ability to hold 32 entries, but now with two entries per file, the same directory entry block only represents 16 files. If you add another Name Space, the result is three entries per file for a total of 10 files represented per DET block. Add an additional Name Space for four entries per file and you have only eight files represented in each DET block.
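The effect of Name Spaces on DET block efficiency is simple arithmetic. This sketch assumes, as described above, that all of a file's entries must sit together in the same 32-entry block, so any leftover slots in a block are not used for another file; the function names are illustrative.

```python
import math

ENTRIES_PER_DET_BLOCK = 32  # one directory cache buffer holds 32 entries


def files_per_block(name_spaces: int) -> int:
    """Files represented per DET block; name_spaces includes DOS."""
    if name_spaces < 1:
        raise ValueError("DOS is always present, so name_spaces >= 1")
    # A file's entries stay together in one block; leftover slots go unused.
    return ENTRIES_PER_DET_BLOCK // name_spaces


def blocks_to_scan(files: int, name_spaces: int) -> int:
    """DET blocks needed to cover the directory entries of `files` files."""
    return math.ceil(files / files_per_block(name_spaces))


for ns in range(1, 5):
    print(ns, files_per_block(ns), blocks_to_scan(160, ns))
```

For 160 files, it reports 5, 10, 16, and 20 blocks for one through four Name Spaces, matching the five-to-ten example above.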

The bottom line is that the efficiency of your directory cache decreases by a factor roughly equal to the total number of Name Spaces loaded on the volume, including DOS. By using the tuning strategies outlined below, you can maintain the high performance of IntranetWare's directory cache.

Three Directory Cache Tuning Strategies

As with the tuning of any IntranetWare parameter, sizing the directory cache depends largely on the characteristics of the workload the server will be servicing. In this case, we use the directory access patterns exhibited by the server's user community. The key is the frequency and breadth of directory searches, file opens, closes, and creations:

A low-use scenario could involve any number of users, but only a small number of directories are shared, or each user's activity is limited to a small region of the directory, such as a home directory.
A high-use scenario could also involve any number of users, but user activity spans a very large number of directories and files. An extreme case might be a document-based system in which document searches routinely traverse large portions of a very large directory.

Strategy 1: Handling Low Usage. At the low end, you won't need to allocate any more cache than IntranetWare's directory caching defaults permit. These defaults let IntranetWare allocate 20 buffers immediately upon request, followed by a maximum allocation of up to 512 directory cache buffers (2 MB). This allocation is sufficient for most low-use scenarios.

Strategy 2: Handling Very High Usage. For the high end, you can adjust IntranetWare's auto-tuning facility so that it can allocate roughly 8 MB of directory cache (2,000 buffers) immediately upon request, followed by a maximum allocation of roughly 16 MB of total directory cache memory. These settings allow IntranetWare to cache up to 4,000 directory cache buffers.

To do this, place the SET parameters shown below in your server's autoexec.ncf file:

SET maximum directory cache buffers = 4000
SET minimum directory cache buffers = 2000
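Because each directory cache buffer is a 4 KB page holding 32 DOS directory entries, a buffer count converts directly into memory and cached entries. The sketch below is illustrative only (the labels are not MONITOR output); it shows why the Strategy 2 values correspond to roughly 8 MB and 16 MB rather than exact powers of two.

```python
BUFFER_KB = 4            # each directory cache buffer is a 4 KB page
ENTRIES_PER_BUFFER = 32  # DOS directory entries per buffer


def dir_cache_footprint(buffers: int):
    """Return (memory in KB, directory entries cached) for a buffer count."""
    return buffers * BUFFER_KB, buffers * ENTRIES_PER_BUFFER


for label, buffers in [("default minimum", 20), ("default maximum", 512),
                       ("Strategy 2 minimum", 2000), ("Strategy 2 maximum", 4000)]:
    kb, entries = dir_cache_footprint(buffers)
    print(f"{label:>20}: {buffers:5} buffers = {kb:6} KB "
          f"({kb / 1024:.1f} MB), {entries} entries")
```

The same arithmetic gives 3,200 entries and 400 KB for every 100 additional buffers.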

Strategy 3: Tuning the Cache. If neither strategy 1 nor 2 matches your circumstances, use this strategy to tune your directory cache.

First, allow your server to operate in its production environment for several weeks. This allows IntranetWare's auto-tuning facility to allocate the appropriate number of directory cache buffers based on its native DOS-based heuristics. After this settling-in period, use MONITOR.NLM to inspect the number of allocated directory cache buffers. This number is the Directory Cache Buffer Watermark for your server.

Follow the chart below to multiply your server's watermark by the total number of Name Spaces (including native DOS support) to arrive at a new buffer allocation.

Native DOS Support	Do nothing
Native DOS Support + 1 additional Name Space	Multiply the Directory Cache Buffer Watermark by 2
Native DOS Support + 2 additional Name Spaces	Multiply the Directory Cache Buffer Watermark by 3
Native DOS Support + 3 additional Name Spaces	Multiply the Directory Cache Buffer Watermark by 4
Native DOS Support + 4 additional Name Spaces	Multiply the Directory Cache Buffer Watermark by 5

Use the resulting buffer allocation as the server's new minimum directory cache buffers setting. Then set the maximum at least 100 buffers above the minimum, to give the directory cache some room to grow under peak workloads. Make these changes before installing the additional Name Spaces. For example, suppose your Directory Cache Buffer Watermark is 250 and you have only the default DOS Name Space loaded. Before you load one additional Name Space, you would change the IntranetWare SET parameters as follows:

SET minimum directory cache buffers = 500
SET maximum directory cache buffers = 600
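The Strategy 3 chart reduces to multiplying the watermark by the total number of Name Spaces, including DOS, and adding headroom for the maximum. In this minimal sketch the function name and the 100-buffer default headroom parameter are illustrative; the printed lines reproduce the example above.

```python
def name_space_dir_cache_settings(watermark, name_spaces, headroom=100):
    """Return (minimum, maximum) buffers; name_spaces counts DOS as well."""
    if name_spaces < 1:
        raise ValueError("name_spaces includes DOS, so it must be at least 1")
    if name_spaces == 1:
        return None  # native DOS support only: the chart says "do nothing"
    minimum = watermark * name_spaces
    return minimum, minimum + headroom  # maximum: at least 100 above minimum


# Watermark of 250 with DOS plus one additional Name Space.
settings = name_space_dir_cache_settings(250, name_spaces=2)
if settings is not None:
    minimum, maximum = settings
    print(f"SET minimum directory cache buffers = {minimum}")
    print(f"SET maximum directory cache buffers = {maximum}")
```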

These new settings will allow IntranetWare to freely allocate new directory cache buffers in a multiple Name Space environment. They also increase the likelihood that (1) repeatedly used directory cache buffers will remain in cache, and (2) those buffers will remain in cache longer. The resulting directory cache is designed to support systems that house one or more Name Spaces with the best possible read-path response times.

Checking Your Directory Cache Settings. After you have made these changes and allowed for a settling-in period, check to see whether the server performs the anticipated allocation. If it does not, you know that your user community's directory access patterns don't require the additional cache. On the other hand, if your server uses all the cache you made available, your user community's directory access patterns may be more significant than you anticipated.

Based on your knowledge of the end users' application and response time requirements, the server's healthy allocation of directory cache may suggest the addition of more directory cache resources. The price is low: each additional 100 cache buffers adds 3,200 directory entries to cache but takes up only 0.4 MB of server memory. Just keep in mind that any memory given to the directory cache is taken from the server's file cache pool. If you keep moving memory from file cache to directory cache as you size the directory cache, you may ultimately need to add memory to your server.

Tuning File Cache with the LRU Sitting Time Statistic

IntranetWare's file cache subsystem is a critical area to tune when you're considering ways to improve server performance. File cache not only speeds access to file data, it is used to cache portions of the NDS database. If you're interested in tuning your IntranetWare server in general, or NDS specifically, the file cache is a great place to start. This section describes how the file cache works and outlines a step-by-step process you can use to tune this cache and measure your success.

File Cache Theory of Operations. NetWare's file caching subsystem is a pool or collection of 4 KB memory pages. After loading the OS, system NLMs, and application NLMs, IntranetWare initializes all remaining memory to form the file cache pool.

File cache memory is organized using a data structure called a linked list. At the beginning of the list is the "list head," where new cache buffers are inserted into the list. The end of the list is the "list tail," where old cache buffers are removed from the list. Each cache buffer in the list is linked to the next cache buffer, and each one includes a time stamp indicating the time the cache buffer was inserted into the list head (see Figure 8).

When the server receives a disk I/O request for data that is not currently in cache (a cache "miss"), the data is read from the disk and written into one or more cache buffers that are removed from the list tail. Each newly filled cache buffer is time-stamped with the current time and linked into the list head. A newly filled cache buffer is designated as the most-recently-used (MRU) cache buffer because it has resided in cache for the least amount of time.

Figure 8: File cache linked list.

A cache "hit" (a frequent event in IntranetWare environments) occurs when a disk request received by the server can be serviced directly out of cache rather than from disk. In this case, after the request is serviced, the cache buffer containing the requested data is removed from the list, time-stamped with the current time, and relinked into the list head. In this manner, MRU cache buffers congregate at the head of the list. This characteristic of the list is important to understand, because you want your MRU cache buffers to remain cached in anticipation of repeated use and repeated cache hits.

At some point in this process, the file cache pool becomes full of recently used data. This is where the least-recently-used (LRU) cache buffer comes into play. LRU cache buffers are buffers that were originally filled from the disk, but haven't been reused as frequently as the MRU cache buffers at the list head. Due to the relinking of MRU cache buffers into the list head, LRU cache buffers congregate at the list tail. When new cache buffers are needed for data requested from disk, IntranetWare removes the necessary number of LRU cache buffers from the list tail, fills them with newly requested data, time-stamps them with the current time, and relinks them into the list head. The resulting IntranetWare file cache subsystem gives preference to repeatedly used data and holds onto less frequently used data only as long as the memory isn't needed for repeatedly used data.

When tuning file cache, then, the ideal scenario is one in which every repeated use of recently accessed data can be serviced out of cache. This is accomplished by sizing server memory so that the resulting file cache pool is large enough to retain all repeatedly used data. But how can you measure your success? A statistic known as the LRU Sitting Time holds the answer to this facet of IntranetWare file cache efficiency.

LRU Sitting Time. The LRU Sitting Time statistic is updated and displayed once per second in MONITOR.NLM under the Cache Statistics menu (see Figure 9). This statistic is calculated by taking the difference between the current time and the time stamp of the LRU cache block at the tail of the cache list. The result is displayed in HH:MM:SS.0 format (beginning with hours and ending with tenths of a second).

Figure 9: LRU Sitting Time statistic displayed in MONITOR's Cache Statistics screen.

The LRU Sitting Time measures the length of time it is taking for an MRU cache buffer at the list head to make its way down to the list tail, where it becomes the LRU cache buffer. One might refer to this measurement as the cache "churn rate" because, whether from cache hits or misses, every cache buffer in the list is being reused within that period of time.
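The list behavior and the LRU Sitting Time statistic can be modeled with an ordered dictionary whose order runs from list tail to list head. This is a toy sketch for building intuition, not NetWare's implementation: the class name is hypothetical, times are plain seconds supplied by the caller, and a fixed buffer count stands in for the file cache pool.

```python
from collections import OrderedDict


class FileCacheList:
    """Toy model of the time-stamped file cache list (not NetWare code)."""

    def __init__(self, buffers: int):
        self.capacity = buffers
        # Maps disk block -> time stamp; order runs from list tail to list head.
        self.buffers = OrderedDict()

    def read(self, block: int, now: float) -> bool:
        """Service a read request at time `now`; return True on a cache hit."""
        hit = block in self.buffers
        if hit:
            self.buffers.move_to_end(block)    # unlink and relink at the head
        elif len(self.buffers) >= self.capacity:
            self.buffers.popitem(last=False)   # reuse the LRU buffer at the tail
        self.buffers[block] = now              # time-stamp; new buffers join the head
        return hit

    def lru_sitting_time(self, now: float) -> float:
        """Current time minus the time stamp of the buffer at the list tail."""
        if not self.buffers:
            return 0.0
        tail_stamp = next(iter(self.buffers.values()))
        return now - tail_stamp


cache = FileCacheList(buffers=3)
for t, block in enumerate([1, 2, 3, 1, 4]):   # block 1 is reused at t=3
    cache.read(block, now=float(t))
print(list(cache.buffers))               # tail ... head
print(cache.lru_sitting_time(now=5.0))   # during activity
print(cache.lru_sitting_time(now=65.0))  # idle: grows with the clock
```

In this run, the buffer holding block 2 is reused for block 4, the list reads [3, 1, 4] from tail to head, and the sitting time is 3.0 seconds at time 5. With no further requests it reads 63.0 at time 65, which illustrates why the statistic simply climbs one second per second on an idle server.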

In configurations with an excessive cache, the LRU Sitting Time can be very high, even many hours. At the other extreme, with insufficient cache, the LRU Sitting Time can be down in the 10 to 20 second range. The time will vary widely depending on your circumstances.

On inactive servers, including those that sit unused overnight and those in lab environments or other settings with long periods of idle time, the LRU Sitting Time statistic exhibits a curious behavior: it increments by one second every second. This is normal for an unused server because the LRU Sitting Time indicates the age of the LRU cache buffer. In an inactive server, where the LRU cache buffer remains unused, its age increases by one second as each second of time passes. The LRU Sitting Time statistic is useless under these circumstances, except to confirm the obvious: new data is not being written to the server's cache. This statistic is most useful during peak workloads, when the server's cache is servicing the greatest number of users and the broadest possible data set.

File Cache Tuning Strategy. Here's a step-by-step process to help you use the LRU Sitting Time statistic effectively.

Track Server Resource Utilization Statistics. Use STAT.NLM (available from Novell Research at http://www.novell.com/research or on the "Into the Future" Anthology CD-ROM) to track server resource utilization statistics. Chart the results for daily, weekly, monthly, period-end, and year-end cycles. Identify recurring periods of peak workloads.
Observe Cache Statistics. Monitor the LRU Sitting Time during peak workload periods of the day, as identified with STAT.NLM above. Keep a record of the lowest LRU Sitting Time during your observations for at least one week, longer if necessary to see a consistent pattern.
Develop a Low Watermark. Based on the knowledge you have gained of your server, users, workload patterns, work cycles, resource utilization statistics, and cache statistic observations, determine your average low LRU Sitting Time. This average becomes your low watermark.
Tune the Cache. Now you're ready to tune the server's cache. We recommend sizing the cache so that it can sustain an LRU Sitting Time low watermark of 12 minutes or more.

Here's where your homework pays off. If your low watermark is 7 minutes, you'll need to add memory to increase the LRU Sitting Time during those peak workloads so that it averages 12 minutes. The added memory increases the likelihood that repeatedly used data will still be cached when the next user request is received.

On the other hand, if your low watermark is 18 minutes, you have more than adequate cache resources. In this case, you can leave the excess memory in the server as a buffer for future growth, or you may want to consider removing some memory and using it in another server where it may be more beneficial.
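The low-watermark steps can be sketched as follows, assuming you record the lowest LRU Sitting Time for each peak period in MONITOR's HH:MM:SS.T form. The helper names and the sample week of readings are hypothetical; only the 12-minute target comes from the recommendation above.

```python
from statistics import mean

TARGET_SECONDS = 12 * 60  # recommended low watermark: at least 12 minutes


def parse_sitting_time(text: str) -> float:
    """Convert an HH:MM:SS.T reading such as '0:07:12.5' to seconds."""
    hours, minutes, seconds = text.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def low_watermark(peak_minimums) -> float:
    """Average the lowest LRU Sitting Time recorded in each peak period."""
    return mean(parse_sitting_time(reading) for reading in peak_minimums)


def assess(watermark_seconds: float, target: float = TARGET_SECONDS) -> str:
    """Compare the low watermark with the 12-minute recommendation."""
    if watermark_seconds < target:
        return "add memory"
    return "adequate: surplus can be kept for growth or moved elsewhere"


# Hypothetical lowest readings from five peak periods in one week.
week = ["0:06:40.0", "0:07:10.0", "0:07:20.0", "0:06:50.0", "0:07:00.0"]
watermark = low_watermark(week)
print(watermark / 60, assess(watermark))
```

These sample readings average 7 minutes, so the sketch prints 7.0 add memory, the same conclusion as the 7-minute example above.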

The point is not whether you actually add or remove memory from your server. This information is intended to improve your ability to interpret the LRU Sitting Time statistic and thereby provide you with a meaningful way to understand the efficiency and performance of IntranetWare's file cache.

Monitoring IntranetWare Memory Allocation

After you've estimated and tuned the memory requirements for your IntranetWare server and installed the server in its production environment, it is a good idea to check up on the server from time to time to see how it is allocating memory. This section describes some of the more pertinent memory statistics available in the MONITOR.NLM utility.

Monitoring Memory Allocation During OS Initialization

During IntranetWare's boot process, the operating system allocates just enough memory for its initial code and data pools. The remaining memory is handed over to the cache subsystem. As needed, memory is then allocated from the cache for additional OS subsystem requirements, system NLMs, and other applications.

The main MONITOR screen provides some insight into this allocation process. An example screen is shown in Figure 10.

Figure 10: MONITOR's General Information screen.

The two statistics relevant to this discussion are the Original and Total Cache Buffers lines.

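Original Cache Buffers. This is the number of 4 KB buffers handed off to the cache subsystem during OS initialization. By comparing this number with the total number of 4 KB pages in the server (Total Server Work Memory divided by 4096, described below), you can deduce the number of buffers the OS initially required.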

Total Cache Buffers. This is the current number of cache buffers used by the cache subsystem for file caching. This number will fluctuate as OS subsystems and NLMs allocate from cache, or as free memory is later returned to cache. You can figure the amount of cache memory (in bytes) by multiplying Total Cache Buffers by 4096. Another way is to view the "Cache buffers (bytes)" statistic displayed in MONITOR's Server Memory Statistics screen, which is covered next.

The Resource Utilization Option

After selecting MONITOR's Resource Utilization option, you can see the Server Memory Statistics screen. An example is shown in Figure 11.

Figure 11: MONITOR.NLM's Server Memory Statistics screen.

This screen describes how the server is allocating all of its working memory. (You can also monitor the memory allocations of specific NLMs by selecting an NLM in the System Modules window.) An explanation of each statistic follows.

Allocated Memory Pool. This is the total memory allocated by NLMs using IntranetWare's memory allocation APIs. This total is broken out in greater detail in the Monitoring the Alloc Subsystem section below.

Cache Buffers. This is the total memory residing in file cache. File cache is the main pool from which NLM memory requests are serviced.

Cache Movable Memory. This memory is allocated by the OS using an internal API. This memory pool is used for a variety of OS tables including file allocation tables (FATs), directory hash tables, and connection tables.

Cache Non-movable Memory. This memory is allocated by the OS using a non-movable memory API that is only available to the OS and system NLMs. This API was used in the past because it incurred less overhead than movable memory.

Code and Data Memory. This is the total amount of memory allocated by the OS for OS code and data, as well as NLM code.

Total Server Work Memory. This is the total amount of memory in the server. In our example, 24,768,512 bytes is roughly 24 MB (about 23.6 MB). Divided by 4096, this gives the server 6,047 4 KB pages of available memory.
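The MONITOR arithmetic in this section can be checked with a few lines of code. The Total Server Work Memory value is the example figure above; the Original and Total Cache Buffers readings are hypothetical placeholders, and treating total pages minus Original Cache Buffers as the OS's initial share is an approximation.

```python
PAGE_BYTES = 4096  # MONITOR reports memory in 4 KB cache buffers (pages)


def pages(total_server_work_memory: int) -> int:
    """Total Server Work Memory expressed as 4 KB pages."""
    return total_server_work_memory // PAGE_BYTES


def cache_bytes(total_cache_buffers: int) -> int:
    """Total Cache Buffers expressed in bytes."""
    return total_cache_buffers * PAGE_BYTES


def initial_os_buffers(total_server_work_memory: int,
                       original_cache_buffers: int) -> int:
    """Approximate pages kept by the OS before the rest went to file cache."""
    return pages(total_server_work_memory) - original_cache_buffers


work_memory = 24_768_512  # example value from the Server Memory Statistics screen
print(pages(work_memory), round(work_memory / 2**20, 1))  # pages, MB
print(initial_os_buffers(work_memory, original_cache_buffers=5_400))  # hypothetical
print(cache_bytes(4_100))  # hypothetical Total Cache Buffers reading
```

This prints 6047 23.6, then 647 pages for the hypothetical OS share, then 16793600 bytes of file cache.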

Monitoring the Alloc Subsystem

You can use the Memory Utilization option from MONITOR's main menu to find the Allocated Memory For All Modules screen (see Figure 12).

Figure 12: The Allocated Memory For All Modules screen.

This screen provides more detail about the allocated memory pool listed in the Server Memory Statistics screen in Figure 11. When you first enter this screen, the statistics represent all loaded modules (NLMs). If you select a specific NLM, these statistics reflect the memory allocated for that NLM. Here is an explanation of the Allocated Memory statistics.

4KB Cache Pages. This is the number of 4 KB cache pages currently in use by all NLMs. As NLMs request different sizes of memory blocks, the Alloc subsystem services those requests with memory from the cache subsystem in multiples of 4 KB blocks.

Cache Page Blocks. This is the number of memory blocks, each made up of multiple 4 KB buffers (4 KB × n), that have been allocated from the Cache subsystem.

Percent In Use. This shows the percentage of total Alloc memory currently in use.

Percent Free. This shows the percentage of total Alloc memory currently on NLM free lists.

Memory Blocks in Use. Memory blocks are different from the cache blocks described above. Memory blocks are individual pieces of memory requested by an NLM and can be as small as 16 bytes. If MONITOR followed internal OS nomenclature, these would be called memory nodes rather than blocks. If an NLM requests 1024 bytes of memory followed by three more similarly sized requests, the NLM would own four 1024-byte nodes.

Memory Bytes in Use. This represents the total number of bytes in use by NLMs.

Memory Blocks Free. This represents the total number of memory nodes that reside on the free lists.

Memory Bytes Free. This represents the total number of bytes of memory on the free lists.

Understanding Discrepancies

You may notice a discrepancy between the Memory bytes (in use and free) statistics in Figure 12 and the Allocated memory pool (bytes) value in Figure 11. This discrepancy is caused by unreported overhead incurred during the allocation process. Part of this unreported memory is used to manage the allocated memory: there is a 16-byte overhead for each node (Memory blocks in use and free), the individual pieces of memory requested by an NLM, and another 16-byte overhead for each block (Cache page blocks). The remaining unreported memory is allocated by the OS. (In several of our lab servers, this amounted to about 50 KB of RAM. The amount may vary depending on your circumstances.)
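The 16-byte overheads can be accounted for explicitly when you compare the two screens. The sketch below uses hypothetical readings and helper names; whatever remains after subtracting the node and block overhead corresponds to the memory the OS allocates for itself, and readings taken while garbage collection is still returning memory to cache may not balance.

```python
NODE_OVERHEAD = 16   # bytes per node (Memory blocks in use + Memory blocks free)
BLOCK_OVERHEAD = 16  # bytes per Cache page block


def tracking_overhead(blocks_in_use, blocks_free, cache_page_blocks):
    """Bookkeeping bytes that MONITOR's Memory bytes figures do not include."""
    nodes = blocks_in_use + blocks_free
    return nodes * NODE_OVERHEAD + cache_page_blocks * BLOCK_OVERHEAD


def remaining_gap(allocated_pool, bytes_in_use, bytes_free,
                  blocks_in_use, blocks_free, cache_page_blocks):
    """Discrepancy left after the 16-byte overheads are accounted for."""
    reported = bytes_in_use + bytes_free
    overhead = tracking_overhead(blocks_in_use, blocks_free, cache_page_blocks)
    return allocated_pool - reported - overhead


# Hypothetical readings from the two MONITOR screens.
print(remaining_gap(allocated_pool=2_950_000, bytes_in_use=2_600_000,
                    bytes_free=240_000, blocks_in_use=6_000,
                    blocks_free=500, cache_page_blocks=150))
```

With these made-up readings, 106,400 bytes of the 110,000-byte gap is node and block overhead, leaving 3,600 bytes attributable to the OS.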

Another discrepancy can be caused by NetWare's garbage collection process. Memory involved in garbage collection is unlinked from the free lists and the amount is subtracted from the free list statistics (Memory bytes free, Memory blocks free). However, the memory's return to cache and the subsequent update of the Alloc subsystem's memory statistics may not occur until a later time when the garbage collection process has completely finished returning the memory to the cache subsystem.

Conclusion

Due to the growing complexity of network servers, computing a server's memory requirements is no simple matter. The IntranetWare memory optimization worksheet and guidelines in this AppNote simplify the process by helping you estimate the server's memory requirements and eventually tune the server's memory resources for its actual production workload.

System designers and administrators can apply this process to IntranetWare servers in any environment. The additional time required to tune the server will pay significant dividends in optimal performance. You'll also have peace of mind from knowing that the server has adequate memory resources to support peak workloads.

* Originally published in Novell AppNotes

Disclaimer

The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.