Multiprocessing Support in NetWare 6

Kevin Burnett
Senior Research Engineer
Novell AppNotes
kburnett@novell.com

01 Oct 2001


NetWare 6 is a reliable, highly-scalable version of NetWare which takes advantage of high-powered Multi-Processor (MP) server hardware by MP-enabling the complete packet transfer from the wire to the storage media. This AppNote provides background information about NetWare 6's MP functionality and explains how MP-enabled programs run on NetWare 6. It details the MP-related improvements made in NetWare 6 and discusses development opportunities for the new OS.


Topics

network programming, NetWare 6, multiprocessing

Products

NetWare 6

Audience

network administrators, integrators, developers

Level

beginner

Prerequisite Skills

familiarity with networking basics

Operating System

NetWare 6

Tools

Novell Kernel Services (NKS) NDK

Sample Code

no

A Short History of NetWare MP

NetWare 6 is Novell's second-generation MP network operating system, although, as this short history shows, it could be considered the third generation. Novell introduced MP functionality with NetWare 4.x. This first attempt was limited in that the core operating system (OS) was not MP-enabled. All core OS functionality had to be funneled to processor 0, the default processor on which threads run when an application is not MP-compliant. This version of NetWare allowed applications written to the MP standard to run on processors other than processor 0. But any time such an application needed core OS functionality (disk access, transmitting on the wire, and so on), the request had to be funneled back to processor 0. Hence, it was not a complete solution.

With the advent of NetWare 5, the MP functionality was completely rewritten and integrated into the NetWare OS Kernel. This made the vast majority of OS functionality MP-compliant. However, there were still some essential services that had to run on processor 0. Functionality such as LAN drivers and disk drivers still needed to be MP-enabled.

In NetWare 6, all components are MP-compliant. The whole chain of events, from the network wire to the hard disk storage devices, is MP-enabled. Thus with NetWare 6, Novell now provides a complete MP server solution.

NetWare 6 MP Functionality

NetWare 6 has been designed from the ground up to run on Symmetric Multiprocessing (SMP) hardware. Typically, a computer hardware manufacturer will refer to an SMP machine as a "high-end server." Today, SMP machines are shipped with one to 32 processors. In most cases, the machines are processor upgradable, meaning you can add processors as your needs demand. A benefit of upgrading to an SMP machine is that you can have a server with six processors doing the work that up to six separate servers used to do.

As shipped, NetWare 6 includes the following MP-enabled components:


Protocol Stacks

NetWare Core Protocols (NCP)

Service Location Protocol (SLP)

IP Stack

HTTP

Ethernet Connectivity

Token Ring Connectivity

Web-based Distributed Authoring and Versioning (WebDAV)

Lightweight Directory Access Protocol (LDAP)

NetWare News Server


Storage Services

Novell Storage Services (NSS)

Distributed File Services (DFS)

Protocol Services Request Dispatcher

Transport Service Request Dispatcher

Fibre Channel Disk Support


Security Services

Novell International Cryptographic Infrastructure (NICI)

Authentication

Authentication ConsoleOne snap-ins


Miscellaneous Components and Services

eDirectory (NDS)

Novell Java Virtual Machine (JVM)

Web Engines

Additional Web Features

Search Engine

. . .

Before we discuss MP and the way it is implemented in NetWare 6, a discussion of threads is in order. This is because to truly understand MP, you need to understand threads.

Threads in NetWare

Ever since NetWare was first released, it has used the concept of threads to allow the NetWare OS to work efficiently. A thread is simply a NetWare OS process, but in technical terms a process is slightly different from a thread. A process typically saves most of the processor's state when it is swapped out, while a thread typically saves less of the processor's state. What's more, processes are usually preemptive (they take control of all resources, but can be interrupted) compared to threads, which are nonpreemptive (they run to completion).

The NetWare OS schedules different threads to run in its Run queue. The threads are executed in a first-in first-out (FIFO) order. In addition, the NetWare OS allows NetWare Loadable Module (NLM) applications to establish multiple threads, each representing a distinct path of execution. An NLM has to contain one thread at the minimum, but typically will contain two or more threads.

Only one thread can run at a time. While the thread is running, it has control of the system's microprocessor (CPU). NetWare is a nonpreemptive OS, meaning it allows threads to run to completion once they start to execute. When a thread gains control of the CPU, the thread remains in control until it has run to the end of its execution, or until it relinquishes control and reschedules itself on the Run queue. In an MP world, this rule applies to each processor in the server: only one thread at a time can run on any one processor.

Multiprocessing

Looking at classic NetWare 5.x on a one-processor box, it appears that NetWare is executing two or more applications or functions at the same time. This is referred to as multitasking. NetWare is a multitasking OS since it gives the illusion that a single CPU is executing two or more programs at once. However, in reality, it is executing the threads in these programs in a consecutive manner.

Running on one processor, a multithreaded and multitasked OS such as NetWare can't execute more than one thread at one time. Even if you have a multi-CPU computer, you will not be able to exploit the additional CPUs unless you have applications that are specifically written to be multiprocessor-compliant, or MP-enabled. MP-enabled applications are programmed in such a way that their threads can safely execute simultaneously on multiple processors. With NetWare 6 and properly programmed MP-enabled applications, truly simultaneous execution becomes a reality. Your applications can execute multiple threads on multiple processors at the same time!

Server Hardware Specifications

To get the most out of what NetWare 6 has to offer, appropriate hardware is a must. NetWare 6 supports hardware that is designed around Intel's MultiProcessor Specification (MPS) v1.4. This specification is used by PC manufacturers to design and build Intel-based systems that use two or more processors. The current version (1.4) includes support for multiple PCI buses, future expandability, and up to 32 processors (see Figure 1).

Figure 1: MPS hardware bus.

As seen in Figure 1, MPS v1.4 defines a specification where all of the processors in the system work and function together similarly. All the processors in the system share a common I/O subsystem and also use the same memory pool. MPS-compatible operating systems are able to run without special customization on multiprocessor systems that comply with this specification. End-users who purchase a compliant multiprocessor system will be able to run their choice of operating systems.

Since NetWare 6 complies with Intel's specification, it will automatically take advantage of all the processors in your MPS hardware, provided the hardware supports the Intel specification. That really shouldn't be a problem since the major computer manufacturers, such as Dell and Compaq, support the specification.

If you are interested in reading the complete Intel MPS v1.4 specification, it is available at Intel's site: http://developer.intel.com/design/intarch/MANUALS/242016.htm.

While we are talking about MP hardware, we should clear up one common misunderstanding. Many people assume that if they buy a two-processor MPS-enabled machine, they will get the equivalent processing power of two separate and distinct servers. While this is the goal of MP hardware and software engineers, this is not the case in our imperfect world. The general rule is this: as the number of processors increases, the processing power increases, but to a somewhat lesser degree. So with a two-processor MPS system you get roughly 1.8 times as much processing power as a server with one processor. A four-processor system offers about 3.5 times as much processing power, and a six-processor system offers about 5.2 times the processing power.
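To make the rule of thumb concrete, the following short Python sketch divides each quoted multiplier by its processor count to show how much of the ideal linear scaling each configuration delivers. The multipliers for two, four, and six processors come from the paragraph above; the 1.0 baseline for a single processor and the function name are assumptions added for illustration.

```python
# Processing-power multipliers quoted in the text, keyed by processor count.
# The 1.0 entry for a single processor is an assumed baseline.
SPEEDUP = {1: 1.0, 2: 1.8, 4: 3.5, 6: 5.2}


def efficiency(processors: int, multiplier: float) -> float:
    """Return the fraction of ideal linear scaling actually delivered."""
    return multiplier / processors


for count, multiplier in SPEEDUP.items():
    share = efficiency(count, multiplier)
    print(f"{count} processor(s): {multiplier:.1f}x total, "
          f"{share:.3f} of ideal per processor")
```

Each added processor contributes a little less than the one before it, which is why the per-processor figure drifts downward as the count grows.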

Running Programs on NetWare 6

After you have installed NetWare 6 on your MPS hardware and started it up, the NetWare 6 Kernel determines how many processors are in the system. Next, the Kernel's Scheduler determines which processor to run the available threads on. This decision is based on information about the threads themselves and on the availability of processors.

Three types of programs can run on NetWare 6:

  • MP Safe

  • MP Compliant

  • NetWare OS

MP Safe programs are typically NLMs that are not MP-enabled, but which are safe to run in an MP environment. These programs run on processor 0, which is home to all MP Safe programs. The NetWare 6 OS is very accommodating to programs that were written prior to the introduction of MP NetWare. These non-MP-aware applications are automatically scheduled to run on processor 0 upon execution.

MP Compliant programs are specifically written to run in an MP environment. When one of these programs loads, the NetWare 6 Scheduler automatically assigns the different threads to available processors. The Intel MPS Specification allows programs to indicate if their specific threads want to run on a specific processor. In this case, the NetWare Scheduler will assign that thread to run on the requested processor. Although this functionality is available in NetWare 6 for those MP utilities and other programs that require the ability to run on a specific processor, Novell Engineering discourages developers from writing programs this way.

When an MP Compliant program is loaded, the NetWare Scheduler checks for an available processor to run the thread on (provided its threads aren't required to run on a requested processor). If the first available processor is processor 3, the thread is scheduled to run there. The next thread goes to processor 4, and so on. This assumes that the processors make themselves available in consecutive order. If the system has only one processor, all of the applications' threads are queued up to run on processor 0, which is always the first processor regardless of whether it is an MP or non-MP environment.

Lastly, the NetWare OS is completely MP compliant, allowing its multitude of threads to run on available processors as needed.
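The placement rules in this section can be sketched in a few lines of Python. This is only an illustration, not the NetWare Scheduler: the function name and the round-robin order are assumptions standing in for "processors make themselves available in consecutive order," and the only behavior taken from the text is that non-MP-enabled threads go to processor 0.

```python
from itertools import cycle


def place_threads(threads, processor_count):
    """Map each thread name to a processor number.

    threads is a list of (name, mp_enabled) pairs. Threads that are not
    MP-enabled are funneled to processor 0. MP-enabled threads are handed
    out round-robin, a hypothetical stand-in for "next available processor".
    """
    next_processor = cycle(range(processor_count))
    placement = {}
    for name, mp_enabled in threads:
        placement[name] = next(next_processor) if mp_enabled else 0
    return placement


threads = [("legacy_nlm", False), ("web_1", True), ("web_2", True), ("web_3", True)]
print(place_threads(threads, processor_count=4))
print(place_threads(threads, processor_count=1))  # everything lands on processor 0
```

On a one-processor machine every thread ends up on processor 0, which matches the single-processor case described above.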

Thread Location

When an MP-enabled NLM is loaded on a NetWare 6 server, the NetWare Scheduler will place the application's threads on available processors. Under most conditions, when a thread is assigned to a processor, it will live out its life on that same processor. Only in rare circumstances will the thread be moved to another processor. These circumstances include the following:

  • The thread is from a program that is not MP-enabled. In this case the NetWare Scheduler will move the thread to processor 0. This process is called funneling.

  • The NetWare Kernel determines that there is a lopsided balance of threads on all available processors. A thread or threads may be relocated to other processors to even out the load balancing.

It should be noted that the NetWare Scheduler's load balancing algorithm is non-intrusive. It only relocates threads when the thread load on a given processor is significantly higher than the aggregate average. If you are interested in seeing how many threads have been relocated on your server, you can use the NetWare Remote Manager utility to see how many threads have been moved within a given time frame.
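As a rough illustration of "significantly higher than the aggregate average," the sketch below flags only those processors whose thread count exceeds the average by a chosen factor. The 1.5 threshold and the function name are hypothetical; the article does not say what threshold the NetWare Scheduler actually uses.

```python
from statistics import mean


def overloaded_processors(thread_counts, threshold=1.5):
    """Return the indexes of processors worth rebalancing.

    A processor qualifies only when its thread count is more than
    threshold times the average across all processors. The default
    threshold is an assumed value chosen for illustration.
    """
    average = mean(thread_counts)
    return [cpu for cpu, count in enumerate(thread_counts)
            if count > threshold * average]


print(overloaded_processors([10, 11, 9, 30]))   # one clearly lopsided processor
print(overloaded_processors([10, 12, 9, 11]))   # small differences are left alone
```

Because small differences never cross the threshold, most threads stay where they were first placed.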

When a thread is scheduled to run on a specified processor and continues to do so for the life of the thread, this is called processor affinity. Keep in mind that it is rare for threads to be relocated to other processors.

Improving Efficiency

With the speed and efficiency of today's microprocessors, retrieving data from RAM takes much longer than retrieving data from the CPU's own cache. Things slow down whenever the CPU has to fetch the data it needs from RAM. If a CPU can always keep the data it needs to execute in its cache, speeds will be maintained at a near maximum.

To maintain efficiency, the major CPU manufacturers include cache memory in their CPUs. However, cache memory is a lot more expensive to produce than RAM. As a result, each CPU has a limited amount of cache memory. Cache memory can be one of three types (see Figure 2):

  • Level 1 (L1) cache, which is internal to the CPU and is built fast enough for even the most demanding needs of the CPU

  • Level 2 (L2) cache, which is external to the CPU and is built almost fast enough for the CPU

  • Level 3 (L3) cache, which is external to the CPU and not as fast as L2 cache

Figure 2: The three types of CPU cache.

The more internal cache a CPU has, the more it costs but the more efficient it is. For example, an Intel 450 MHz Xeon processor-based machine with a 2MB L2 cache will outperform an Intel 733 MHz Pentium processor-based machine with 32KB of L1 and 256KB of L2 cache by about 40% when executing applications. But be prepared to pay about $1000 more for the performance boost, and even more for MP machines.

NetWare 6 has been tooled to minimize the direct accessing of RAM. This is done by intentionally assigning a thread to run on a given processor and letting it run its life on that processor. In this case, the data needed by that thread will always be available in the processor's cache. The CPU will be able to process the thread as efficiently as possible. The term cache miss refers to times when the CPU is forced to access RAM directly because what it needs is not in cache. NetWare 6 minimizes cache misses by allowing the threads to run their life on the same processor as often as is feasible.

Things can also slow down if cache flushes are necessary. A cache flush occurs when data is copied from the CPU's cache back to RAM. This is a necessity when the Scheduler transfers a thread from one CPU to another. The new CPU needs access to the data that the thread was using on the previous CPU, but the previous CPU had the data "checked out." So the old CPU is forced to return the data by doing a flush of its cache. In so doing, the new CPU has access to the data, and can load its cache and continue the execution of the thread. Having a lot of cache flushes will seriously hurt system performance. Hence, NetWare 6's Scheduler tries to let threads execute on the same CPU for their entire life cycle.
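The cost of moving a thread between CPUs can be shown with a toy model in Python. It is a deliberately simplified sketch: each cache is a plain set, a migration flushes the previous CPU's cache completely, and the function name and schedule format are assumptions made for this example rather than anything in NetWare.

```python
def count_cache_misses(schedule, working_set):
    """Count cache misses for one thread run over several time slices.

    schedule lists the CPU used in each time slice. working_set is the
    data the thread touches in every slice. When the thread changes CPU,
    the previous CPU's cache is flushed so the new CPU can load the data.
    """
    caches = {}
    previous_cpu = None
    misses = 0
    for cpu in schedule:
        if previous_cpu is not None and cpu != previous_cpu:
            caches[previous_cpu].clear()      # cache flush on migration
        cache = caches.setdefault(cpu, set())
        for item in working_set:
            if item not in cache:
                misses += 1                   # cache miss: fetch from RAM
                cache.add(item)
        previous_cpu = cpu
    return misses


data = range(8)
print("pinned to one CPU:", count_cache_misses([0, 0, 0, 0], data))
print("bounced between CPUs:", count_cache_misses([0, 1, 0, 1], data))
```

In this model the pinned thread only misses while warming the cache in its first time slice, whereas the migrating thread has to reload its whole working set after every move.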

MPK and System Memory

In previous versions of NetWare that did not include Multiprocessing Kernel (MPK) functionality, there were no worries about the NetWare OS's interaction with system memory. Since there was only one processor, that processor was able to control all interaction with system memory. In the world of multiprocessing, multiple processors vie for the use of system memory, and multiple threads may also compete for other resources such as the I/O channel. Without measures to control this contention, memory corruption could occur. Even worse, the whole system could freeze due to I/O channel corruption.

To control the movement of data in the MPK system, NetWare 6 incorporates what are called synchronization primitives. Synchronization primitives include the following:

  • Mutually Exclusive Lock (mutex). This mechanism ensures that only one thread at a time can access a region of memory or another protected resource, such as the I/O channel.

  • Semaphores. These are somewhat similar to mutexes, but semaphores use counters to control access to memory or other protected resources, so a set number of threads can use the resource at once.

  • Read-Write Locks. Similar to mutexes, read-write locks allow multiple threads to read a protected resource at the same time, but ensure that only one thread at a time can modify it.

  • Condition Variables. These are based on an external condition. A thread can wait on a condition variable until another thread signals that the condition has changed, so condition variables can be used to synchronize threads. Used together with a mutex, they help ensure that only one thread accesses a protected resource at a time.
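The general behavior of three of these primitives can be tried out with Python's standard threading module. This sketch uses Python's own Lock, Semaphore, and Condition classes, not the NKS API, and the function name, thread counts, and "I/O slot" bookkeeping are invented for the example. Python's standard library has no read-write lock, so that primitive is not shown.

```python
import threading


def run_demo(thread_count=4, increments=1000, io_slots=2):
    """Run worker threads that share a counter and a limited resource."""
    state = {"counter": 0, "in_io": 0, "peak_io": 0, "ready": False}
    mutex = threading.Lock()                 # mutex: one thread at a time
    slots = threading.Semaphore(io_slots)    # semaphore: counted access
    start = threading.Condition()            # condition variable: wait for a signal

    def worker():
        with start:                          # sleep until the main thread signals
            start.wait_for(lambda: state["ready"])
        with slots:                          # at most io_slots threads get in here
            with mutex:
                state["in_io"] += 1
                state["peak_io"] = max(state["peak_io"], state["in_io"])
            with mutex:
                state["in_io"] -= 1
        for _ in range(increments):
            with mutex:                      # protect the read-modify-write
                state["counter"] += 1

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    with start:
        state["ready"] = True
        start.notify_all()                   # wake every waiting worker
    for thread in threads:
        thread.join()
    return state["counter"], state["peak_io"]


counter, peak_io = run_demo()
print("final counter:", counter)
print("most threads inside the semaphore at once:", peak_io)
```

Because every increment happens while the mutex is held, the final counter always equals the number of threads multiplied by the number of increments, and the semaphore keeps the peak at or below the number of slots.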

There are two other synchronization primitives that NetWare 6 could have used: Spin Locks and Barriers. Spin Locks were rejected because NetWare 6 is essentially a user-space package. Barriers were rejected because the other primitives were deemed sufficient to implement the protection.

Thread Management and Queues

Considering how many threads are running on all of the processors in an MP system, how can the NetWare OS keep track of what is running where? This is accomplished by the Scheduler. As previously stated, the Scheduler is an integral part of the NetWare OS Kernel. The NetWare 6 Scheduler is MP-enabled, so it is able to run on all of the CPUs in the MP system. As a result, each individual CPU can maintain its own thread queues and do its own scheduling.

Each CPU maintains three separate queues to aid in thread management. These three queues are the Run queue, the Work To Do queue, and the Miscellaneous queue (see Figure 3).

Figure 3: NetWare thread queues.

The threads in the Run queue have priority over threads in the other two queues. When a thread completes execution, the CPU checks for additional threads in the Run queue. If present, they will be run, sequentially, to completion. The threads in the Run queue are non-blocking, meaning they do not relinquish control of the CPU until they run to completion. Typically, only threads from system-critical functions such as protocols (TCP/IP, IPX/SPX, and so on) are scheduled to run in the Run queue. Many of the NetWare Kernel processes also run in this queue.

If the Scheduler finds no threads to run in the Run queue, the next thread in the Work To Do queue is run. Unlike threads in the Run queue, these threads relinquish control of the processor. Often, programs whose threads are queued up in the Work To Do queue call functions that relinquish control of the processor. This is called blocking.

In many cases, if a thread doesn't voluntarily give up the processor from time to time, the NetWare OS will handicap the thread so it doesn't hog all of the CPU's resources. This is necessary because of NetWare's "nice guy" non-preemptive environment. If a particular NLM does not yield often enough, the NetWare OS places a handicap on the offending thread, which prevents the thread from being rescheduled immediately. For example, if the NetWare OS places a handicap of 100 on a thread, 100 other threads must run and yield before the handicapped thread is rescheduled to run.
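The effect of a handicap can be modeled with a simple queue in Python. This is an illustrative sketch only: the function name is hypothetical, and the well-behaved threads are assumed to take turns in round-robin order, which the article does not specify.

```python
from collections import deque


def run_order(other_threads, offender, handicap):
    """Return the order in which threads run until the offender runs again.

    The offender may not be rescheduled until handicap other thread runs
    have completed. The other threads are assumed to run, yield, and
    rejoin the back of the queue in turn.
    """
    if not other_threads:
        return [offender]                 # nothing else to wait for
    queue = deque(other_threads)
    order = []
    for _ in range(handicap):
        thread = queue.popleft()
        order.append(thread)              # thread runs, then yields
        queue.append(thread)
    order.append(offender)                # handicap has been worked off
    return order


print(run_order(["A", "B"], "hog", handicap=3))
```

With a handicap of 3 and two well-behaved threads, three other runs take place before the offending thread gets the CPU back.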

The CPU processes threads in the Miscellaneous queue in the order in which they are queued up. The order is first-in, first-out (FIFO). Most application threads will queue up in the Miscellaneous queue.
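The way a CPU picks its next thread from the three queues can be sketched as follows. The class and method names are hypothetical, and checking the Miscellaneous queue last is an assumption; the text states only that the Run queue has priority and that the Work To Do queue is consulted when the Run queue is empty.

```python
from collections import deque


class CpuQueues:
    """Toy model of the three per-CPU thread queues."""

    def __init__(self):
        self.run = deque()
        self.work_to_do = deque()
        self.miscellaneous = deque()

    def next_thread(self):
        """Return the next thread to run, or None if every queue is empty."""
        # Queues are checked in priority order; each queue is itself FIFO.
        for queue in (self.run, self.work_to_do, self.miscellaneous):
            if queue:
                return queue.popleft()
        return None


cpu = CpuQueues()
cpu.miscellaneous.append("application thread")
cpu.work_to_do.append("file request")
cpu.run.extend(["TCP/IP", "kernel process"])

while (thread := cpu.next_thread()) is not None:
    print("running:", thread)
```

Even though the application thread was queued first, it runs last because its queue is only reached once the other two are empty.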

Race Conditions

A race condition can occur when a single application has two or more threads running on two or more CPUs simultaneously and those threads access the same data (see Figure 4). For example, say you load the Monitor utility and look at memory statistics. Monitor could have two threads scheduled on two separate CPUs that need to update the same location in RAM. This is especially bad if the two threads are part of a request from the same connection. The location in RAM may end up being overwritten by bad data.

Figure 4: Race conditions.

To avoid race conditions, the NetWare OS needs to make sure that threads emanating from the same connection are run on the same processor. This way, the threads are queued up and run in sequential manner, thus preventing the possibility of memory corruption.
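One simple way to keep all of a connection's threads on one processor is to derive the processor number from the connection number. The Python sketch below uses a modulo mapping purely as an assumption for illustration; the article does not describe how NetWare actually chooses the processor, and the function names are hypothetical.

```python
from collections import defaultdict


def processor_for_connection(connection_id, processor_count):
    """Pick a processor for a connection (hypothetical modulo mapping)."""
    return connection_id % processor_count


def queue_requests(requests, processor_count):
    """Group (connection_id, request) pairs into per-processor FIFO lists."""
    queues = defaultdict(list)
    for connection_id, request in requests:
        cpu = processor_for_connection(connection_id, processor_count)
        queues[cpu].append((connection_id, request))
    return dict(queues)


requests = [(7, "read"), (12, "open"), (7, "write"), (12, "close"), (7, "read")]
for cpu, queued in sorted(queue_requests(requests, processor_count=4).items()):
    print(f"processor {cpu}: {queued}")
```

Because every request from a given connection maps to the same processor and keeps its arrival order, two requests from that connection can never update the same memory at the same moment.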

Improvements in NetWare 6 Multiprocessing

NetWare 6's MPK is similar to the one in NetWare 5, but with quite a few improvements. Besides bug fixes to the kernel itself, the biggest difference is the supporting cast of NetWare 6 MP-enabled components. Some of the more significant components are the TCP/IP protocol stack, the NCP engine, eDirectory, NSS, and NICI. (A fairly complete list of these components is given in the "NetWare 6 MP Functionality" section above.)

Although all of these are important improvements, one that dramatically improves speed and performance is MP-enabling the TCP/IP protocol stack. With the popularity of the Internet, most companies are networking with TCP/IP only. As a result, all network traffic processed on such a NetWare 6 server goes through the TCP/IP protocol stack. With the NetWare 5 TCP/IP protocol stack, every packet that entered or left the server had to be processed on processor 0, along with all the other non-MP-enabled threads. NetWare 6 alleviates this bottleneck by allowing many instances of the TCP/IP protocol stack to process packets concurrently. The only limitation is the number of CPUs you have in your server.

Development Opportunities for NetWare 6 MP

The NetWare OS has always been one of the fastest network operating systems around. If you buy or upgrade to NetWare 6, you will immediately enjoy the increased performance coming from the MP-enabled LAN and disk channels. But your biggest performance increase will come from MP-enabled applications. If you don't run MP-enabled applications, all the threads from non-MP-enabled applications will be funneled to processor 0, causing a thread "pileup" on processor 0.

With the introduction of NetWare 5, Novell released a new version of GroupWise that made partial use of the NetWare MPK environment. Shortly after the release of NetWare 6, Novell plans an update to GroupWise that will make full use of the NetWare 6 MPK environment.

To aid developers in creating new applications that fully exploit the features of NetWare 6, or in updating current applications to use NetWare 6, Novell has provided a software developer kit referred to as the Novell Kernel Services (NKS) API set. NKS consists of a new set of NLMs and interfaces for implementing multithreaded, multiprocessor-aware applications and other programs for NetWare. These include NLM libraries for C/C++ and a standard C library. To access these libraries, go to http://developer.novell.com/ndk/nks.htm.

You may be wondering what the big difference is between the new NKS API set and the classic CLIB API set. The biggest difference is that the CLIB API set routed all API calls through a requester that had to execute on processor 0, since the requester was not MP-enabled. Using the NKS API set, an API that is called can execute on any of the available system processors. If it blocks, it will sleep on the same processor's queue, to be awakened and continue execution on the same processor. This eliminates the performance problems inherent with funneling applications to processor 0.

If you want to delve into the NKS API set, much information is available, complete with sample source code. The following articles published in Novell Developer Notes and Novell AppNotes discuss Novell Kernel Services programming using the NKS API set:

For those of you who would like to learn about the original NetWare 4.x SMP implementation, the following article is available:

Conclusion

By now you could have your own copy of the NetWare 6 operating system. Visit Novell's Developer Web site at http://www.developer.novell.com to learn more about NetWare 6 and the NKS API library. I encourage you to download the library and experiment with it. NetWare 6 is the future. Hopefully this article has given you the desire to update an existing application or create a new one to take advantage of all that NetWare 6 has to offer.

* Originally published in Novell AppNotes

