Network Utilization Voodoo

ADAM JEROME
Developer Support Engineer
Novell Developer Support

01 Nov 1995


Several Novell tools are available to help developers understand performance issues. However, some users misunderstand these issues and resort to "voodoo network management." For example, the idea that processor utilization is the primary expression of NetWare's performance is simplistic at best. This article discusses four NLMs that offer a different perspective on network utilization. In summary, high processor utilization alone should not be thought of as a bad thing. It is entirely appropriate for certain processes to cause 100 percent utilization.

Introduction

Novell has always been very concerned about NetWare's performance and efficiency, and has produced many tools to help developers, system administrators, and integrators understand performance issues. Many have learned to use these tools to monitor NetWare and network performance. Those who have not mastered them can be overwhelmed by performance issues, both real and imagined.

Lacking skill and understanding of the issues, many resort to "voodoo network management." Signs of this management method include vague solutions to performance issues such as the "power off and try again" answer. This management style results in a lack of confidence in network management departments and networking technology in general.

Perhaps future hardware and software will become self-aware, run their own diagnostics, and tune their own performance and efficiency. Until then, it is the responsibility of network administrators and software developers to understand their systems and maximize performance and efficiency.

NetWare Utilization

"Captain! The containment field is at 100 percent! I don't know how long she'll be able to hold together at this level!"

All NetWare file servers include a "Processor Utilization" statistical value, which is accessible via MONITOR.NLM. Developers can also access this value using the function call GetServerUtilization().

Those who hold fast to voodoo NetWare doctrines declare that this value is the primary expression of NetWare's performance and efficiency. They continue by suggesting that the higher this value is, the worse NetWare's performance will be. They also say that high utilization is a problem that must not be ignored. Solutions include adding more memory, getting a faster processor, and witch hunts for rogue NLMs causing increased utilization. Such doctrine is misleading and simplistic at best.

The lowly utilization value is "relative to the amount of time the kernel spends in the idle loop process." In other words, it is derived from the amount of time the processor wastes doing nothing. A high utilization value simply indicates that the processor has something to do and is spending less time in the idle loop. A utilization value of n percent means that NetWare is using n percent of the processor's potential.
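To make the arithmetic concrete, here is a minimal sketch that treats utilization as the share of a sampling interval not spent in the idle loop. The function name, the tick-count parameters, and the formula itself are assumptions made for illustration; this article does not give NetWare's internal calculation.

```c
#include <assert.h>

/* Illustrative model only: utilization as the percentage of a sampling
   interval that was NOT spent in the idle loop. The names and the
   formula are assumptions for this sketch, not NetWare's own code. */
unsigned utilization_percent(unsigned long idle_ticks,
                             unsigned long total_ticks)
{
    assert(total_ticks > 0);            /* avoid division by zero */
    assert(idle_ticks <= total_ticks);  /* idle time is part of the total */

    /* Integer arithmetic; assumes tick counts small enough that the
       multiplication by 100 does not overflow. */
    return (unsigned)(((total_ticks - idle_ticks) * 100UL) / total_ticks);
}
```

Under this model, an interval of 1000 ticks with 250 idle ticks reports 75 percent, and an interval with no idle ticks reports 100 percent, whether the busy time was useful work or an empty loop.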

Q: True or false: processor utilization expresses how efficiently the processor is being used.

A: False. Writing an NLM that is wasteful and inefficient is very easy for a developer. For example:

/* LOOP1.NLM */

    void main(void)
    {
        while (1)
            ;   /* busy loop: never relinquishes the processor */
    }

LOOP1.NLM is a worst-case example of a NetWare CPU hog. This NLM appears to cause NetWare to "hang." The <ALT-ESC> and <CTRL-ESC> keys do nothing; therefore, you cannot switch to the MONITOR screen to see what the current utilization is. Even if you could, the MONITOR screen would be frozen.

In reality, LOOP1.NLM does not cause a malfunction in NetWare. NetWare might be considered by some to be "hung," but it is actually operating within established specifications: once a process gets control of the processor, it retains control until it voluntarily relinquishes that control. LOOP1.NLM never relinquishes control, so it runs forever. This is permissible, but it is so inefficient that every other NetWare process is completely starved of processor access. LOOP1.NLM is not a "NetWare friendly" application.

/* LOOP2.NLM */

    #include <conio.h>

    void main(void)
    {
        while (1)
            putch('x');   /* visible output, but still never yields */
    }

LOOP2.NLM is very similar to LOOP1.NLM. LOOP2.NLM shows that the server is not "hung," because it continuously displays 'x' characters on the screen. LOOP2.NLM, like LOOP1.NLM, does not relinquish control and runs forever. LOOP2.NLM also starves NetWare of processor access.

/* LOOP3.NLM */

    #include <conio.h>
    #include <process.h>

    void main(void)
    {
        while (1)
        {
            putch('x');
            ThreadSwitch();   /* yield, then run again as soon as possible */
        }
    }

LOOP3.NLM is the same as LOOP2.NLM except that it contains a ThreadSwitch() statement. This is a method of relinquishing control of the processor to other processes in NetWare.

With LOOP3.NLM, you can switch to the MONITOR screen. If you do, you will notice that the utilization value is 100 percent on NetWare 4.10. The reason utilization went to 100 percent is that LOOP3.NLM is always active. Though it relinquishes control, it simply reschedules itself to run again as soon as possible.

It is up to the application developer to say whether LOOP3.NLM is or is not an efficient use of the processor. As the developer of LOOP3.NLM, I judge the printing of 'x' characters to be a low priority, and would elect not to use all available processor bandwidth for it. However, be aware that there are applications that appropriately elect to use all available bandwidth to get a job done with all due speed. This behavior is acceptable when mingled with timely ThreadSwitch() calls, which release control to other NetWare processes. Problem applications are those that hog the CPU for too long without relinquishing control.
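The idea of mingling real work with timely yields can be sketched as a loop that gives up the processor after every small batch of items. This is an illustration under stated assumptions, not code from the NetWare SDK: sum_with_yields, yield_fn, count_yield, and the batch parameter are hypothetical names, and the yield hook is a plain function pointer so the sketch compiles anywhere. In an NLM, the hook would presumably wrap ThreadSwitch().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical yield hook. In an NLM this would wrap ThreadSwitch();
   a function pointer keeps the sketch portable. */
typedef void (*yield_fn)(void);

/* Sum `count` items, calling `yield` after every `batch` items so that
   no stretch of work holds the processor for long. */
unsigned long sum_with_yields(const unsigned *items, size_t count,
                              size_t batch, yield_fn yield)
{
    unsigned long sum = 0;
    size_t i;

    assert(batch > 0);
    assert(yield != NULL);

    for (i = 0; i < count; i++) {
        sum += items[i];
        if ((i + 1) % batch == 0)
            yield();              /* give other threads a turn */
    }
    return sum;
}

/* Stand-in hook for experiments: it only counts how often the loop
   offered to give up the processor. */
unsigned yield_calls = 0;

void count_yield(void)
{
    yield_calls++;
}
```

With ten items and a batch size of four, the loop yields twice (after the fourth and eighth items). Choosing the batch size is the design decision: too large and the loop behaves like LOOP2.NLM, too small and the work is dominated by context switches.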

"It seems to be vapor locked." -- mechanic who couldn't find the problem.

Some voodoo doctrines hold that it is inappropriate for a process to cause utilization to stay at 100 percent. This philosophy is incorrect. There are processes that make the most efficient use of the processor and, as a result, cause 100 percent utilization. This is totally appropriate. The alternative would risk not processing tasks as fast as they are assigned, causing tasks to tie up system resources longer than necessary. (Novell's own TCPIP.NLM and SPXS.NLM are perfect examples of NLMs that appropriately cause frequent 100 percent utilization.)

/* LOOP4.NLM */

    #include <conio.h>
    #include <process.h>

    void main(void)
    {
        while (1)
        {
            putch('x');
            ThreadSwitchWithDelay();   /* yield and wait on the delay queue */
        }
    }

LOOP4.NLM shows an efficient method of printing 'x' characters on the screen. Like LOOP3.NLM, it frequently relinquishes control, but not by simply rescheduling itself to run again as soon as possible. Rather, LOOP4.NLM pulls itself off the "run queue" and places itself on the "delay queue." It remains on the delay queue for a short time before it is rescheduled back to the run queue. The amount of time that a process remains on the delay queue is determined strictly by the NetWare OS, but is generally no more than 50 context switches by default.

The default maximum number of context switches can be manipulated using the SetThreadHandicap() function.

You will notice that MONITOR may now report less than 100 percent utilization with LOOP4.NLM. To the voodoo doomsayer, the utilization may still be uncomfortably high. It is important to understand that when NetWare's run queue is empty, it attempts to find something to do by activating processes placed on the delay queue by ThreadSwitchWithDelay(). The concept is that it is better to do something than to waste processor bandwidth. Processes on the delay queue sleep their full allotted number of context switches only as long as there are processes on the run queue.

I have mentioned NetWare 4.10's run queue and delay queue. NetWare 4.10 also sports a "low-priority queue." Processes on the "low-priority queue" are run only when there is nothing on the run queue, and there are no processes ready to run on the delay queue. Processes that live on the low-priority queue might include clean-up utilities, virus scanners, file compression utilities, backup utilities, etc. Application developers use the ThreadSwitchLowPriority() function to place their process on the low-priority queue.
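The order in which the three queues are consulted can be summarized in a few lines. This is a toy model of the policy as described in this article, not NetWare's scheduler: sched_state, pick_t, and pick_next are hypothetical names, the queues are reduced to simple counts, and delayed threads whose delay has expired are assumed to have already moved back to the run queue.

```c
#include <assert.h>
#include <stddef.h>

/* Which queue the next thread comes from, or idle if all are empty. */
typedef enum { PICK_RUN, PICK_DELAY, PICK_LOW, PICK_IDLE } pick_t;

/* Toy scheduler state: only the number of threads on each queue. */
typedef struct {
    unsigned run_ready;      /* threads on the run queue          */
    unsigned delay_waiting;  /* threads on the delay queue        */
    unsigned low_waiting;    /* threads on the low-priority queue */
} sched_state;

/* Apply the priority order described in the text:
   run queue first, then the delay queue (woken early rather than
   letting the processor idle), then the low-priority queue. */
pick_t pick_next(const sched_state *s)
{
    assert(s != NULL);

    if (s->run_ready > 0)
        return PICK_RUN;
    if (s->delay_waiting > 0)
        return PICK_DELAY;
    if (s->low_waiting > 0)
        return PICK_LOW;
    return PICK_IDLE;        /* nothing to do: the idle loop runs */
}
```

In this model the idle loop, and therefore a utilization value below 100 percent, appears only when all three queues are empty. That is one way to see why a server running LOOP4.NLM can still show high utilization without anything being wrong.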

In summary, high processor utilization alone should not be thought of as a bad thing. Processes that make efficient use of the processor by performing high-priority work and still periodically giving up control of the CPU can appropriately cause utilization to stay at 100 percent.

NetWare NLM Performance and Efficiency Tools

I have pointed out that the utilization statistic can be overrated. There are tools provided by Novell for monitoring the real performance of NLMs. These tools include SENTRY.NLM and NLMDEBUG.NLM.

SENTRY.NLM is a menu-driven NLM utility designed to monitor the time slices and CPU utilization of NLMs. Its major purpose is to verify that an NLM does not exceed the maximum allowable time slice requirements of NLM behavior testing. It is delivered with the Novell SDK CD.

NLMDEBUG.NLM is currently a beta product from Novell. It is shipped with the Novell SDK CD in the "Futures" section, and is available on CompuServe (GO NDEVSUP, Library #5 [Server SDK], TDBG1A.EXE). It provides the ability to check your NLM's resources, semaphores, memory overwrites, and debug NCPs. Most importantly, it includes an "NLM Process Timer" that can determine where your NLM needs to relinquish control of the processor. This facility will give developers the information needed to quickly debug NLM applications so that they will relinquish the processor in an efficient manner.

* Originally published in Novell AppNotes

