
Troubleshooting NetWare 4.11 SMP Issues


Tom Buckley
Novell World Wide Support Services

01 Sep 1997


This document is designed to help you troubleshoot problems you might encounter in implementing Novell's NetWare 4.11 SMP (Symmetric MultiProcessing) software. It provides suggestions on how to isolate possible causes and resolve software- and hardware-related issues.

Note: NetWare 4.10 SMP is a product sold by Original Equipment Manufacturers (OEMs) as part of their networking solution. NetWare 4.10 SMP is not supported through Novell's technical support channels, but it is supported through the OEM partners who sell the product.

In a multi-processor machine, the NetWare SMP operating system kernel runs only on processor 0. Processes directly related to the NetWare file server, such as NetWare Core Protocol (NCP) requests, Novell Directory Services (NDS), disk compression, and so on, also run on processor 0.

Processes or threads that access other processors in the server include those that handle LAN channel traffic--such as routing and allocating ECBs (Event Control Blocks)--and other NLMs (NetWare Loadable Modules)--such as Novell's NetWare Web Server--that have been written to take advantage of the SMP architecture.

In the next release of the NetWare operating system, multi-processor support will be provided inside the kernel rather than through NLMs. This will allow all processes running on the server to work with multiple processors, instead of having most of them run only on processor 0.

SMP Troubleshooting Techniques

The following is a recommended step-by-step method for troubleshooting problems in a NetWare 4.11 SMP environment.

Step 1: Native NetWare or SMP?

The first step is to determine whether the problem exists only in NetWare SMP or if it also occurs in "native" (non-SMP) NetWare. It is much easier to debug problems in the non-SMP NetWare environment. To proceed, comment out the lines that load the SMP-related modules in your server's STARTUP.NCF file. These include the following (add "REM" to the beginning of each line to comment it out):


REM load MPS14.PSM (or a similar PSM specific to your server hardware)
REM load SMP.NLM
REM load MPDRIVER.NLM ALL
load C:\NWSERVER\PK411.NLM
load AIC7770.HAM
...

Reboot the server and see if the problem reoccurs in the non-SMP environment. If so, go through the standard troubleshooting steps covered in the AppNote entitled "Resolving Critical Server Issues" in the February 1995 Novell Application Notes. If the problem only occurs in the SMP environment, continue with the next troubleshooting step.
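If you keep a copy of STARTUP.NCF on a workstation, the commenting-out edit described in this step can be sketched in a few lines of Python. This is only an illustration, not a Novell utility: the function name comment_out_smp and the SMP_MODULES tuple are assumptions made for this example, and MPS14 stands in for whichever PSM your hardware uses.

```python
# Base names of the SMP-related modules listed in this step.
# Assumption: MPS14 is a placeholder for your hardware's own PSM.
SMP_MODULES = ("MPS14", "SMP", "MPDRIVER")


def comment_out_smp(ncf_text, modules=SMP_MODULES):
    """Return ncf_text with "REM " prefixed to every LOAD line for an SMP module."""
    result = []
    for line in ncf_text.splitlines():
        words = line.split()
        # The listings in this document use both "load" and "LOAD",
        # so the comparison ignores case.
        is_load = len(words) >= 2 and words[0].upper() == "LOAD"
        if is_load:
            # Drop any path prefix and the extension: C:\DIR\SMP.NLM -> SMP
            base = words[1].upper().rsplit("\\", 1)[-1].split(".")[0]
            if base in modules:
                result.append("REM " + line)
                continue
        # Lines that are already commented out, or that load other
        # modules, pass through unchanged.
        result.append(line)
    return "\n".join(result)
```

Lines that already begin with REM are left alone, so running the function twice does not add a second REM.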

Step 2: SMP Software or Hardware?

The next step is to ascertain whether the problem is with the SMP software itself or whether it is a hardware problem. To do this, use the parameter "0" (indicating processor 0) in the command to load MPDRIVER.NLM. In the STARTUP.NCF file, modify the MPDRIVER line as follows:


LOAD MPDRIVER.NLM 0

This parameter forces the MPDRIVER module to use only processor 0 instead of all the processors in the server. (Note that you must always use at least the parameter 0 with MPDRIVER.NLM because the kernel runs on processor 0 and it must be active in order for NetWare SMP to run.) After you reboot the server, you can determine whether the problem occurs because the SMP software is loaded on the machine.

If the problem does not reoccur, you may be experiencing hardware problems with one of the other processors, or with threads migrating to processors other than processor 0. If this is the case, skip to Step 4 below. If the problem persists, it has something to do with the SMP software itself, or how the SMP software is responding to the server hardware, or how the SMP software is functioning with other NLMs loaded on the server. Continue with the next step to further isolate the problem.
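The branching in Steps 1 and 2 can be summarized as a small decision function. The sketch below only restates the logic of the text; the function name next_step and its two boolean parameters are hypothetical and chosen for this example.

```python
def next_step(occurs_in_native, occurs_with_cpu0_only):
    """Map the outcomes of Steps 1 and 2 to the next action in this document.

    occurs_in_native: the problem reproduces with the SMP modules commented out.
    occurs_with_cpu0_only: the problem reproduces with "LOAD MPDRIVER.NLM 0".
    """
    if occurs_in_native:
        # Not an SMP-specific problem: use the standard procedures.
        return "standard (non-SMP) server troubleshooting"
    if occurs_with_cpu0_only:
        # SMP software, or an NLM interacting with it, is the suspect.
        return "Step 3"
    # The problem appears only when other processors are active.
    return "Step 4"
```

For example, a problem that disappears in native NetWare and also disappears with only processor 0 active leads to Step 4.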

Step 3: SMP or Other Software?

If the problem does not occur in native NetWare, but does show up in NetWare SMP with only processor 0 activated, you are probably dealing with a problem either in the SMP software itself or with an NLM that is not properly using the SMP software. The next best step is to unload all the server modules except SMP.NLM--this includes the PSM module, the MPDRIVER module, DS.NLM, the LAN and DSK drivers, and any other modules (such as third-party applications and utilities) that are typically loaded on the server.

Next, begin manually reloading the NetWare NLMs and drivers one at a time until the offending NLM is found. If all of the modules that come with NetWare load properly, begin manually loading the third-party software that you had running on the server, one at a time, until the offending software is found.
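The one-at-a-time reload described above is a simple linear search. The following sketch shows the idea under stated assumptions: find_offending_module and the problem_occurs callback are hypothetical names, and in practice the "callback" is you, loading each module at the console and watching for the problem.

```python
def find_offending_module(modules, problem_occurs):
    """Load modules in order and return the first one that triggers the problem.

    modules: module names in the order you would load them by hand
             (NetWare's own modules first, then third-party software).
    problem_occurs: hypothetical callback that receives the list of modules
             loaded so far and returns True once the problem is observed.
    Returns None if every module loads without the problem appearing.
    """
    loaded = []
    for name in modules:
        loaded.append(name)
        if problem_occurs(loaded):
            return name
    return None
```

Because the modules are added cumulatively, the module returned is the one whose loading first exposed the problem; the actual fault may still lie in its interaction with a module loaded earlier.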

If you experience an "Abend" (ABnormal END) error message while you are manually loading the NLMs, take careful note of the information that is displayed on the server console screen. This information can give you many good ideas about what the problem might be. Interpreting Abend error messages is documented in the AppNote entitled "IntranetWare Server Automated Abend Recovery" in the March 1997 Novell AppNotes. Novell recommends that you read this AppNote and follow the recommended procedures.

If you are experiencing another type of issue, such as memory not being released or high utilization on one processor, follow the troubleshooting procedures appropriate to that type of issue. Further information can be found in the February 1995 AppNote referenced in Step 1.

If you are no closer to solving the problem after going through these steps, your final recourse is to call Novell (see "Calling Novell Technical Support" below).

Step 4: Isolating a Hardware Issue

Loading SMP.NLM, the PSM, and MPDRIVER with only processor 0 active helps you determine whether the problem is caused simply by the SMP software being loaded on the server. If the problem shows up only when SMP is loaded and multiple processors are activated via MPDRIVER, there is a strong possibility that your server is experiencing a hardware problem. In other words, if the problem does not occur with only processor 0 active, one of the other processors in the server is the likely cause.

The next step is to determine which processor is having problems loading and running the SMP software. The easiest way to understand this is through an example. Suppose you have a four-processor SMP server and you are using MPS14.PSM for the Multiple Processor Support module. The server is experiencing an Abend error that indicates a Page Fault Processor Exception. When you comment out the lines in the STARTUP.NCF file that load the SMP software (LOAD SMP.NLM, LOAD MPS14.PSM, LOAD MPDRIVER ALL) and reboot the server, the problem goes away. You then reload the SMP software, this time using the "0" parameter after MPDRIVER so that only processor 0 loads; again the problem does not reoccur. You then change the STARTUP.NCF file to include the following lines:


LOAD MPDRIVER 0
LOAD MPDRIVER 1

These commands activate two of the four processors (remember that processor 0 is always needed in order for NetWare SMP to run). After a reboot, the problem does not reoccur. You next change the STARTUP.NCF file to activate processor 0 and one other processor in the system, such as 2 or 3 in this example. The STARTUP.NCF file now reads:


LOAD MPDRIVER 0
LOAD MPDRIVER 3

This time, after you reboot the server, the Abend error occurs again. This indicates a hardware fault in processor 3; that processor should be replaced.
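The procedure in this example, pairing processor 0 with each other processor in turn, can be sketched as follows. The function names mpdriver_lines and find_faulty_processor and the problem_occurs callback are hypothetical; the callback stands for the manual work of editing STARTUP.NCF, rebooting, and observing whether the Abend returns.

```python
def mpdriver_lines(processors):
    """Return one "LOAD MPDRIVER n" line per processor, always including 0."""
    # Processor 0 is always needed in order for NetWare SMP to run.
    wanted = sorted(set(processors) | {0})
    return ["LOAD MPDRIVER %d" % n for n in wanted]


def find_faulty_processor(processor_count, problem_occurs):
    """Test processor 0 together with each other processor, one at a time.

    problem_occurs: hypothetical callback that receives the MPDRIVER lines
    used for a reboot and returns True if the problem reappeared.
    Returns the first processor number that reproduces the problem, or None.
    """
    for n in range(1, processor_count):
        if problem_occurs(mpdriver_lines([n])):
            return n
    return None
```

For the four-processor server in this example, mpdriver_lines([3]) yields the two lines shown in the last listing. The search stops at the first processor that reproduces the problem, so if more than one processor could be at fault, repeat the test for the remaining ones.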

Calling Novell Technical Support

If your server is still experiencing problems after following all of these steps, a call to Novell Technical Support is warranted. Be prepared to give the support representative a precise summary of the troubleshooting steps you have taken, and please mention TID xxxxxxx (which is this NetNote in Technical Information Document format).

* Originally published in Novell AppNotes


