Hints for Handling Server Abends
Worldwide Support Engineer
01 Mar 1998
For many network administrators, the first inclination when a NetWare server displays an Abend error message is to panic. But before you pick up the phone to call Novell Technical Support, you should gather as much information as you can about the Abend. A lot of helpful information was presented in the AppNote entitled "Troubleshooting Server Problems Using the ABEND.LOG File and Memory Images (Core Dumps)" in the October 1997 issue. This NetNote is an addendum to that AppNote.
The Abend Recovery Process
The term "Abend" is a shortened version of "Abnormal End," which indicates that the NetWare server has stopped its processing functions unexpectedly. An Abend can be caused by a bad code instruction or by a process that attempts to write to a part of memory that doesn't belong to its thread of execution. With NetWare 4.11, servers have some Abend recovery ability. You can choose to have the server suspend the malfunctioning threads and attempt to continue execution after recording information about the problem to the SYS:\SYSTEM\ABEND.LOG file. If you have the SET Auto Restart After Abend parameter set to 0 so the server doesn't automatically restart, you will see a conventional Abend message. As described in the October 1997 article, you can use a variety of means to gather information about what has happened to the server.
Below are some additional tips for handling server Abends.
Tip #1: Determine if the Abend is CPU- or Code-Detected
When a server Abends, the first step is to determine whether the error was detected by the CPU or by the operating system code. CPU-detected Abends always contain the words "Processor Exception" in the Abend message. After a Processor Exception error, the EIP (Extended Instruction Pointer) register points to the piece of code that was attempting to execute.
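The check described above can be sketched in a few lines of Python for anyone who triages Abend messages copied from the console or from ABEND.LOG. This is a minimal illustration, not a Novell tool: the function name classify_abend is hypothetical, and matching the phrase case-insensitively is an assumption made here for convenience.

```python
def classify_abend(message: str) -> str:
    """Classify an Abend message as CPU-detected or code-detected.

    Per the tip above, CPU-detected Abends always contain the words
    "Processor Exception". Anything else is treated as code-detected.
    Case-insensitive matching is an assumption of this sketch.
    """
    if "processor exception" in message.lower():
        return "cpu-detected"
    return "code-detected"


if __name__ == "__main__":
    # A message of the sentence-like kind that code-detected Abends produce.
    text = "Free Called With A Memory Block That Has A Null Resource Tag"
    print(classify_abend(text))
```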
Tip #2: Information to Gather from Processor Exception Abends
The first piece of information you can gather from the Processor Exception error is the text of the Abend message itself. After you have written that down, go into NetWare's internal debugger by pressing and holding the <Left-Shift> + <Right-Shift> + <Alt> + <Esc> keys at the server console. (Do not enter the debugger except at Abend time; the debugger effectively halts all operations on a live server.)
Once in the debugger, proceed as follows:
1. Enter the ".a" command to see what the Abend message is.
2. Use the ".r" command to get the name of the running process that died.
3. Use the "?" command to find out what function the server was running when it stopped.
The above information will prove invaluable in debugging the problem. With this information, some Abends can be resolved right away.
Tip #3: Information to Gather from Code-Detected Abends
In the case of a code-detected Abend, the message reads more like a sentence (for example, "Free Called With A Memory Block That Has A Null Resource Tag"). In this case, the EIP register is always 00000000 and cannot provide any information. You can still go into the debugger as described above and use the ".a" and ".r" commands to gather some of the information technical support will need. (The "?" command does not work after a code-detected Abend.)
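Tips #2 and #3 together amount to a small checklist: which debugger commands are worth entering depends on the kind of Abend. The Python sketch below restates that checklist. The names DEBUGGER_COMMANDS and commands_for are hypothetical, and the script only prints a reminder list; it does not talk to a server or to the debugger.

```python
from typing import List

# What each debugger command reports, as described in the tips above.
DEBUGGER_COMMANDS = {
    ".a": "show the Abend message",
    ".r": "show the name of the running process that died",
    "?": "show the function the server was running when it stopped",
}


def commands_for(message: str) -> List[str]:
    """Return the debugger commands worth entering for this Abend message."""
    # CPU-detected Abends contain the words "Processor Exception".
    cpu_detected = "processor exception" in message.lower()
    commands = [".a", ".r"]
    if cpu_detected:
        # "?" does not work after a code-detected Abend, so it is only
        # listed for Processor Exception Abends.
        commands.append("?")
    return commands


if __name__ == "__main__":
    text = "Free Called With A Memory Block That Has A Null Resource Tag"
    for command in commands_for(text):
        print(command, "-", DEBUGGER_COMMANDS[command])
```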
As you work through server problems, it's important to keep the NLMs on the server up to date. For example, if the ".r" command shows "HTTP Worker Process" and the "?" command shows "CLIB.NLM at code start +F43D," be sure that both modules involved (the one that owns the running process and the one named by the "?" command) are the latest versions so that Novell Technical Support can properly troubleshoot the problem. The first support group to call is the owner of the running process that died; in the case described above, you would call the Web Server Support group, because the HTTP Worker Process points to that module.
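If you record the "?" output in your notes, a short script can pull out the module name and offset so you know which NLM version to check. The Python sketch below is modeled only on the single example quoted above ("CLIB.NLM at code start +F43D"); real debugger output may be formatted differently, and the names LOCATION_PATTERN and parse_location are hypothetical.

```python
import re
from typing import Optional, Tuple

# Pattern modeled on the one example in the text:
#   "CLIB.NLM at code start +F43D"
# Treat this format as an assumption, not a specification.
LOCATION_PATTERN = re.compile(
    r"(?P<module>[\w$]+\.NLM) at code start \+(?P<offset>[0-9A-F]+)",
    re.IGNORECASE,
)


def parse_location(text: str) -> Optional[Tuple[str, int]]:
    """Return (module name, offset from code start) or None if no match."""
    match = LOCATION_PATTERN.search(text)
    if match is None:
        return None
    # The offset is printed in hexadecimal, so convert with base 16.
    return match.group("module").upper(), int(match.group("offset"), 16)


if __name__ == "__main__":
    result = parse_location("CLIB.NLM at code start +F43D")
    if result is not None:
        module, offset = result
        print(f"module={module} offset=0x{offset:X}")
```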
Tip #4: Call Before You Core Dump
If your server is running the latest modules from Novell and it Abends, you may want to take a "core dump" of your Abended server's memory to speed up the troubleshooting process. Call Novell Technical Support before you perform a core dump. It is possible that Novell is aware of the issue and is already testing a patch for the problem.
If you cannot get in touch with the Technical Support group in a timely manner, it's best to unload the offending module from your server in order to stabilize the machine environment until you can place a call.
* Originally published in Novell AppNotes
The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.