
Troubleshooting Server Problems Using the ABEND.LOG File and Memory Images (Core Dumps)


BRAD DAYLEY
Worldwide Support Engineer
Novell Customer Services

01 Oct 1997


In addition to automatically recovering from Abends and lockups, IntranetWare writes a helpful summary of the state of the server to an ABEND.LOG file. If that summary does not reveal the problem, you can send a memory image to Novell for further analysis.

Introduction

With the release of IntranetWare, Novell added some advanced features to the NetWare 4.11 operating system to handle critical server issues such as "Abends" (ABnormal ENDs of execution) and lockups due to errant NLMs (NetWare Loadable Modules). In addition to the automated Abend recovery capability, IntranetWare servers create a log file whenever the server encounters a critical situation. This log file is named ABEND.LOG.

The ABEND.LOG file records useful troubleshooting information such as the file server name, the date and time of the Abend, the exact Abend message, the contents of the registers, and so on. This summary of the state of the server at the time of the Abend helps you more readily identify and isolate the cause of the problem.

This AppNote describes how to use the ABEND.LOG file in troubleshooting problems that occur at the server. For those who have NetWare 4.10 and NetWare 3.12 servers, the second part of the AppNote explains how to manually create an ABEND.LOG file using the NetWare internal debugger. The last section describes how to obtain a server memory image (core dump) that can be sent to Novell for analysis.

For additional information, refer to the following sources:

  • "IntranetWare Server Automated Abend Recovery", Novell AppNotes, March 1997, p. 32

  • "Resolving Critical Server Issues", Novell AppNotes, February 1995, p. 35

  • "Abend Recovery Techniques for NetWare 3 and 4 Servers", Novell AppNotes, June 1995, p. 75

IntranetWare Auto Recovery Process

To understand the auto recovery process, it is important to understand how an IntranetWare (NetWare 4.11) server behaves as opposed to a NetWare 4.10 or 3.12 server. In NetWare 4.10 or 3.12, if the server experiences an Abend, the server halts all operations and displays an error message on the console screen. At the end of the message is a prompt to either copy a diagnostic image (or "core dump") of the server's memory to disk, or exit to DOS.

IntranetWare has an improved recovery mechanism that will, in most circumstances, suspend only the process responsible for the Abend, while continuing the execution of other programs. This is known as the auto recovery process.

The ABEND.LOG File

As part of the auto recovery process, the IntranetWare operating system also creates a summary of the state of the server at Abend time. This summary is written to a file named ABEND.LOG on the server's DOS partition (drive C:) and is later appended to a file with the same name in the SYS:SYSTEM directory. (The ABEND.LOG file can be deleted or reset if necessary to conserve disk space.)

Figure 1 shows the kinds of data recorded in the ABEND.LOG file, which is created when the auto recovery process is activated.

Figure 1: Example of an ABEND.LOG file.

Troubleshooting with the ABEND.LOG File

The ABEND.LOG file contains the following information:

  • File server name

  • Date and time of the Abend

  • Text of the Abend message

  • Contents of the registers on the server

  • Name of the module or process that was suspended

  • Name of the process that was running at the time of the Abend

  • Stack pointer and limit

  • Stack trace

  • Modules list

This information can be useful in identifying and isolating the cause of server Abends, as described in the following sections.

Server Name

The server name is the first piece of data saved in the ABEND.LOG file. It is useful when multiple servers are experiencing Abends, because you need to keep track of which servers experience the problem and which do not. Often a simple comparison between a server that Abends and one that does not will provide valuable clues to the cause. For example, suppose you have two identical servers, one of which Abends while the other does not. The only difference between them is that the one that Abends is used for backups. In this case, the logical place to focus your troubleshooting is the backup software and hardware.

Date and Time of Abend

Because of the auto recovery process in IntranetWare, administrators may not even notice that a server has experienced an Abend, even when the error has occurred several times. Looking at the ABEND.LOG file may be the only way to tell what has happened to the server. Often a pattern in the dates or times of the Abends will provide an important clue to the cause. For example, suppose you notice that a particular server has experienced the same Abend five times, each time at midnight. The logical starting place in this case would be processes that run at midnight, such as compression or backups.
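To spot a time-of-day pattern across many entries, it can help to tally the Abend times by hour. The following Python sketch assumes you have already copied the date and time of each Abend from ABEND.LOG into a list; the timestamp layout and the sample values are hypothetical and are not the format ABEND.LOG itself uses.

```python
from collections import Counter
from datetime import datetime

# Hypothetical Abend timestamps transcribed from several ABEND.LOG entries.
# The "%Y-%m-%d %H:%M" layout is an assumption made for this sketch only.
ABEND_TIMES = [
    "1997-09-02 00:01",
    "1997-09-09 00:03",
    "1997-09-16 00:00",
    "1997-09-23 00:02",
    "1997-09-24 14:30",
]


def abends_by_hour(timestamps, fmt="%Y-%m-%d %H:%M"):
    """Count how many Abends occurred in each hour of the day (0-23)."""
    return Counter(datetime.strptime(t, fmt).hour for t in timestamps)


if __name__ == "__main__":
    counts = abends_by_hour(ABEND_TIMES)
    # Print a crude histogram; a tall bar at 00:00 points at midnight jobs.
    for hour, n in sorted(counts.items()):
        print(f"{hour:02d}:00  {'*' * n}")
```

Bucketing by hour is deliberately coarse: an Abend at 23:58 and one at 00:02 land in different buckets, so widen the window if your pattern straddles midnight.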

Abend Message

The Abend message itself can provide one of the biggest hints as to the cause of an Abend. Earlier versions of NetWare produced generic messages such as "Page Fault Processor Exception", which describes an entire class of errors that can have many different causes. The ABEND.LOG file contains more specific Abend messages such as "Free detected modified memory beyond the end of the cell being returned". This particular message indicates that the running process tried to free (return) a memory cell after the memory beyond the end of that cell had been overwritten.

Often a combination of this type of Abend message, the running process, and the stack trace will prove sufficient to enable you to find a solution from the Novell Support Connection web site or CD-ROM.


Note: Refer to the "Resolving Critical Server Issues" AppNote (Feb. 1995, p. 35) for a more detailed description of the different types of Abends.

Registers

The registers become important when the information above is not sufficient to identify the possible cause of an Abend, or at least a starting point for troubleshooting. It is most helpful if you can establish a pattern in the contents of the registers across several Abends. If the contents of the registers (especially the EIP register) are the same each time, it usually means the operating system followed the same code path to arrive at the Abend. This indicates a software bug or possibly a corrupt file. Many resolutions to such problems are available from the Novell Support Connection web site or CD-ROM; search for the running process and the NLM that triggered the Abend.
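With more than a handful of ABEND.LOG entries, counting how often each EIP value recurs makes a repeated code path easy to see. This Python sketch assumes the EIP values have been copied out as hexadecimal strings; the values themselves are invented for illustration.

```python
from collections import Counter

# Hypothetical EIP values (hex) copied from the Registers section of several
# ABEND.LOG entries. These values are invented for illustration.
EIP_VALUES = ["F8A1C2E4", "f8a1c2e4", "F8A1C2E4", "0012ABCD"]


def recurring_eips(values, min_count=2):
    """Return (eip, count) pairs seen at least min_count times, most common first."""
    # Normalize case and whitespace so "f8a1c2e4" and "F8A1C2E4" match.
    counts = Counter(v.strip().upper() for v in values)
    return [(eip, n) for eip, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    for eip, n in recurring_eips(EIP_VALUES):
        print(f"EIP {eip} seen in {n} Abends")
```

One caveat: an absolute EIP can change between restarts if modules load at different addresses, so the module name plus offset (as shown in the stack trace or by the debugger's "?" command) may be a steadier key to compare.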

ABENDed NLM

The "ABENDed NLM" is the module that owns the code that actually triggered the Abend on the server. However, this does not necessarily mean that this module caused the Abend. A function in the named NLM may have been passed a bad value from another module. Knowing what module was active when the server experienced the Abend is useful to identify which NLM or third-party product to troubleshoot. It is also very helpful in looking up known resolutions from the Novell Support Connection. The descriptions of these resolutions usually include the name of the module and often the running process, the Abend message, register values, and functions from the stack trace.

Running Process

If the running process is the SERVER process, the Abend could have been caused by almost anything: any module can make a request to the NetWare operating system, and the result of that request will show up under the SERVER process. If the running process is something other than SERVER, knowing its name is usually quite helpful. Often the process belongs to the module that is causing the Abend, and almost always it is related to the Abend in some fashion. For example, suppose your server Abends and the running process is identified as TCPIP. A logical troubleshooting step would be to obtain a protocol analyzer trace during such an Abend and examine the TCP/IP packets being sent to the server.

Stack Limit and Stack Pointer

The stack limit and stack pointer are used to determine whether severe memory corruption has taken place. The stack limit is simply the lowest value the stack pointer of the running process can reach. If the current stack pointer (ESP) is lower than the stack limit, or greater than the stack limit plus the stack size (3000 for SERVER processes in IntranetWare 4.11; other processes can have varying stack sizes), then ESP is invalid and the server has experienced memory corruption. In this case, make certain the following SET parameters are configured as shown:

SET ALLOW INVALID POINTERS = OFF

SET READ FAULT EMULATION = OFF

SET WRITE FAULT EMULATION = OFF

If these SET parameters are already configured this way, there is a good chance that the memory itself is corrupt (RAM, internal cache, external cache, and so on) or that another hardware problem exists.
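The stack pointer rule above comes down to a single range comparison. This Python sketch assumes the stack limit, stack size, and ESP have been read as hexadecimal numbers. The article does not state the base of the "3000" stack size, so treating it as 0x3000 is an assumption, and the addresses in the example are made up.

```python
def stack_pointer_in_range(esp: int, stack_limit: int, stack_size: int) -> bool:
    """Apply the rule from the text: ESP must lie between the stack limit and
    the stack limit plus the stack size. Anything outside that range
    suggests memory corruption."""
    return stack_limit <= esp <= stack_limit + stack_size


# Hypothetical values, as they might be transcribed from ABEND.LOG.
STACK_LIMIT = int("00A4C000", 16)
STACK_SIZE = 0x3000  # assumption: the "3000" for SERVER processes is hex

if __name__ == "__main__":
    for esp_text in ("00A4E5F0", "00A4BFF0", "00A51000"):
        esp = int(esp_text, 16)
        ok = stack_pointer_in_range(esp, STACK_LIMIT, STACK_SIZE)
        print(esp_text, "OK" if ok else "out of range: suspect corruption")
```

With these values the first ESP falls inside the window and the other two fall below and above it. For a process other than SERVER, substitute that process's own stack size.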

Stack Trace

The stack trace is a printout of the contents of the stack, one value at a time. If a value is an address located inside a module, the module name, function (if available), and offset are printed to the right of the address. The stack trace values of concern are those that fall inside a module. Once again, a pattern in the stack traces of several Abends will provide clues to the possible cause. If the stack traces of several Abends are exactly the same, the server is likely following the same code path to the Abend, and there is a good chance that a fix is already available. Search the Novell Support Connection for the ABENDed NLM name, the running process, and the function (if available) to find possible resolutions.
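When you have traces from several Abends, a quick comparison shows whether they are identical and which module frames they share; the shared frames make good search terms. This Python sketch assumes each trace has already been reduced to the (module, function+offset) entries that fell inside a module. All module and function names in it are hypothetical.

```python
# Each trace holds only the frames that fell inside a module, written as
# (module, function+offset). All names and offsets below are hypothetical.
TRACES = [
    [("MODA.NLM", "FuncA+3C"), ("MODB.NLM", "FuncB+12"), ("MODA.NLM", "FuncC+88")],
    [("MODA.NLM", "FuncA+3C"), ("MODB.NLM", "FuncB+12"), ("MODA.NLM", "FuncC+88")],
    [("MODC.NLM", "FuncD+04"), ("MODB.NLM", "FuncB+12"), ("MODA.NLM", "FuncC+88")],
]


def all_identical(traces):
    """True if every trace matches the first one frame for frame."""
    return all(t == traces[0] for t in traces[1:])


def common_frames(traces):
    """Frames present in every trace, kept in the order of the first trace."""
    shared = set(traces[0]).intersection(*map(set, traces[1:]))
    return [frame for frame in traces[0] if frame in shared]


if __name__ == "__main__":
    print("Identical traces:", all_identical(TRACES))
    for module, location in common_frames(TRACES):
        print("Search for:", module, location)
```

Here the three traces are not identical, but two frames appear in all of them; those are the first candidates to look up.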

Modules List

Many problems that initially existed in the IntranetWare 4.11 operating system and in the Novell products that run on that platform have already been fixed by Novell. These fixes are readily available from Novell. The first thing to do if your server is experiencing Abends is to look at the modules list and compare it to a list of the latest released versions from Novell. If you see modules that are outdated, update them, especially if one of them is the ABENDed NLM. The modules list includes version numbers and dates, which makes it easy to tell which revisions were running on the server when it experienced the Abend.
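The comparison described above can be sketched in a few lines of Python. This sketch assumes you have typed the loaded modules and a reference list of current releases into two dictionaries. Because version strings are compared only for equality, any mismatch is flagged; the sketch does not try to decide which version is newer. Module names and versions are hypothetical.

```python
def modules_to_review(loaded, latest, abended=None):
    """Compare loaded module versions with a reference list of current releases.

    loaded / latest: dicts mapping module name -> version string.
    Returns (outdated, unknown). If the ABENDed NLM is outdated, it is
    moved to the front of the outdated list.
    """
    outdated = sorted(
        name for name, ver in loaded.items() if name in latest and ver != latest[name]
    )
    unknown = sorted(name for name in loaded if name not in latest)
    if abended in outdated:
        outdated.remove(abended)
        outdated.insert(0, abended)
    return outdated, unknown


# Hypothetical module names and versions, for illustration only.
LOADED = {"MODA.NLM": "1.00", "MODB.NLM": "2.10", "MODC.NLM": "3.02", "MODD.NLM": "1.00"}
LATEST = {"MODA.NLM": "1.00", "MODB.NLM": "2.12", "MODC.NLM": "3.05"}

if __name__ == "__main__":
    outdated, unknown = modules_to_review(LOADED, LATEST, abended="MODC.NLM")
    print("Update first:", outdated)
    print("Not in reference list:", unknown)
```

Modules missing from the reference list (often third-party NLMs) are reported separately, since their vendors, not Novell, would supply any update.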

Creating an Abend Log File in NetWare 4.10 and 3.12

We have described the usefulness of the ABEND.LOG provided by the IntranetWare operating system in tracking down and troubleshooting server problems. Due to the architectures of NetWare 4.10 and 3.12, these earlier operating systems do not have this same capability. However, you can create a similar log file in these environments by using the internal debugger provided with NetWare.

When a NetWare 4.10 or 3.12 server Abends, you can choose to copy a diagnostic image of the server's memory to disk, or exit to DOS. Copying the memory image is referred to as obtaining a "core dump". Novell can analyze a core dump, which often provides enough information to identify the cause of the Abend. However, there are two problems with the core dump method of troubleshooting:

  • First, a core dump image is the same size as the amount of memory in the server. On a server with 512MB of RAM, for example, the core dump will result in a 512MB image file, which can be difficult to handle.

  • Second, a great deal of troubleshooting can be done prior to sending a core dump to Novell. Most Abend problems will have already been fixed and patches made available. By reviewing an Abend log file and then visiting Novell's technical support web site, you can often find a resolution without ever having to contact Novell.

For these two reasons, this section describes the simple steps to manually create a log file in NetWare 4.10 and 3.12 that is similar to the ABEND.LOG file in IntranetWare. The basic steps are as follows:

  1. Enter the NetWare internal debugger, and record the information displayed when you issue the commands given in Steps 2 through 7.

  2. Issue a "v" command.

  3. Issue an "r" command.

  4. Issue a "?" command.

  5. Issue a ".r" command.

  6. Issue a "dds" command.

  7. Issue a ".m" command.

The results can be recorded on an Abend Log File worksheet.
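If you prefer to keep the worksheet electronically, a simple record type can mirror it. This Python sketch is one possible layout. The field names are our own, and each comment names the debugger command from Steps 2 through 7 that supplies the field. The missing_fields helper simply lists whatever is still blank so that nothing is skipped before you leave the debugger.

```python
from dataclasses import dataclass, field


@dataclass
class AbendWorksheet:
    """One Abend Log File worksheet. Field names are hypothetical; the comment
    on each field names the debugger command that supplies it."""
    server_name: str = ""       # "v"
    date_time: str = ""         # "v"
    abend_message: str = ""     # "v"
    registers: dict = field(default_factory=dict)   # "r"
    eip_location: str = ""      # "?" (module, function, offset)
    running_process: str = ""   # ".r"
    stack_pointer: str = ""     # ".r"
    stack_limit: str = ""       # ".r"
    stack_trace: list = field(default_factory=list)  # "dds"
    modules: list = field(default_factory=list)      # ".m"

    def missing_fields(self):
        """Names of fields that are still empty."""
        return [name for name, value in vars(self).items() if not value]


if __name__ == "__main__":
    ws = AbendWorksheet(server_name="FS1")  # hypothetical server name
    print("Still to record:", ", ".join(ws.missing_fields()))
```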

Step 1: Enter the Debugger

Entering the internal NetWare debugger is simple. While at the Abend screen, and prior to obtaining a core dump or exiting to DOS, press and hold down the following keys:

<Left-Shift> + <Right-Shift> + <Alt> + <Esc>

This will take you to the debugger screen, which contains the # prompt at which you can enter commands into the debugger. The NetWare operating system is halted at this point.

It is not recommended that you enter the debugger except at Abend time. Entering the debugger on a live server effectively halts all operations on the server.

Once you are in the debugger, you can begin issuing the commands listed above to obtain the information to fill out the ABEND Log File worksheet.

Step 2: Issue a "v" Command

Typing "v <Enter>" at the debugger prompt lets you scroll through the server's screens one at a time. Press <Enter> to move from screen to screen. One of the screens will be the one displaying the Abend. As shown in Figure 2, you can fill out the server name, date, time, and Abend message fields from this screen.

Figure 2: The results of the "v" debugger command.

Step 3: Issue an "r" Command

Issuing an "r" command at the debugger prompt displays the current register values. As shown in Figure 3, the registers portion of the Abend Log File worksheet can be filled out from this information.

Figure 3: The results of the "r" debugger command.

Step 4: Issue a "?" Command

Issuing a "?" command at the debugger prompt displays information about the current instruction's address, including the module name, function (if available), and offset. From this display, you can fill out the EIP portion of the Abend Log File worksheet.

Figure 4: The results of the "?" debugger command.

Step 5: Issue a ".r" Command

Issuing a ".r" command at the debugger prompt displays information about the current running process, including the process name, the stack pointer, and the stack limit. The running process, stack pointer, and stack limit portions of the Abend Log File worksheet can now be filled in (see Figure 5).

Figure 5: The results of the ".r" debugger command.

Step 6: Issue a "dds" Command

Issuing a "dds" command at the debugger prompt displays a dump of the current stack, including module names, function names (if available), and offsets. As shown in Figure 6, you can now fill in the stack trace portion of the Abend Log File worksheet.

Figure 6: The results of the "dds" debugger command.

Step 7: Issue a ".m" Command

Issuing a ".m" command at the debugger prompt displays information about the currently loaded modules, including module name, version, and date. The loaded modules portion of the Abend Log File worksheet can now be filled in.

Figure 7: The results of the ".m" debugger command.

Exiting the Debugger

To exit the debugger and return to the main server console screen, type "g <Enter>" at the # debugger prompt.

When you are finished, you should have a completed Abend Log File worksheet. You can use this information in the same way as the ABEND.LOG file described earlier in this AppNote to identify the possible causes of server Abends.

Obtaining a NetWare Server Memory Image (Core Dump)

The term "core dump" comes from the mainframe world, where main memory was (and still is) referred to as core memory because data was once stored in magnetic cores: small doughnut-shaped rings made of ferrous (iron-based) material. Today's microcomputers don't store data in this manner, but a PC's system RAM is still occasionally referred to as the core.

A memory image or core dump is a byte-for-byte image of a NetWare server's memory--a "snapshot" of a server's RAM at the time it abended. The terms "core dump" and "memory image" are interchangeable; for consistency's sake, we use "memory image" in this AppNote.

This section discusses the conditions under which Novell may request a core dump, and the path and method you should use to copy the memory image. It also describes how to prepare the memory image to send to Novell, and what you can expect in terms of Novell technical support.

When a Core Dump Is Requested

Novell may request a core dump if a customer is experiencing a lockup or Abend condition and other troubleshooting has failed to come up with a resolution. The core dumps can be analyzed by Novell engineers and are sometimes key to finding bugs.

Three Ways to Initiate a Memory Image Copy

On a NetWare server, a memory image copy can be initiated in one of three ways:

  1. By answering the prompts generated by NetWare after a server Abend has occurred (see the sample Abend screen depicted earlier in this AppNote).

  2. By forcing a memory image copy from the debugger. You might need to do this under the following circumstances:

    • The server encounters an error (such as a server hang or lockup) and you are not given the option to perform the memory image copy.

    • A server may exhibit strange behavior but not display any errors, and a Novell Service Representative may request the memory image file to see some internal details.

    Here are the steps to follow to force a memory image copy:

    1. If the server is running, press the following keys simultaneously to enter the debugger: <Left-Shift> + <Right-Shift> + <Alt> + <Esc>.

    2. At the debugger prompt (#), type ".c <Enter>" to start the diagnostic image copy.

    3. When the copy is finished, type "g <Enter>" to exit the debugger and let server operation continue (provided the server is not locked up). Otherwise, type "q <Enter>" to quit to DOS.

  3. By generating an Abend with an NMI (nonmaskable interrupt) when the server's keyboard does not respond. Use a method approved by the PC hardware vendor to cause the CPU to issue the NMI exception.

Next, you need to choose the path and the method you'll use to copy the memory image file to a storage medium.

Choosing the Path to Copy To

With NetWare 3.12 and later (including all versions of NetWare 4), the user can specify the drive letter to which the memory image file will be copied. This drive can be any writeable DOS device, even a network drive on another file server that was mapped under DOS prior to booting the server. The size of the image file will be approximately equal to the total RAM installed in the server.
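Because the image file is about as large as the server's RAM, it is worth confirming ahead of time that the target has enough free space. This Python sketch uses the standard shutil.disk_usage call. It is meant to run on a modern workstation that can see the target directory, not on the server's DOS partition. The 5 percent safety margin is an arbitrary assumption, not a NetWare requirement.

```python
import shutil


def has_room_for_image(target_dir, ram_bytes, margin=0.05):
    """Return True if target_dir has at least ram_bytes * (1 + margin) bytes free.

    The memory image is roughly the size of installed RAM; the margin is an
    arbitrary cushion chosen for this sketch.
    """
    free = shutil.disk_usage(target_dir).free
    return free >= ram_bytes * (1 + margin)


if __name__ == "__main__":
    RAM_MB = 128  # hypothetical server with 128MB of RAM
    print(has_room_for_image(".", RAM_MB * 1024 * 1024))
```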


Note: For NetWare 3.11, a NetWare Loadable Module called HDUMP.NLM is available which allows you to write the image file to a local DOS partition or network drive instead of to a floppy drive. This file is available in the TABND2a.EXE file available from http://support.novell.com.

Choosing the Copy Method

Four methods are available for copying the memory image file to a storage medium: floppy drive, hard drive, network drive, and other drive.

Floppy Drive Method. If the image is copied to a floppy drive, the user will be prompted to insert formatted diskettes. Be sure you have sufficient diskettes on hand to copy all of your machine's RAM. For example, to copy 12MB of RAM, you'd need nine 3.5-inch high-density (1.44MB) diskettes.
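The diskette count is a ceiling division of RAM size by diskette capacity. This Python sketch uses the nominal 1.44MB per diskette from the text and works in hundredths of a megabyte so that floating-point rounding cannot push an exact multiple up by one. It ignores formatting overhead and bad sectors, so keep a few spares on hand.

```python
def diskettes_needed(ram_mb: int, capacity_centi_mb: int = 144) -> int:
    """Number of diskettes needed to hold a ram_mb megabyte memory image.

    capacity_centi_mb is the diskette size in hundredths of a MB
    (144 = 1.44MB), so the ceiling division stays in integers.
    """
    return -((-ram_mb * 100) // capacity_centi_mb)


if __name__ == "__main__":
    for ram in (12, 128, 512):
        print(f"{ram}MB of RAM -> {diskettes_needed(ram)} diskettes")
```

For 12MB this gives the nine diskettes mentioned above; for a 128MB server it gives 89, which shows why the other methods are preferred on larger machines.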

This method is usually a poor choice, for two reasons. First, bad sectors on a floppy diskette could cause the image to be invalid or unusable when being analyzed. Second, copying to diskettes on systems with large memory will take several hours, during which time the server is unavailable to users. This was the only method available when NetWare 3.11 shipped, but it has become obsolete as the other methods have become available.


Note: This method is no longer available in IntranetWare 4.11 because architectural changes to server memory have made a core dump from floppies unreadable. The currently shipping version of DIAG411.NLM inadvertently prompts the user to dump memory to floppy. This option will be removed in later versions.

Hard Drive Method. When the image is copied to a local hard drive on the server, the name of the image file is COREDUMP.IMG by default. Once the image file is on the hard disk, it can be compressed, copied to diskettes, backed up to tape, or sent by FTP to ftp.novell.com (see "Sending the Image File to Novell" for details).

The image file can also be copied to a NetWare drive later, after the server is up and running. This can be done by using a NetWare Loadable Module called IMGCOPY.NLM or any other third-party NLM that provides this functionality.


Note: The file IMGCOPY.NLM is included in the self-extracting file TABND2a.EXE. For more information about using this module, refer to the readme file included with the download.

Network Drive Method. If this method is used, some advance setup is required prior to the Abend or hang. Specifically, the problem server must have an extra network adapter installed, and a client ODI driver must be loaded for this adapter. (This is possible as long as you load the client driver in DOS conventional memory. The server drivers load in extended memory, so both types of drivers can be loaded at the same time.)

You'll also need to obtain and load a NetWare Loadable Module called NETALIVE.NLM, which can be found in the self-extracting file TABND2a.EXE. This module keeps a client connection alive underneath an active server when two network adapters are used. (For more information on using this module, refer to the readme file that comes with TABND2a.EXE.)

When an Abend occurs, proceed as follows:

  1. Boot the problem server as a client. (When using VLMs, you'll need to make one small change to the CONFIG.SYS file: set LASTDRIVE=Z.)

  2. From this "client," log in to a healthy server elsewhere on the network.

  3. Map a drive to a volume and directory on the healthy server. The volume must have enough free disk space to copy the problem server's memory image file. Record the complete path (for example, f:\sys:\cdump).

  4. Now boot the client as a server by running SERVER.EXE from the DOS partition or boot diskette. The server comes up, but DOS is still loaded. Therefore, until NetWare's watchdog function terminates the connection to the extra network adapter, a connection to the healthy server can be maintained via NETALIVE.NLM.

  5. If you are running NetWare 3.11, load HDUMP.NLM with the recorded path on the command line. For example:

    load hdump f:\sys:\cdump <Enter>

    With versions of NetWare later than 3.11, the Abend screen gives you the opportunity to specify a different path. This is where you would specify the path you recorded in Step 3 (for example, f:\sys:\cdump).

  6. By default, the name of the image file is COREDUMP.IMG. Once this file is copied to the specified drive on the other server, it can be renamed, compressed, copied to diskettes, backed up to tape, or sent via FTP to Novell (see the instructions under "Sending the Image File to Novell").

This method can make copying the memory image as much as four to five times faster than the other methods outlined here. By way of comparison, for a server with 128MB of RAM, copying the memory image file to diskettes can take 5 to 6 hours.

The network drive method has been tested in Novell's server lab and has been used by several customers to obtain valid memory image files.

Other Methods. Any device that appears as a logical drive to DOS and has enough storage space to hold the entire image can be used to obtain a core dump. Examples are Iomega Zip and Jaz drives, CD-R (recordable compact disc) drives, optical disks, and so on. These devices require that a device driver be loaded in the CONFIG.SYS file. Attach the device, set up the drive letter, and then boot the server. As long as the REMOVE DOS command is not issued, DOS stays in memory and the drive remains available.

When a server experiences an Abend, copy the memory image to the drive letter of the device. The device can then be moved to a DOS PC, from which the image file can be compressed and sent to Novell.

Sending the Image File to Novell

To send a memory image file to Novell, an open support incident number is required. To obtain this number, work through your Novell Support Representative, or call 1-800-NETWARE and open a support incident.


Note: The incident is billable, and you will need to provide a credit card to open the incident. You will not be charged until the incident is resolved or closed. If it turns out that the problem is a NetWare bug and no patches were previously available, there will be no charge.

A Customer Support Representative will assign you a Technical Support Engineer who will help you analyze the memory image file. He or she will make arrangements to receive the image either in the mail or through the Internet.

Before sending the image file, rename it with the first eight digits of the incident number assigned to you. This will help Novell process the image file. Also, consult with your Technical Support Engineer to determine the best media format to use. Novell does not return floppy diskettes or backup tapes sent in with memory image files.

To Send an Image by Mail. To send the image file by mail, zip the file first and copy it to the agreed-upon media. Mail to:

Novell Support
Attention: Technical Support Engineer's Name
Mail Stop E-34-2
Novell Inc.
122 East 1700 South
Provo, Utah 84606
U.S.A.

To Send an Image via the Internet. Customers with access to the Internet can send the memory image via anonymous FTP to ftp.novell.com/incoming/. The file should be zipped first and then placed in the "incoming" directory. This method can save both parties time and money.

If you use this option, be sure to make arrangements with the Technical Support Engineer who will be receiving the file. Again, an open support incident is required. Files received on the Internet with no open support incident number will be deleted.
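The renaming and zipping steps can be combined. This Python sketch uses the standard zipfile module to store the image inside a ZIP archive under a name built from the first eight digits of the incident number, leaving the original COREDUMP.IMG untouched. The .IMG and .ZIP extensions are assumptions, and the incident number in the example is hypothetical. The sketch does not perform the upload; arrange that with your Technical Support Engineer as described above.

```python
import zipfile
from pathlib import Path


def package_image(image_path, incident_number, out_dir="."):
    """Zip a memory image under a name built from the incident number.

    Takes the first eight digits of incident_number, stores the image
    inside the archive as <digits>.IMG, and writes <digits>.ZIP to out_dir.
    The original image file is not modified.
    """
    digits = "".join(ch for ch in str(incident_number) if ch.isdigit())[:8]
    if len(digits) < 8:
        raise ValueError("incident number must contain at least eight digits")
    zip_path = Path(out_dir) / f"{digits}.ZIP"
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(image_path, arcname=f"{digits}.IMG")
    return zip_path


if __name__ == "__main__":
    # Hypothetical incident number; COREDUMP.IMG is assumed to be in the current directory.
    print(package_image("COREDUMP.IMG", "1234567890"))
```

Because memory images are mostly repetitive data, deflate compression usually shrinks them considerably, which shortens both mailing and transfer.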

Conclusion

The robust auto recovery mechanism in IntranetWare has greatly simplified the handling of most server Abends and lockups. The information written to the ABEND.LOG file can assist in identifying the cause of the Abend. You can manually create a similar log file for NetWare 4.10 and 3.12 servers using the procedure outlined in this AppNote.

Occasionally, you may need to create a core dump for assistance from Novell Support. Obtaining a memory image is a fairly simple procedure, and this AppNote has described several methods you can use to do it. Novell also provides several methods for you to send your memory image for analysis by Novell Worldwide Support.

* Originally published in Novell AppNotes


Disclaimer

The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.

© Copyright Micro Focus or one of its affiliates