NLMDebug Overview

CLIB Development Team
Novell, Inc.

01 Apr 1996

This article offers an overview of NLMDebug, which was originally created to debug CLIB.NLM. The NLMDebug tool is now in beta format, with interfaces still in flux. Once loaded, NLMDebug uses hooks tightly integrated with CLIB.NLM to watch and report on other NLMs running on the server. Among the topics discussed in this article are the following: debug settings, CLIB context, NCP debugger, and CLIB remote connections.

Introduction
Debug Settings
CLIB Context
NCP Debugger
CLIB Remote Connections
File Opens
Process Timer

Introduction

NLMDebug is a tool that was originally created by Novell's API development team to debug CLIB.NLM. As new features were added, it became more powerful and more popular with Novell's internal engineers. Eventually, the development team decided to make this tool available to NLM developers everywhere.

NLMDebug is currently in a beta format and the interfaces are still in flux, a fact that makes user documentation challenging to write. This article is designed to be more of an overview of the features available with NLMDebug and less of a user manual which might outline step-by-step instructions on performing specific tasks. Eventually, these step-by-step instructions will be incorporated in the on-line SDK documentation.

The name "NLMDebug" is somewhat misleading, because traditional debuggers have features that NLMDebug does not possess. For instance, most debuggers can disassemble code, set breakpoints, and step through code one instruction at a time; NLMDebug cannot. This tool might be described more accurately as an NLM "watcher" or "profiler." Once loaded, NLMDebug uses hooks that are tightly integrated in CLIB.NLM to watch and report on other NLMs running on that server. NLMDebug can only be used with CLIB.NLM version 4.11 (or later).

The NLMDebug main menu (shown in the figure) displays the functions provided by NLMDebug and also provides the structure for the remainder of this article.

Debug Settings

As the following figure illustrates, NLMDebug can be configured to scan for a wide variety of errors.

This tool scans all the CLIB-based NLMs running on the server and displays error information based on the configuration settings. These settings are shown in the following table.

Setting	Explanation
Resource Checking	(Yes/No) Display a warning message and a stack trace each time an NLM fails to free allocated memory before exiting.
Memory Checking	(Yes/No) Monitor memory for overwrites.
Memory Tagging	(Yes/No) Tag memory that has been allocated or freed. This is done by writing a signature, which is similar to the Process ID number, into the memory location.
Semaphore Checking	(Yes/No) Display a warning message and a stack trace each time an NLM fails to free a semaphore.
Semaphore Monitoring	(Yes/No) Check for semaphore mismanagement, such as deadlocks.
Report Thread Errors	(Yes/No) Watch for illegal thread management. See SetThreadGroupID().
Report No CLib Context	(Yes/No) Display a warning message every time a function call (that requires context) is made to CLIB without context.
Ring the Bell on Error	(Yes/No) Ring the bell when one of the errors described above is detected.
Auto-Save All Settings	(Yes/No) Save all NLMDebug settings. Note: These settings are saved to disk each time NLMDebug is unloaded. Keep in mind these settings will essentially establish default behavior in the way the server libraries behave whenever NLMDebug is loaded. For instance, if NLMDebug was configured to find memory leaks the last time it was loaded, it will continue reporting stack traces whenever an NLM is unloaded with outstanding memory. Original default settings can be reestablished temporarily by unloading all the server libraries (CLIB.NLM, NIT.NLM, NLMLIB.NLM, REQUESTR.NLM and THREADS.NLM) and reloading them without loading NLMDebug. To delete the settings completely, remove the file SYS:\SYSTEM\NLMDEBUG.CFG.
Log Output to Path	Write the error information to the file specified by this path statement. If blank, the information will only be displayed on the screen. This file can be cleared by either renaming the file, deleting the file, or unloading and reloading NLMDebug.
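
The Memory Tagging and Memory Checking settings describe techniques that are common to many debugging allocators. The following is a minimal sketch in portable C of how such checks can work in general, not a description of how NLMDebug or CLIB implements them. The names dbg_alloc, dbg_check and dbg_free, as well as the tag and guard byte values, are hypothetical; the article says only that the real signature is similar to the Process ID number.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define GUARD_LEN  4      /* guard bytes placed after each block */
#define GUARD_BYTE 0xA5   /* hypothetical guard value */
#define ALLOC_TAG  0xCD   /* hypothetical "allocated" signature */
#define FREE_TAG   0xDD   /* hypothetical "freed" signature */

/* Header stored in front of the caller's bytes. The union keeps the
   caller's pointer aligned for any object type. */
typedef union {
    size_t      size;
    max_align_t align;
} dbg_header;

/* Allocate size bytes, tag them, and append a guard region. */
void *dbg_alloc(size_t size)
{
    unsigned char *raw = malloc(sizeof(dbg_header) + size + GUARD_LEN);
    unsigned char *user;

    if (raw == NULL)
        return NULL;
    ((dbg_header *)raw)->size = size;
    user = raw + sizeof(dbg_header);
    memset(user, ALLOC_TAG, size);               /* memory tagging */
    memset(user + size, GUARD_BYTE, GUARD_LEN);  /* overwrite detector */
    return user;
}

/* Return 1 if the guard region is intact, 0 if it was overwritten. */
int dbg_check(const void *ptr)
{
    const unsigned char *user = ptr;
    const dbg_header *hdr;
    size_t i;

    assert(ptr != NULL);
    hdr = (const dbg_header *)(user - sizeof(dbg_header));
    for (i = 0; i < GUARD_LEN; i++)
        if (user[hdr->size + i] != GUARD_BYTE)
            return 0;
    return 1;
}

/* Tag the block as freed, then release it. */
void dbg_free(void *ptr)
{
    unsigned char *user = ptr;
    dbg_header *hdr;

    if (ptr == NULL)
        return;
    hdr = (dbg_header *)(user - sizeof(dbg_header));
    memset(user, FREE_TAG, hdr->size);
    free(hdr);
}
```

A caller that writes one byte past the end of a block changes a guard byte, so a later dbg_check() on that block returns 0. Tagging makes uninitialized or stale memory recognizable when it is inspected.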

CLIB Context

The CLIB Context feature of NLMDebug is a cross between an instructional tool and an NLM "profiler." This feature can profile an NLM, meaning that a developer can access a great deal of information, such as thread, context and connection information, without looking at a single line of code. This feature also does an excellent job of illustrating CLIB context because each piece of information is presented hierarchically with respect to its CLIB context level.

The following diagram shows all of the information available with CLIB Context. The information is separated into three screens, each corresponding to the context level named in its title (NLM-Level Context, Thread-Group-Level Context, Thread-Level Context).

Note: The figure above and some of the following descriptions use DEMO.NLM as an example.

The NLM-Level Context window displays information that is global to every thread in a particular NLM. This information is explained in the following table.

Option	Explanation
Default Thread Name	The default thread name for the NLM, in this case "demo nlm 1". The number at the end of the name is the value for the next thread to be created.
NLM ID	The unique ID for this NLM, in this case f91aa930. This is the value that would be returned by GetNLMID().
Thread Groups	A decimal value indicating the number of thread groups in this NLM. The diagram shows that DEMO.NLM has one thread group. The number of thread groups is limited only by the amount of available memory. Press <Enter> (twice) to bring up the Thread-Group-Level Context screen.
atexit Routines	A decimal value indicating the number of functions registered to be called when the NLM terminates normally. DEMO.NLM has no such (atexit) functions.
AtUnload Function	A hexadecimal value showing the address of the AtUnload function, if one has been registered. The figure shows that DEMO.NLM has no AtUnload functions.
Synchronized Process ID	A handle to a PCB. This PCB identifies a process which has been temporarily suspended until SynchronizeStart() is called.
NLM Security Level	A decimal value indicating the NCP security level for this NLM. Levels can be 0, 1, 2, or 3. The default level for CLIB is set on the server. The NLM Security Level pertains to packet signing.
NLM Checksum Level	A decimal value describing the checksum level. Levels can be 0, 1, or 2. The NLM Checksum Level pertains to packet signing.
Packet Burst Buffers	The number (0 to 9) of ECBs waiting for incoming packet burst packets. A zero value indicates that packet burst is not enabled.
NCP Retries	The number of retries before an NCP times out. The default is 6.
Local Connections	The number of connections the NLM has to the local server.
Handicapped Yields	A flag (ON / OFF) describing if the NLM is using CPU-yielding functions. If the flag is ON, ThreadSwitch() behaves like ThreadSwitchWithDelay().
SMP Enabled	For "Yes" to be returned, the NLM must be capable of running on multiple processors and SMP.NLM must be loaded. Otherwise, "No" is returned.
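
The "atexit Routines" count corresponds to functions registered through the standard C atexit() call. The sketch below uses only standard C to show how an application might register such functions and keep its own tally. The handler names and the register_cleanup() wrapper are hypothetical, and the counter is kept by the example itself; it is not the value CLIB maintains internally.

```c
#include <assert.h>
#include <stdlib.h>

static int registered = 0;    /* handlers registered so far */
static int handlers_run = 0;  /* incremented as each handler runs at exit */

/* Hypothetical cleanup handlers; real ones would release resources. */
static void release_buffers(void) { handlers_run++; }
static void close_log(void)       { handlers_run++; }

/* Register one handler with atexit() and count it on success. */
int register_cleanup(void (*handler)(void))
{
    assert(handler != NULL);
    if (atexit(handler) != 0)
        return -1;
    registered++;
    return 0;
}

/* Register both handlers; returns the running count, or -1 on failure.
   Standard C calls atexit handlers in reverse order of registration,
   so release_buffers() would run before close_log(). */
int register_demo_cleanups(void)
{
    if (register_cleanup(close_log) != 0)
        return -1;
    if (register_cleanup(release_buffers) != 0)
        return -1;
    return registered;
}
```

An NLM that had made these two registrations would presumably show a value of 2 in the atexit Routines field, whereas DEMO.NLM in the figure shows none.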

The Thread-Group-Level Context window displays information that applies to every thread in a particular thread group. This information is explained in the following table.

Option	Explanation
Threads	A decimal value indicating the number of threads in this particular thread group. Press <Enter> (twice) to bring up the Thread-Level Context screen.
Current Conn ID	A decimal number representing the current file server ID (0 = local server). See GetFileServerID() and SetFileServerID() in the NWSDK documentation.
Current Conn	A decimal number representing the current connection number as it would be seen on MONITOR.NLM.
Current Task	The current task number (hexadecimal value). Task numbers are unique and represent programs running on network workstations or servers.
Current Volume	The current volume number for this thread group. This is a decimal number that can range in value from 0-63 (for NetWare 3.1 and later).
Current Directory #	The current directory number for this thread group. This is the same number that would be returned from a call to FEMapHandleToVolumeAndDirectory() on a local server. On a remote server this is a short directory handle.
Current Name Space	The current name space for all the threads in this group. Name space values can be 0 (DOS), 1 (Macintosh), 2 (NFS), 3 (FTAM), 4 (OS/2).
Current Target Name Space	The current target name space number for all the threads in this group. Name space values can be 0 (DOS), 1 (Macintosh), 2 (NFS), 3 (FTAM), 4 (OS/2).
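
When reading the name space fields, it can help to translate the numbers into names. The following small lookup is a sketch that uses only the five values listed in the table; the function names are hypothetical and are not part of any NetWare header.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Name space numbers 0 through 4, as listed in the table above. */
static const char *const name_spaces[] = {
    "DOS", "Macintosh", "NFS", "FTAM", "OS/2"
};

#define NAME_SPACE_COUNT ((int)(sizeof name_spaces / sizeof name_spaces[0]))

/* Return the name for a name space number, or "unknown". */
const char *name_space_name(int number)
{
    if (number < 0 || number >= NAME_SPACE_COUNT)
        return "unknown";
    return name_spaces[number];
}

/* Return the number for a name space name, or -1 if not listed. */
int name_space_number(const char *name)
{
    int i;

    assert(name != NULL);
    for (i = 0; i < NAME_SPACE_COUNT; i++)
        if (strcmp(name_spaces[i], name) == 0)
            return i;
    return -1;
}
```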

The Thread-Level Context window displays information that applies only to individual threads. This information includes the following:

Option	Explanation
PCB	The thread's Process Control Block handle.
Thread Stack	The address of the thread's stack limit, a hexadecimal value.
errno	The last reported errno value for this thread.
NWErrno	The last reported NWErrno value for this thread. NWErrno values are NetWare specific.
t_errno	The last reported TLI errno value for this thread. TLI errno values are specific to Transport Layer Interface functions.
Thread Suspended	(Yes/No) The "thread suspended" state. See SuspendThread() and ResumeThread() in the NWSDK documentation.
Suspend Count	The "thread suspended" count state. See SuspendThread() and ResumeThread() in the NWSDK documentation.
Critical Section Count	The number of times a thread has entered a critical section (thus suspending all other threads in that NLM). The maximum value for Critical Section Count is 4 billion. See EnterCritSec() and ExitCritSec() in the NWSDK documentation.

NCP Debugger

Briefly, NCPs (or NCP packets) comprise the fundamental language that NetWare speaks. Each NCP packet contains a service request and a header that specifies the source and destination of the packet. These packets are transmitted between workstations and servers (or just between servers) over the IPX communications protocol.

The NCP Debugger, a tool for studying and debugging NCP packets, offers two powerful features. First, the debugger allows the user to define the scope of the packet information to be scanned. On one extreme, the debugger can break on every packet that leaves a specific server.

With a small modification, the debugger can narrow its scan to break on specific packets according to their target servers, connection numbers, function codes and subfunction codes. The NCP Debugger can also be configured to break only when it receives an error condition.

The second powerful feature of the NCP Debugger is the stack trace. Once the debugger halts on a packet, the stack trace shows every call that led to the current position. The stack trace is presented in reverse order, with the most recently called function at the top and the first function called (usually main) at the bottom.
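
The reverse ordering of a stack trace can be illustrated with a small sketch in portable C. It keeps an explicit list of function names and prints the newest entry first. This only demonstrates the ordering; it is an assumption-laden stand-in and says nothing about how NLMDebug actually walks a thread's stack. The function names trace_enter, trace_leave and trace_format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DEPTH 32

static const char *frames[MAX_DEPTH];  /* oldest call at index 0 */
static int depth = 0;

/* Record entry into a function. */
void trace_enter(const char *function_name)
{
    assert(function_name != NULL && depth < MAX_DEPTH);
    frames[depth++] = function_name;
}

/* Record return from the most recently entered function. */
void trace_leave(void)
{
    assert(depth > 0);
    depth--;
}

/* Write the trace into out, one name per line, newest call first.
   Stops cleanly if the buffer fills. Returns characters written. */
size_t trace_format(char *out, size_t cap)
{
    size_t used = 0;
    int i;

    assert(out != NULL && cap > 0);
    out[0] = '\0';
    for (i = depth - 1; i >= 0; i--) {
        int n = snprintf(out + used, cap - used, "%s\n", frames[i]);
        if (n < 0 || (size_t)n >= cap - used) {
            out[used] = '\0';  /* drop the partial line */
            break;
        }
        used += (size_t)n;
    }
    return used;
}
```

If main calls open_session, which calls send_request, the formatted trace lists send_request first and main last, matching the order described above.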

Selecting NCP Debugger from the main menu will bring up a screen like the one shown in the following figure. The NCP Debugger can be configured with the setup menu which appears in the center of the screen.

As shown in the figure, the NCP Debugger Setup Menu contains the options shown in the following table.

Option	Explanation
Remote File Server	Scan all of the packets going to this server. If blank, scan all outgoing packets.
Break on Connection Number	Break on any packet going to the specified connection on the server listed in the previous field. If blank, ignore connection numbers.
Break on Function Code	Break on any packet with this function code going to the specific connection on the file server. If zero, ignore function and sub-function codes.
Break on Sub-Function Code	Break on any packet with this sub-function code going to the specific connection on the file server. If zero, ignore sub-function codes.
Delay between Packets	Time delay between packets, in seconds.
Break on Error	(Yes/No) Break when an error is returned.
Break on Every Packet	(Yes/No) Break on every packet.
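
The way these options narrow the scan can be sketched as a predicate in portable C. This is one plausible reading of the table, not NLMDebug's actual logic: the article does not say exactly how the options combine, so the precedence used here is an assumption, as are the structure names and the ANY_CONNECTION value standing in for a blank field.

```c
#include <assert.h>
#include <string.h>

#define ANY_CONNECTION (-1)  /* hypothetical stand-in for a blank field */

typedef struct {
    const char *remote_server;  /* "" = scan all outgoing packets */
    int connection;             /* ANY_CONNECTION = ignore */
    int function_code;          /* 0 = ignore function and sub-function */
    int sub_function_code;      /* 0 = ignore sub-function */
    int break_on_error;
    int break_on_every_packet;
} ncp_filter;

typedef struct {
    const char *server;
    int connection;
    int function_code;
    int sub_function_code;
    int completion_code;        /* nonzero is treated as an error here */
} ncp_packet;

/* Return 1 if the debugger should break on this packet, else 0. */
int ncp_should_break(const ncp_filter *f, const ncp_packet *p)
{
    assert(f != NULL && p != NULL);

    /* Packets to other servers are out of scope. */
    if (f->remote_server[0] != '\0' &&
        strcmp(f->remote_server, p->server) != 0)
        return 0;
    if (f->break_on_every_packet)
        return 1;
    if (f->break_on_error && p->completion_code != 0)
        return 1;

    /* With no specific filter set, nothing else triggers a break. */
    if (f->connection == ANY_CONNECTION && f->function_code == 0)
        return 0;
    if (f->connection != ANY_CONNECTION && f->connection != p->connection)
        return 0;
    if (f->function_code != 0) {
        if (f->function_code != p->function_code)
            return 0;
        if (f->sub_function_code != 0 &&
            f->sub_function_code != p->sub_function_code)
            return 0;
    }
    return 1;
}
```

With a filter naming server LYNX and Break on Error set to Yes, a function 23 packet to LYNX that returns completion code 252 causes a break, while the same packet returning 0 does not.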

The setup menu in the previous figure is set to display every packet sent to server "LYNX," with a two-second delay between packets. The debugger is also set to break on any errors. With this configuration, a new packet is displayed every two seconds until one finally returns an error. The following figure shows a packet display which has been halted due to a completion code of 252. Notice the Press SPACEBAR to continue message on the bottom of the screen. Pressing the spacebar will clear the current breakpoint and cause the debugger to resume sending packets.

It's important to note that not all error codes mean failure. Some error codes are used to return configuration or status information.

The top portion of the screen displays information about the outgoing packet. The title box shows the function and sub-function numbers (23 0), the remote server (LYNX), the signing state (off) and the size of the packet in bytes (51).

The bottom portion of the screen displays information about the reply packet. This title box shows the completion code (252), the packet sequence number, and the size in bytes (38).

The Page-Up and Page-Down keys alternate between the top and bottom screens. Further detailed information can be obtained through the following function keys.

F2 Display a stack trace showing the position of the function whose packet is currently displayed on the screen. The stack trace, as seen in the figure, can be a useful tool in determining how a certain error was reached. Notice that the most recent function is at the top of the list and main(), the first function called, is on the bottom of the list.
F3 Display NCP packet header information.
F4 Display the NCP Debugger setup menu.
F5 Display an NCP error report showing the NCP return code, the connection status and the reply ECB status.

CLIB Remote Connections

NLMDebug provides detailed information about every CLIB-based remote connection on a particular server. This function may be especially useful when debugging NLMs that make connections to remote servers. Selecting CLib Remote Connections from the main menu will display a summarized list of each connection, as shown in the following figure.

The left side of the diagram contains a list of server names corresponding to each remote connection. The columns to the right show the number of connections to each server.

The connections are categorized by their connection state: "Login," "Attach," or "Cached." The total number of connections is displayed on the far right of the screen. CLIB stores the names of the servers in a session list, even after the connection is destroyed. Therefore, a server name might appear on this screen with zero connections.

Briefly, connections that are "logged in" have presented a name and password and have been accepted by the remote server. These connections are licensed and therefore decrement the available connection count on the remote server. Connections that are "attached" (and not logged in) don't use licenses, but they also don't have the same privileges as a licensed connection. When a server "attaches" to a remote server and then destroys the connection without logging in, CLIB stores the connection information in memory for future use. This is known as a "cached" connection.
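
The per-server summary described above amounts to a tally by connection state. The sketch below, in portable C, shows one way to compute such a row. The type and function names are hypothetical, and treating the total as the sum of all three states is an assumption; the article does not say whether cached connections are included in the total.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { CONN_LOGIN, CONN_ATTACH, CONN_CACHED } conn_state;

typedef struct {
    int login;   /* licensed: name and password accepted */
    int attach;  /* attached but not logged in, unlicensed */
    int cached;  /* kept in memory after an attach was destroyed */
    int total;   /* assumed here to be the sum of the three */
} conn_summary;

/* Tally the connections to one server by state. */
conn_summary summarize_connections(const conn_state *states, size_t count)
{
    conn_summary s = { 0, 0, 0, 0 };
    size_t i;

    assert(states != NULL || count == 0);
    for (i = 0; i < count; i++) {
        switch (states[i]) {
        case CONN_LOGIN:  s.login++;  break;
        case CONN_ATTACH: s.attach++; break;
        case CONN_CACHED: s.cached++; break;
        }
    }
    s.total = s.login + s.attach + s.cached;
    return s;
}
```

Only the login column represents licenses consumed on the remote server. A server that remains in the session list after its connections are destroyed would produce a row of zeros.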

The following figure shows some of the information available on each connection. The screen titled "Remote Connection Information" is provided with a scroll bar to view all of the connection information. This screen includes the following fields.

Field	Explanation
Slot	The remote connection number (there can be several connections, or slots, to a single Connection ID).
Status	The current connection status: ATTACHED, LOGGED IN, or CACHED.
Server	The name of the remote server.
Net	The IPX address of the remote server. This address consists of three separate values: the net address, the node address and the socket number.
NLM ID	The unique ID showing which NLM owns this connection.
Current Task	The current task number.
user	The hexadecimal user ID of the user that owns this connection (see NWDSGetCurrentUser() and NWDSSetCurrentUser()). The user ID is initially set with NWDSLogin().
Retries	The number of retries before timing out.
NCS Retries	The initial number of retries before timing out.
NCS Security Level	The NLM Control Structure Security Level. Values can be 0, 1, 2, or 3.
NCS CheckSum Level	The NLM Control Structure Checksum Level. Values can be 0, 1, or 2.
Authentication State	The current authentication state (0=None, 1=Bindery, 2=NDS).
Licensed State	The current license state (0=Not Licensed, 1=Licensed).
DS Connection	Current DS connection status (0 = CLIB Connection, 1 = DS Connection).
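
The three parts of the IPX address in the Net field have fixed sizes: a 4-byte network number, a 6-byte node address and a 2-byte socket number. The following sketch in portable C formats such an address as colon-separated hexadecimal. The structure, the function name and the colon-separated layout are illustrative assumptions; the layout NLMDebug uses on screen may differ.

```c
#include <assert.h>
#include <stddef.h>

/* 8 + 1 + 12 + 1 + 4 characters, plus the terminating NUL. */
#define IPX_TEXT_LEN 27

typedef struct {
    unsigned char net[4];     /* network number */
    unsigned char node[6];    /* node address */
    unsigned char socket[2];  /* socket number */
} ipx_address;

/* Format as NNNNNNNN:NNNNNNNNNNNN:SSSS. Returns 0 on success,
   or -1 if the output buffer is too small. */
int ipx_format(const ipx_address *addr, char *out, size_t cap)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *fields[3];
    const size_t lengths[3] = { 4, 6, 2 };
    size_t pos = 0, f, i;

    assert(addr != NULL && out != NULL);
    if (cap < IPX_TEXT_LEN)
        return -1;
    fields[0] = addr->net;
    fields[1] = addr->node;
    fields[2] = addr->socket;
    for (f = 0; f < 3; f++) {
        if (f > 0)
            out[pos++] = ':';
        for (i = 0; i < lengths[f]; i++) {
            out[pos++] = hex[fields[f][i] >> 4];
            out[pos++] = hex[fields[f][i] & 0x0F];
        }
    }
    out[pos] = '\0';
    return 0;
}
```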

File Opens

NLMDebug is useful for tracking CLIB file open events that occur on either local or remote servers. This tool is similar in function to the File System Monitoring Services APIs, with one major difference. The File System Monitoring Services APIs can trap any file open event, while NLMDebug can only trap CLIB file open events. The File Opens function can be configured either of two ways. First, it can scan for a file opened by any CLIB NLM running on a particular server. Also, it can be narrowed to find file open events generated by one specific NLM.

The diagram shows the File Opens Setup Menu. In this example NLMDebug is configured to halt and display information on file opens originating from CONNTO.NLM.

The following fields are found in the File Opens Setup Menu.

Field	Explanation
Halt Processing on Every CLIB open	(Yes/No) Halt on every CLIB open.
Halt & Show Info. This NLM opens	Show file open information originating from this NLM.

Once NLMDebug detects a CLIB file open, the originating NLM is halted and the file open information is displayed, as shown in the previous figure. This information includes the following:

Field	Explanation
Thread ID	The unique ID of the thread that opened this file.
Opening NLM	The name of the NLM that opened this file.
OS Handle	The OS file handle that corresponds to the open file.
Position in File	The current position in the file (number of bytes from the beginning).
Access Rights	The access rights with which the file was opened. Press <Enter> to see a pop-up screen with detailed information.
Connection	The connection number of the server that contains the opened file.
Task	The number of the task that opened the file.
Directory Number	The unique directory number that identifies the open file.
Volume Number	The number that corresponds to the volume containing the opened file. For example, volume SYS corresponds to 0.
Open Count	The number of times the file has been opened.
Dup Count	The number of duplicated CLIB file handles to the open file.
Open Type	The type of file opened. Values can be any of the following: LOCAL FILE, LOCAL QUEUE, REMOTE FILE, REMOTE QUEUE, DOS FILE, EXTENDED ATTRIBUTE, STREAM FILE, BSDSOCKET FILE, CONSOLE, PRINT QUEUE, ASYNC IO, REMOTE EA.
Task Number Allocated	(Yes/No) Was a new task number allocated before opening this file?
Opened stdin	(Yes/No) Was the device STDIN opened?

Process Timer

The Process Timer function scans every process running on a server and displays those that exceed a specified time limit before yielding control of the CPU. The specified time limit (entered in milliseconds) is set by the user. Selecting Process Timer from the NLMDebug main menu brings up the NLM Process Timer screen, as shown in the following diagram.

This example shows a maximum time slice of 100 milliseconds, so every process that exceeds 100 milliseconds will be captured and displayed, along with a corresponding stack trace. Once the time limit has been entered, press <Esc> to bring up the NLM Process Stack Trace screen and begin scanning.

The NLM Process Trace screen provides the following information about each process that exceeds the set time limit:

Option	Explanation
Process	The unique process ID of the process that exceeded the set time limit.
NLM Name	The name of the NLM that created the offending process. In this example, the NLM is named DEMO.NLM.
Thread Number	The unique thread number corresponding to the offending process. In this example the thread number is 0.
Process Length	A statement showing the length (in milliseconds) of the process. In this example the process lasted 720 milliseconds before yielding.
Stack Trace	A stack trace of the offending process.
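
The selection rule the Process Timer applies can be sketched in portable C as a filter over measured run times. The record layout and function name are hypothetical, and the sketch assumes the run times have already been measured; it does not show how NLMDebug obtains them.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    const char *nlm_name;   /* NLM that created the process */
    int thread_number;
    unsigned long run_ms;   /* time run before yielding the CPU */
} slice_record;

/* Store pointers to the records whose run time exceeds limit_ms.
   Returns the number stored, at most out_cap. */
size_t find_offenders(const slice_record *records, size_t count,
                      unsigned long limit_ms,
                      const slice_record **out, size_t out_cap)
{
    size_t found = 0, i;

    assert(records != NULL || count == 0);
    assert(out != NULL || out_cap == 0);
    for (i = 0; i < count && found < out_cap; i++)
        if (records[i].run_ms > limit_ms)
            out[found++] = &records[i];
    return found;
}
```

With a limit of 100 milliseconds, a DEMO.NLM thread that ran for 720 milliseconds is reported, while a thread that ran for exactly 100 milliseconds is not, because it did not exceed the limit.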

* Originally published in Novell AppNotes
