Troubleshooting NDS Problems
01 Nov 1998
Editor's Note: "Technically Speaking" answers your technical questions, focusing on network management issues. To submit a question for a future column, please send an e-mail message to firstname.lastname@example.org, or send a fax to 1-801-228-4576.
Novell Directory Services (NDS) problems can range from minor irritations that eventually resolve themselves to significant corruption of the NDS database that only Novell Technical Services (NTS) can fix. This article focuses on NDS problems that fall somewhere between these two extremes. The types of NDS problems discussed in this article require you to do some troubleshooting but do not require you to know the inner workings of NDS in particular or client-server databases in general.
BACKING UP THE NDS DATABASE
You should be aware that troubleshooting and correcting problems in the NDS database could potentially lead to the loss of critical network resource information. To protect this information, you should make a verified backup of the NDS database before you attempt to troubleshoot NDS problems. (See "Technically Speaking: Backing Up NDS," NetWare Connection, Aug. 1998, pp. 43-45. You can download this article from http://www.nwconnection.com/aug.98/techsp88.)
In addition, you should be familiar with Novell's NDS management utilities, such as the NetWare Administrator (NWADMIN) utility, the NDS Manager utility, the Partition Manager utility, and the DSREPAIR utility. You should also be familiar with the DSTRACE function at the server console.
DETERMINING THE NDS PROBLEM
In most cases, you must use the DSTRACE function at the server console to determine the type of NDS problem you are experiencing. The DSTRACE function returns an NDS error code, which indicates the type of NDS problem that exists on the network.
In this article, I will assume that you have already used the DSTRACE function to isolate a specific NDS error code. To learn more about using the DSTRACE function, you can read the following technical information documents (TIDs), which you can download free from Novell's Support Connection World-Wide Web site (http://support.novell.com):
"DSTRACE Commands, Filters, and Processes" (document number 2909026)
"Directory Services Trace Screen (DSTRACE)" (document number 2909019)
To locate these TIDs on Novell's Support Connection web site, select the Knowledgebase option, enter the document number, and click the Find button.
You can also read the "NDS Health Check" TID (document number 2921544), which offers detailed information about performing an NDS health check. Or, you can read "The Doctor Is In: Performing an NDS Health Check," NetWare Connection, Dec. 1997/Jan. 1998, pp. 33-41. (You can download this article from http://www.nwconnection.com/dec.97-jan.98/ndshcd7.)
COMMON NDS ERROR CODES
Because this article cannot explain how to resolve every NDS error code, it focuses on a few of the most common ones. If you want a more complete list of NDS error codes, you should read the "DS Error Codes" TID (document number 11299).
-601 Error Code
One of the most common NDS error codes is -601. Typically, the -601 error code indicates that a particular NDS object does not exist in a replica of an NDS partition. This problem may occur under the following circumstances:
The servers are running different versions of the DS NetWare Loadable Module (NLM).
When the NDS object was created, it was not fully replicated across all servers.
A communications problem exists between servers.
To troubleshoot this problem, you should first ensure that all servers in the NDS tree are running the latest version of the DS NLM for the version of NetWare. If these servers are running the latest version of the DS NLM or if you have only one server in the NDS tree, chances are that the NDS object was not fully replicated. In this case, you should complete the following steps:
Load the DSREPAIR utility by entering the following command at the server console: LOAD DSREPAIR
Select Advanced Options from the main menu, and then select the Repair Local Database option.
In the list of parameters that appears, leave all of the options at their default values except the value for the Check Local References parameter, which you want to change to Yes.
Run the repair process. If necessary, repeat this step until no errors are returned when the repair process is completed.
If completing these steps does not resolve the -601 error code, you should use the NWADMIN utility or the Partition Manager utility to delete the offending NDS object from the NDS tree.
The -601 error code often occurs together with other NDS error codes. If other NDS error codes do exist, you should resolve these error codes first because they can cause the -601 error code until they are resolved. For more information about the -601 error code, you should read the "601 Error in the DSTRACE Screens" TID (document number 2912901).
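The first check described above, that every server runs the same DS NLM version, is easy to do by hand on a small tree but tedious on a large one. The following Python sketch shows one way to compare versions once you have written them down. The server names and version strings are hypothetical placeholders, and the inventory is assumed to be typed in by hand; nothing here queries a server.

```python
from collections import defaultdict


def group_by_ds_version(versions):
    """Group server names by the DS NLM version each one reports."""
    groups = defaultdict(list)
    for server, version in versions.items():
        groups[version].append(server)
    return {version: sorted(names) for version, names in groups.items()}


def servers_behind(versions, latest):
    """Return the servers whose DS NLM version differs from `latest`."""
    return sorted(s for s, v in versions.items() if v != latest)


# Hypothetical inventory: placeholder names and placeholder version strings.
inventory = {"FS1": "2.0", "FS2": "2.0", "FS3": "1.9"}

if __name__ == "__main__":
    print(group_by_ds_version(inventory))
    print("Need updating:", servers_behind(inventory, latest="2.0"))
```

Any server listed as needing an update is a candidate cause of the -601 error code and should be updated before you run the repair steps.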
-625 Error Code
Another common NDS error code is the -625 error code, which indicates that a communications problem exists between servers. Specifically, the -625 error code is returned when one server in the NDS tree is unable to contact another server during the synchronization process. This communications problem may occur if a server has failed, if a network segment is down, if a network segment is saturated, or if IPX address conflicts exist.
If you receive a -625 error code, you should check to make sure that all servers in the NDS tree are physically capable of communicating with one another. In other words, can you attach to and map a drive to each server? Do the servers appear when you enter the NLIST SERVERS /A command while you are logged in to the network as the ADMIN user? (If the servers reside in different NDS contexts, you should enter the NLIST SERVERS /A /B command.) Finally, do the servers appear when you enter the DISPLAY SERVERS command at the server console?
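If you record the server names that each of these checks returns, comparing the lists is a simple set difference. The sketch below assumes you have copied the names by hand into Python lists; the server names are hypothetical, and the code does not run NLIST or DISPLAY SERVERS itself.

```python
def missing_servers(expected, views):
    """For each check, list the expected servers that did not appear.

    expected: names of every server that should be in the tree.
    views:    dict mapping a check's label to the names that check showed.
    """
    expected_upper = {name.upper() for name in expected}
    report = {}
    for check, seen in views.items():
        seen_upper = {name.upper() for name in seen}
        report[check] = sorted(expected_upper - seen_upper)
    return report


# Hypothetical results, copied by hand from a workstation and a console.
expected = ["FS1", "FS2", "FS3"]
views = {
    "NLIST SERVERS /A /B": ["FS1", "FS2"],
    "DISPLAY SERVERS": ["fs1", "FS2", "FS3"],
}

if __name__ == "__main__":
    for check, missing in missing_servers(expected, views).items():
        print(check, "is missing", missing or "nothing")
```

A server that is missing from one view but present in another points to a problem on the path between those two vantage points rather than to a failed server.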
If all of the servers in the NDS tree are physically capable of communicating with one another, a network segment may be saturated. Because saturation is sometimes caused by poorly performing or malfunctioning drivers, you should ensure that you have installed the latest LAN drivers, WAN drivers, and support files on all servers in the NDS tree. You should then use a protocol analyzer to determine if a network segment is saturated.
Network saturation typically exists on slow WAN links. For example, you could have a slow WAN link between servers that require frequent NDS synchronization. If you find a slow WAN link on the network, you have two options: increase the speed of this link or redesign the NDS tree to reduce the amount of synchronization traffic that travels across the link.
The -625 error code may also be caused by server name or address conflicts in the NDS tree. That is, a server may have the same name as another server, a print server, or a Service Advertising Protocol (SAP) service device on the network. (SAP service devices include Windows NT servers or workstations and Windows 95 workstations on which file and print services are enabled.) A server may have the same internal IPX address as another server on the network, or an internal IPX address on a server may conflict with a LAN segment IPX address on the network.
Unfortunately, name and address conflicts are relatively difficult to resolve, generally requiring a great deal of manual intervention. To resolve name conflicts, you must check all NetWare servers, Windows NT servers and workstations, Windows 95 workstations, and print servers for their name information. To resolve address conflicts, you should ensure that each server has been assigned a unique internal IPX address and that these addresses do not conflict with the IPX addresses assigned to network segments.
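Once you have collected the names and internal IPX addresses, spotting the conflicts is mechanical. The following sketch is one way to do it, assuming the inventory has been gathered by hand. The device labels, names, and addresses are all hypothetical, and the dictionary layout is an illustration only.

```python
from collections import defaultdict


def duplicates(devices, field):
    """Map each value of `field` used more than once to the devices using it."""
    owners = defaultdict(list)
    for device in devices:
        value = device.get(field)
        if value:
            # Compare case-insensitively so "acct" and "ACCT" collide.
            owners[value.upper()].append(device["label"])
    return {value: labels for value, labels in owners.items() if len(labels) > 1}


def internal_vs_segment(devices, segment_addresses):
    """List devices whose internal IPX address equals a segment address."""
    taken = {address.upper() for address in segment_addresses}
    return sorted(
        d["label"] for d in devices
        if d.get("internal_ipx") and d["internal_ipx"].upper() in taken
    )


# Hypothetical inventory gathered by walking the network.
devices = [
    {"label": "rack 1 NetWare server", "name": "ACCT", "internal_ipx": "0000A001"},
    {"label": "rack 2 NetWare server", "name": "SALES", "internal_ipx": "0000a001"},
    {"label": "front desk Windows 95 PC", "name": "acct"},
    {"label": "print server", "name": "PS1"},
]

if __name__ == "__main__":
    print("Name conflicts:", duplicates(devices, "name"))
    print("Internal address conflicts:", duplicates(devices, "internal_ipx"))
    print("Internal vs. segment:", internal_vs_segment(devices, ["0000B002"]))
```

The hard part remains the manual collection; the sketch only guarantees that no duplicate slips past you once the list is complete.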
Next, you should check the IPX addresses that have been assigned to each network interface board in each server. All servers that share a connected segment must have the same IPX address assigned to that segment. Each nonconnected segment must have a unique IPX address. Nonconnected segments attach to only one server. Connected segments attach to two or more servers (connecting these servers together). In Figure 1, segments 1 and 3 are nonconnected segments; segment 2 is a connected segment.
Figure 1: To avoid address conflicts, you must ensure that nonconnected segments have a unique IPX address.
To understand how address conflicts can occur, look at Figure 1. Segment 1 in Server 1 has an IPX address of 1, and segment 2 in Server 1 has an IPX address of 2. Because segment 2 in Server 1 connects to Server 2, the network interface board for that segment in Server 2 must also use an IPX address of 2.
The other segment in Server 2 is a nonconnected segment and requires a unique IPX address. If you assigned this segment an IPX address of 1, you would cause an IPX addressing conflict. This conflict could then lead to NDS synchronization problems for servers that are attached to the segments with the conflicting IPX address.
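The two rules in this example (every board on a segment uses that segment's address, and no two segments share an address) can be checked with a short Python sketch. The tuple layout is an assumption made for illustration, and the address of 3 for segment 3 is hypothetical because the example above does not state it.

```python
from collections import defaultdict


def check_segment_addresses(boards):
    """Return a list of addressing problems.

    boards: iterable of (server, segment, ipx_address) tuples, one tuple
    per network interface board.
    """
    addresses_on = defaultdict(set)  # segment -> addresses bound to it
    for _server, segment, address in boards:
        addresses_on[segment].add(address)

    problems = []
    # Rule 1: all boards on one segment must use the same address.
    for segment in sorted(addresses_on):
        if len(addresses_on[segment]) > 1:
            problems.append(
                f"segment {segment}: boards use different addresses "
                f"{sorted(addresses_on[segment])}"
            )

    # Rule 2: two different segments must never use the same address.
    segments_using = defaultdict(set)  # address -> segments that use it
    for segment, addresses in addresses_on.items():
        for address in addresses:
            segments_using[address].add(segment)
    for address in sorted(segments_using):
        if len(segments_using[address]) > 1:
            problems.append(
                f"address {address}: shared by segments "
                f"{sorted(segments_using[address])}"
            )
    return problems


# Figure 1 as described in the text (address "3" is an assumed value).
figure_1 = [("SERVER1", 1, "1"), ("SERVER1", 2, "2"),
            ("SERVER2", 2, "2"), ("SERVER2", 3, "3")]
# The conflict from the text: segment 3 is given address 1.
conflict = [("SERVER1", 1, "1"), ("SERVER1", 2, "2"),
            ("SERVER2", 2, "2"), ("SERVER2", 3, "1")]

if __name__ == "__main__":
    print(check_segment_addresses(figure_1))
    print(check_segment_addresses(conflict))
```

The first list describes a clean configuration; the second reproduces the conflict between segments 1 and 3.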
As mentioned earlier, NetWare servers are not the only network devices that can cause name and address conflicts on an IPX network: Microsoft's File and Print Services for NetWare Networks running on Windows 98 and Windows 95 workstations can also cause address and server name conflicts on an IPX network, as can NWLINK and NWGATE services running on Windows NT servers and workstations. As a result, you should check the IPX addresses and server names for these services as well.
For more information about the -625 error code, you should read the "Troubleshooting 625 Errors Summary" TID (document number 2909017).
-637 Error Code
A -637 error code is also common. This error code indicates a problem with obituaries in the NDS tree. NDS creates an obituary when an object is deleted, renamed, or moved to another NDS context. When such a change is made, it is distributed to the appropriate replicas in the NDS tree. If the change is not successfully made on these replicas, the obituary may become stuck.
For example, an obituary could become stuck if the following occurred:
A WAN link or a server was down during the delete or move process.
A replica of the partition became corrupted and can't communicate with the rest of the tree.
The NDS tree has a design problem. For example, you are maintaining a nonpartitioned tree across a WAN link.
You changed the server name and the internal IPX address in a server's AUTOEXEC.NCF file without first removing the old server information from the NDS tree and manually deleting the old server reference. (To remove a server's information from NDS, you must delete NDS from that server while it is connected to the NDS tree. The NDS tree then replicates the removal of the Server object and all referenced objects. After you rename the server and assign it an internal IPX address, you recreate the Server object by reinstalling NDS on the server. The Server object then becomes part of the NDS tree.)
Stuck obituaries are a serious problem. It is critical that you make no changes to the NDS database until you resolve stuck obituaries. (Disregarding this advice can cause corruption that can be fixed only by Novell Technical Services.) It is also critical that you complete specific steps to resolve stuck obituaries. Because there is not enough space in this article to list all of the necessary steps, you should read the following TIDs, which guide you through the troubleshooting process:
"Previous Move in Progress-637/Obituaries" (document number 2923724)
"How to Interpret Obituaries" (document number 2935988)
-659 Error Code
Like a -637 error code, a -659 error code indicates a serious problem, specifically a Time Not Synchronized fault. This fault occurs when a server cannot perform the synchronization process with another server in the NDS tree because the time information for the two servers does not match.
To resolve a Time Not Synchronized fault, you should ensure that all servers in the NDS tree are maintaining the correct date and time and are operating within the correct time zone. For a list of time zone codes and offsets, you should read the "Time Zones" TID (document number 2939493). You should also keep in mind several points as you set a server to the correct time zone:
By default, the number in a time zone setting is read as that many hours behind Coordinated Universal Time (UTC), the successor to Greenwich Mean Time. For example, if you entered EST5EDT as the time zone for a particular server, this server would adopt an offset of five hours behind UTC because Eastern standard time (EST) is five hours behind UTC.
Because an unsigned number means hours behind UTC, you must manually enter a sign if the server is located east of UTC. The number in the setting is the amount that is added to local time to arrive at UTC, so the sign for a location east of UTC is the minus (-) sign. For example, if a server were located in Japan, you would need to ensure that this server were set to Japan standard time (JST), which is nine hours ahead of UTC. To do so, you would enter JST-9 as the time zone setting.
The letters (such as JST) surrounding the number (such as -9) in the time zone setting are relatively meaningless. These letters serve as a quick reference, allowing you to immediately determine which time zone a particular server is set to.
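A quick way to confirm that servers in different time zones agree is to convert each server's local time to UTC and compare the results. The sketch below does this in Python. It expresses each zone as signed hours ahead of UTC (EST is -5, JST is +9), a convention chosen for this sketch only and not the syntax of the server's time zone setting. The server names and clock readings are hypothetical.

```python
from datetime import datetime, timedelta


def to_utc(local_time, hours_ahead_of_utc):
    """Convert a local clock reading to UTC."""
    return local_time - timedelta(hours=hours_ahead_of_utc)


def clock_spread(readings):
    """Return the gap between the earliest and latest clocks, in UTC.

    readings: dict mapping server name -> (local datetime, hours ahead of UTC).
    """
    utc_times = [to_utc(local, offset) for local, offset in readings.values()]
    return max(utc_times) - min(utc_times)


# Hypothetical readings taken at the same moment on two consoles.
readings = {
    "NEWYORK1": (datetime(1998, 11, 1, 7, 0), -5),  # EST, five hours behind
    "TOKYO1": (datetime(1998, 11, 1, 21, 0), 9),    # JST, nine hours ahead
}

if __name__ == "__main__":
    print("Spread between servers:", clock_spread(readings))
```

If the spread is close to a whole number of hours, suspect a wrong time zone or sign on one server; a spread of odd minutes suggests a drifting or unsynchronized clock.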
The next step in resolving a Time Not Synchronized fault is to ensure that you have defined the correct type of time server for all servers in the NDS tree. There are four types of time servers: single time sources, secondary time sources, reference time sources, and primary time sources. In general, if a network includes fewer than 30 servers that reside in the same time zone, you should set up only one server as a single time source and the remaining servers as secondary time sources.
The servers that are set up as secondary time sources get their time from the server that is set up as a single time source. In other words, the server that is set up as a single time source ensures that all servers in the NDS tree maintain the correct time. This server, in turn, must get the correct time from an outside time source (preferably one that is synchronized to an atomic clock). There are several ways to configure the server to access such an outside time source. For example, you could use the RDATE NLM from MurkWorks Inc. (You can download the RDATE NLM free from http://www.murkworks.com/BasePages/products.htm.)
If you have a network with more than 30 servers or if these servers reside in different time zones, you should read the "Time Synchronization Setup" TID (document number 2930686), which defines all types of time servers and explains when you should implement them.
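The rule of thumb above can be written as a small decision function. This Python sketch assigns one single time source and treats every other server as a secondary time source, but only when the tree has fewer than 30 servers in one time zone. The text does not say what to do with exactly 30 servers, so the sketch conservatively treats that case like a larger network. The server names and zone labels are hypothetical.

```python
def plan_time_servers(servers, single_source, limit=30):
    """Suggest a time server type for each server, or None if the
    simple layout does not apply.

    servers:       dict mapping server name -> time zone label.
    single_source: the server chosen to be the single time source.
    limit:         the rule of thumb applies only below this many servers.
    """
    if single_source not in servers:
        raise ValueError(f"{single_source} is not in the server list")
    zones = set(servers.values())
    if len(servers) >= limit or len(zones) > 1:
        # Larger or multi-zone trees need the fuller guidance in the
        # "Time Synchronization Setup" TID.
        return None
    return {
        name: "single time source" if name == single_source
        else "secondary time source"
        for name in servers
    }


# Hypothetical three-server tree in one time zone.
small_tree = {"FS1": "EST", "FS2": "EST", "FS3": "EST"}

if __name__ == "__main__":
    print(plan_time_servers(small_tree, single_source="FS1"))
```

A result of None is a signal to plan the layout with the TID rather than a failure.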
Unfortunately, resolving the Time Not Synchronized fault sometimes causes a secondary problem: One or more servers may report that synthetic time is being used. This problem occurs when a server's time has been moved backward to reflect the correct time after an NDS object in one of the server's replicas has been added, modified, or deleted while the server was using the future date/time.
If the time changes by only a few hours, you can simply let the servers in the NDS tree catch up with this time change. Until all of these servers have received the correct time, however, you should ensure that no NDS operations are performed, thus preventing invalid time stamps.
If the time changes by more than a few hours, the problem is more serious. In this case, you should run the DSREPAIR utility on the server that contains the Master replica of the partition and perform the following steps:
Select the Advanced Options menu.
Select the Replica and Partition Operations option, and then select the root of the current NDS tree from the list of replicas.
Select the Repair Time Stamps option, and declare a new epoch.
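The choice between waiting and repairing time stamps depends on how far the newest time stamp lies ahead of the corrected server time. The sketch below expresses that choice in Python. The three-hour cutoff is an assumption standing in for "a few hours," and both time values are hypothetical figures you would read from your own servers.

```python
from datetime import datetime, timedelta


def synthetic_time_action(newest_stamp, server_now,
                          wait_limit=timedelta(hours=3)):
    """Suggest how to handle time stamps that lie in the future.

    newest_stamp: latest time stamp found in the replica.
    server_now:   the server's corrected current time.
    wait_limit:   assumed cutoff for "a few hours" (not a documented value).
    Returns (action, gap), where gap is how far the stamp is ahead.
    """
    gap = newest_stamp - server_now
    if gap <= timedelta(0):
        return "none", timedelta(0)
    if gap <= wait_limit:
        # Real time will pass the future stamps soon; make no NDS changes.
        return "wait", gap
    return "repair time stamps", gap


if __name__ == "__main__":
    now = datetime(1998, 11, 1, 12, 0)
    print(synthetic_time_action(datetime(1998, 11, 1, 13, 30), now))
    print(synthetic_time_action(datetime(1998, 11, 3, 12, 0), now))
```

When the suggested action is to wait, the returned gap is also the minimum time during which no NDS operations should be performed.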
Do not take the server time synchronization process lightly: It affects all of the NDS objects in every replica of the NDS database.
For more information about time synchronization, you should read the "Correcting Synthetic Time Errors" TID (document number 2921231).
Although this article covers a limited number of the NDS problems you might encounter, learning how to troubleshoot these problems should give you a good idea of how to troubleshoot other types of problems. The troubleshooting process is essentially the same, regardless of the NDS problem you are experiencing: You start by using the DSTRACE function at the server console to isolate the specific NDS error code you have received. You then use Novell's Support Connection web site to search for TIDs that mention this error code. Finally, you use the DSTRACE function or the DSREPAIR utility to resolve the problem.
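The lookup step of this process can be sketched as a small table that maps each error code covered in this article to its meaning and to the TID document numbers cited above. The sample text in the sketch is hypothetical and is not real DSTRACE output; the code simply picks out three-digit negative numbers from whatever notes you paste in.

```python
import re

# Error codes and TID document numbers as cited in this article.
KNOWN_CODES = {
    -601: ("object does not exist in a replica", ["2912901"]),
    -625: ("communications problem between servers", ["2909017"]),
    -637: ("problem with obituaries", ["2923724", "2935988"]),
    -659: ("Time Not Synchronized fault", ["2939493", "2930686", "2921231"]),
}


def find_codes(text):
    """Return the distinct three-digit negative codes found in `text`."""
    matches = re.findall(r"(?<![\w-])-\d{3}\b", text)
    return sorted({int(m) for m in matches}, reverse=True)


def describe(code):
    """Return a one-line summary for a code, or a pointer to the full list."""
    if code in KNOWN_CODES:
        meaning, tids = KNOWN_CODES[code]
        return f"{code}: {meaning} (TIDs: {', '.join(tids)})"
    return f"{code}: not covered here; see the DS Error Codes TID (11299)"


if __name__ == "__main__":
    # Hypothetical notes jotted down while watching the trace screen.
    notes = "sync to FS2 failed, error -625; later saw error -601 on FS3"
    for code in find_codes(notes):
        print(describe(code))
```

Listing every code found, rather than only the first, matters because one error code (such as -601) is often a side effect of another that must be resolved first.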
Some NDS problems are so serious that you may be unable to troubleshoot these problems on your own. If you encounter such a problem, you should not hesitate to open a support incident with Novell Technical Services. No matter how minor the NDS problem is, you should also ensure that no changes are made to the NDS database until this problem is resolved.
Mickey Applebaum has worked with NetWare for more than 14 years. Mickey provides technical support on the Internet for The Forums (http://theforums.com).
NetWare Connection, November 1998, pp. 40-43
* Originally published in Novell Connection Magazine