
Troubleshooting Synchronization with NDS Manager


DAVE DOERING
Senior Analyst
TechVoice, Inc.

01 Aug 1998


Learn how to use NDS Manager to troubleshoot NDS synchronization problems.

Introduction

An effective network management strategy must include careful monitoring of the NDS tree. In particular, replica synchronization is at the heart of this process. If the replicas begin to differ, the ability of NDS to function properly is compromised.

In the past, monitoring the status of the synchronization process required console access to view either DSRepair or DSTrace (or both) to keep tabs on NDS. If the administrator wished to perform these tasks from a workstation, an RCONSOLE session could be run. However, in a pure IP environment, such as in NetWare 5.0, such an approach won't work. (RCONSOLE uses IPX/SPX for its connection.) Also, the information provided by DSRepair was not in an easy-to-digest form and lacked any way to direct the administrator's attention to error conditions.

NDS Manager fills this gap as a key tool for monitoring and troubleshooting synchronization on the network. This AppNote explains the different ways to use NDS Manager in monitoring NDS. It also provides troubleshooting steps for resolving common problems. (An earlier AppNote provided a detailed overview of the functions and features of the utility. See "Using NDS Manager's Graphical Schema Manager Tool in NetWare 4.11," in the June 1998 issue of Novell AppNotes.)

Documenting the Replica Rings

Before you can begin troubleshooting NDS errors, you need one key piece of information about the tree: its partitions and replica rings. Knowing the location of each master replica, each read/write replica, and each subordinate reference is an important first step in successfully resolving replica ring inconsistencies. (See the appendix on DSDiag for another approach to documenting replica rings.)

For example, suppose your tree includes six sites—a home office and five remote sites—with a set of at least two servers at each site. A possible partition or replica matrix might look like this:


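Site   Server  [Root]  ABC  NTH  Home  SITE1  SITE2  SITE3  SITE4  SITE5
Home   S1      -       -    -    -     M      M      M      M      M
Home   S2      R/W     R/W  R/W  R/W   SR     SR     SR     SR     SR
Home   S3      -       -    -    -     R/W    SR     SR     SR     SR
SITE1  S1      R/W     SR   R/W  SR    R/W    -      -      -      -
SITE1  S2      -       -    -    -     R/W    -      -      -      -
SITE2  S1      R/W     SR   R/W  -     -      R/W    -      -      -
SITE2  S2      -       -    -    -     -      R/W    -      -      -
SITE3  S1      R/W     SR   R/W  SR    -      -      R/W    -      -
SITE3  S2      -       -    -    -     -      -      R/W    -      -
SITE4  S1      R/W     SR   R/W  SR    -      -      -      R/W    -
SITE4  S2      -       -    -    -     -      -      -      R/W    -
SITE5  S1      R/W     SR   R/W  SR    -      -      -      -      R/W
SITE5  S2      -       -    -    -     -      -      -      -      R/W

Key: M = Master, R/W = Read/Write, SR = subordinate reference, - = no replica on that server.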

Creating this type of matrix showing the location and type of each replica would be difficult using a DSRepair-type of tool. NDS Manager presents this information in a graphical format quite close in appearance to this chart (see Figure 1). By transferring this information to report form, it can serve as a ready reference when communicating with technical support or in tracing errors. It will also serve as a method for recreating partitions and replicas when further sites are added with their servers. Too often it is difficult to remember how a particular partition was set up months after the fact, especially if the partition and its replicas have functioned normally for those months.

Figure 1: Displaying replica information in NDS Manager will show the structure immediately and allow for the creation of new ones along the same lines as the ones already created.
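One way to keep such a matrix as a ready reference is to record it in a small script. The Python sketch below is an illustration only and is not part of NDS Manager. It stores a few cells copied from the matrix above and derives the replica ring for a partition. The names REPLICA_MATRIX, replica_ring, and masters are hypothetical.

```python
# Hypothetical record of part of the replica matrix:
# (site, server) -> {partition: replica type}.
# Replica types follow the matrix: "M" (Master), "R/W" (Read/Write),
# "SR" (subordinate reference). Only a few cells are copied here.
REPLICA_MATRIX = {
    ("Home", "S1"): {"SITE1": "M"},
    ("Home", "S2"): {"[Root]": "R/W", "SITE1": "SR"},
    ("Home", "S3"): {"SITE1": "R/W"},
    ("SITE1", "S1"): {"[Root]": "R/W", "ABC": "SR", "SITE1": "R/W"},
    ("SITE1", "S2"): {"SITE1": "R/W"},
}


def replica_ring(matrix, partition):
    """Return {(site, server): replica type} for every server holding the partition."""
    return {
        server: replicas[partition]
        for server, replicas in matrix.items()
        if partition in replicas
    }


def masters(matrix, partition):
    """List the servers that hold the Master replica of the partition."""
    return [s for s, t in replica_ring(matrix, partition).items() if t == "M"]


if __name__ == "__main__":
    # Print the ring for the SITE1 partition, one server per line.
    for server, rtype in sorted(replica_ring(REPLICA_MATRIX, "SITE1").items()):
        print(server, rtype)
```

A record like this can be extended one row at a time as sites are added, and the masters function gives a quick check that a partition has a Master recorded.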

Subordinate References

Subordinate references (SRs) are another item in the tree that NDS creates automatically on servers with parent partitions. For that reason, they can proliferate in greater numbers than an administrator may be aware of. Since SRs are used by NDS in replica synchronization, they multiply the amount of network traffic required for each synchronization. Server response time is degraded, most notably on the Master replica-bearing server. (This is compounded for any NDS replicas lying across a WAN link.) Reducing the number of SRs is a wise approach to effective NDS management.

By displaying the location of each subordinate reference, as shown in Figure 2, NDS Manager can serve both to locate and remove unnecessary references. It may also suggest changes in the partition structure of the tree to reduce the number of references.

Figure 2: The location of subordinate references.
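To see why subordinate references multiply, it can help to model the placement rule. The Python sketch below assumes the usual rule that a server holding a replica of a parent partition, but no replica of a child partition, receives a subordinate reference to that child. The tree, the server names, and the function name are hypothetical.

```python
def subordinate_references(parent_of, holdings):
    """Return {server: sorted child partitions for which it would hold an SR}.

    parent_of: {child partition: parent partition}
    holdings:  {server: set of partitions held as a full replica (M, R/W or RO)}
    """
    result = {}
    for server, held in holdings.items():
        srs = sorted(
            child
            for child, parent in parent_of.items()
            if parent in held and child not in held
        )
        if srs:
            result[server] = srs
    return result


# Hypothetical three-partition tree and replica placement.
PARENT_OF = {"SITE1": "[Root]", "SITE2": "[Root]"}
HOLDINGS = {
    "HOME-S1": {"[Root]", "SITE1", "SITE2"},
    "HOME-S2": {"[Root]"},
    "SITE1-S1": {"[Root]", "SITE1"},
    "SITE2-S1": {"SITE2"},
}

if __name__ == "__main__":
    for server, srs in subordinate_references(PARENT_OF, HOLDINGS).items():
        print(server, "->", ", ".join(srs))
```

In this sketch, removing the [Root] replica from SITE1-S1 would also remove its subordinate reference to SITE2, which is the kind of partition-structure change the text has in mind.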

Why is the Replica Ring Important?

In addition to its value as documentation, the replica ring is important for monitoring your network. NDS also relies on it to perform two key functions: removing and adding replicas, and receiving objects.

Note: Sending objects does not require the same level of information because it is a top-down operation rather than a flow from bottom to top.

Resolving an Inaccurate Replica Ring

Here is one way to use the replica ring for troubleshooting. Suppose that server FARSIDE holds inaccurate information in its replica ring for one of its replicas. This causes an error condition during synchronization. To resolve this, FARSIDE requires updated information in the form of a new copy of the affected replica ring. It can obtain this from another server which has an accurate replica for the required partition.

NDS offers three ways to update the replica ring:

  1. Remove and recreate the inaccurate replica.

  2. Have FARSIDE receive all objects.

  3. Have the server with the Master replica send FARSIDE the update.

NDS Manager can perform each of these operations.
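Before choosing one of these operations, you need to know which server holds the odd view of the ring. The Python sketch below is a simplified illustration and does not query NDS. It assumes you have noted by hand the server list each server shows for the ring, and it flags any server whose list differs from the most common one. NEARSIDE and HOME are hypothetical server names, and the majority view is only a starting point, not proof of which view is correct.

```python
from collections import Counter


def find_inconsistent_views(ring_views):
    """Return (majority_view, {server: its differing view}).

    ring_views: {server: set of server names it lists in the replica ring}
    """
    counts = Counter(frozenset(view) for view in ring_views.values())
    majority_view, _ = counts.most_common(1)[0]
    outliers = {
        server: set(view)
        for server, view in ring_views.items()
        if frozenset(view) != majority_view
    }
    return set(majority_view), outliers


# Hand-recorded views; FARSIDE no longer lists NEARSIDE in its ring.
VIEWS = {
    "NEARSIDE": {"NEARSIDE", "FARSIDE", "HOME"},
    "HOME": {"NEARSIDE", "FARSIDE", "HOME"},
    "FARSIDE": {"FARSIDE", "HOME"},
}

if __name__ == "__main__":
    majority, outliers = find_inconsistent_views(VIEWS)
    print("Majority view:", sorted(majority))
    for server, view in outliers.items():
        print(server, "differs:", sorted(view))
```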

Preventing NDS Error Conditions

Novell recommends that you use several preventative measures to ensure continued directory services health. These include running NDS Manager and performing these checks:

  • Repair Network Addresses

  • Verify Remote Server IDs

  • Repair Local Database

These operations keep the cached addresses and IDs current.

Note: Perform the Repair Local Database after-hours or when the database won't be actively in use since this operation locks the database.

Performing these checks can also narrow down the search in troubleshooting transient NDS errors that are difficult to pinpoint.

Understanding NDS Errors

The causes of NDS errors come in two varieties: those which are transitory and those which are recurring. Note that we differentiate between the cause of the error and the error itself (reflected in its error code). A single, recurring problem may provoke one or more error codes, which in themselves may be recurring or simply transitory.

A challenge to an administrator is in differentiating between these two. Recurring errors will require intervention, while transitory errors may or may not. Recurring errors result from some change in a network condition that is permanent. The error will continue to occur until the administrator rectifies the problem. (Naturally, some error conditions are not the result of any problem with NDS but rather with network conditions themselves.)

For example, a recurring NDS error will result from attempts to synchronize with a server which has been permanently removed from the network. NDS will continue to issue an error until the appropriate Server object is removed from the NDS tree.

A transitory NDS error will occur when NDS attempts to synchronize with a server which has only temporarily been removed from the network, or which has lost its network connection. When the server is restored, synchronization will resume without intervention from the administrator.
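The distinction can be approximated from a history of synchronization checks. The Python sketch below uses a simple heuristic that is an assumption of this example, not an NDS rule: an error reported in each of the last few checks is called recurring, and anything else is provisionally called transitory. The threshold value, the server names, and the function name are hypothetical.

```python
def classify_errors(cycles, threshold=3):
    """Classify each (server, code) pair seen in the sync history.

    cycles:    list of sets, oldest first; each set holds the (server, code)
               pairs reported during one synchronization check.
    threshold: number of most recent consecutive checks an error must appear
               in to be called "recurring" (an arbitrary value).
    """
    recent = cycles[-threshold:]
    seen = set().union(*cycles)
    result = {}
    for error in seen:
        persistent = len(recent) == threshold and all(error in c for c in recent)
        result[error] = "recurring" if persistent else "transitory"
    return result


# Hypothetical history of four checks.
HISTORY = [
    {("FARSIDE", -625)},
    {("FARSIDE", -625), ("OLDSRV", -636)},
    {("OLDSRV", -636)},
    {("OLDSRV", -636)},
]

if __name__ == "__main__":
    for error, kind in sorted(classify_errors(HISTORY).items()):
        print(error, kind)
```

An error that has only just appeared is reported as transitory until it has persisted for the full threshold, so the result should be read as a prompt to keep watching, not as a diagnosis.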

Troubleshooting NDS synchronization may include looking outside NDS. In our scenario, a communications error over a WAN circuit has prevented synchronization. This condition is outside the control of NDS and cannot be resolved by NDS. However, once the communications problem is resolved, the NDS error will no longer occur.

Synchronization and communications are integral to NDS. NDS is an object-oriented, global, distributed, partitioned, and replicated hierarchical database. As such it is essential that it remain in almost constant communication with other servers in the same NDS tree. Disruptions in this communication will result in NDS errors. These errors can include:


Code   Description
-622   ERR_INVALID_TRANSPORT
-625   ERR_TRANSPORT_FAILURE
-626   ERR_ALL_REFERRALS_FAILED
-632   ERR_SYSTEM_FAILURE
-636   ERR_UNREACHABLE_SERVER
-684   ERR_SECURE_NCP_VIOLATION
-715   ERR_CHECKSUM_FAILURE

Note: These are only some of the error codes that may result from a communications failure. Also, our discussion here of how to resolve these errors is by no means comprehensive; it represents just one of several avenues of troubleshooting.
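When reviewing logs or screen notes, a small lookup of the codes listed above can save a trip to the documentation. The Python sketch below contains only the codes from the list above. The function name and the wording of the messages are hypothetical.

```python
# Codes and names copied from the list above; the list is not exhaustive.
COMMUNICATION_ERRORS = {
    -622: "ERR_INVALID_TRANSPORT",
    -625: "ERR_TRANSPORT_FAILURE",
    -626: "ERR_ALL_REFERRALS_FAILED",
    -632: "ERR_SYSTEM_FAILURE",
    -636: "ERR_UNREACHABLE_SERVER",
    -684: "ERR_SECURE_NCP_VIOLATION",
    -715: "ERR_CHECKSUM_FAILURE",
}


def describe(code):
    """Return a one-line description for an NDS error code."""
    name = COMMUNICATION_ERRORS.get(code)
    if name is None:
        return f"{code}: not in the communications list"
    return f"{code}: {name} (possible communications problem)"


if __name__ == "__main__":
    for code in (-625, -601):
        print(describe(code))
```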

NDS Manager is a useful tool for noting and resolving these communications and synchronization errors. Communications errors can stem from bad hardware (a failed network card), problems with LAN drivers (outdated, faulty, or incompatible drivers), or an unreliable network connection (a cable located next to a power source). WAN problems can result from provider errors (such as occurred in AT&T's Frame Relay network in May of 1998).

Troubleshooting Network Addresses

An incorrect network address for a server also can provoke these communications errors. This address should be validated against the actual network address information used by other servers in the NDS tree in order to eliminate it as a cause of the error. NDS Manager can repair an incorrect network address in the NDS database on a particular server. However, if the affected replica is the Master replica, it may be necessary to elevate another replica to Master before repairing it.

Note: Repairing network addresses does not lock the NDS database.

Repairing your tree's network addresses ensures that the servers in your network are broadcasting correct addresses. NDS Manager will check the address of every server in the local database by searching for the server's name in the local SAP tables. If NDS Manager finds an address in SAP, it compares this address to the NetWare Server object's IPX network address property and the address in each replica property of every partition root object. If the address it finds differs from these, this operation updates it. If NDS Manager cannot find a SAP name-to-address mapping, it cannot make a repair for that server.
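The comparison described above can be sketched in a few lines. The Python example below is a model of the logic only; it does not read SAP tables or touch an NDS database. All dictionaries, addresses, and the function name are hypothetical stand-ins for the SAP table, the Server object's address property, and the replica properties.

```python
def repair_network_addresses(sap_table, server_objects, replica_addresses):
    """Model the address check and return a per-server report.

    sap_table:         {server name: address found in SAP}
    server_objects:    {server name: address stored on the Server object}
    replica_addresses: {partition: {server name: address in the replica property}}
    The two NDS dictionaries are updated in place where they disagree with SAP.
    """
    report = {}
    for server in server_objects:
        sap_address = sap_table.get(server)
        if sap_address is None:
            # No SAP name-to-address mapping: nothing to compare against.
            report[server] = "no SAP entry, cannot repair"
            continue
        changed = False
        if server_objects[server] != sap_address:
            server_objects[server] = sap_address
            changed = True
        for ring in replica_addresses.values():
            if server in ring and ring[server] != sap_address:
                ring[server] = sap_address
                changed = True
        report[server] = "updated" if changed else "ok"
    return report


if __name__ == "__main__":
    sap = {"HOME": "0A0B0C01", "FARSIDE": "0A0B0C02"}
    objects = {"HOME": "0A0B0C01", "FARSIDE": "0A0B0CFF", "GHOST": "0A0B0C09"}
    rings = {"SITE1": {"HOME": "0A0B0C01", "FARSIDE": "0A0B0CFF"}}
    for server, status in repair_network_addresses(sap, objects, rings).items():
        print(server, status)
```

The sketch makes the limitation in the last sentence concrete: a server that is missing from SAP is reported but left unchanged.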

Note: You may need to run CONFIG on each server console in the ring in order to accurately document each server's internal IPX address.

Remote Server IDs

NDS Manager also provides a tool for verifying remote server IDs. In NetWare 4.11, corrupted or invalid remote server IDs for a replica ring can cause the server to abend (abnormal end), or they can leave the Transition On status in an ongoing loop that never resolves. NDS will only resolve the Transition On status when the server has finished contacting and receiving updates from all the other servers in the replica ring. Both conditions indicate a communications problem on the network. The Partition Continuity dialog in NDS Manager provides tools for repairing network addresses and for verifying and repairing remote server IDs.

Note: Both repairs can also be performed from the server console by typing LOAD DSREPAIR -RN -RI at one of the servers in the affected replica ring.

The -626 Error

If NDS Manager shows a -626 error (ERR_ALL_REFERRALS_FAILED) on a replica, it may mean that there is a child partition without a master. Use NDS Manager to view the replica ring and verify whether this is so. If the replica ring is consistent on all servers, you will need to assign a new master in NDS Manager. (Right-click on an R/W replica and choose Change Type.) In a worst case scenario, where there are only subordinate references available in the ring, you'll need to load DSREPAIR with the -a parameter to begin the repair.

Only convert a subordinate reference into a master if there are no other real replicas (RO or R/W) available. Because a subordinate reference contains no object information, all information for the child partition is lost, and you'll need to restore it from backups.
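The order of preference described in this section can be written down as a short decision function. The Python sketch below is a reading aid under the assumptions stated in its comments, not a repair tool. The function name and the message wording are hypothetical.

```python
def next_step_for_626(ring_types):
    """Suggest a next step from the replica types seen in the child partition's ring.

    ring_types: list of replica types, using "M", "R/W", "RO" and "SR".
    The ordering mirrors the text: prefer an R/W replica, and treat a
    subordinate reference as the last resort.
    """
    if "M" in ring_types:
        return "Master present: look for another cause of the -626 error"
    if "R/W" in ring_types:
        return "Assign a new master: Change Type on an R/W replica"
    if "RO" in ring_types:
        # The text only says not to convert an SR while a real replica exists.
        return "A real replica (RO) exists: do not convert a subordinate reference"
    if "SR" in ring_types:
        return "Last resort: DSREPAIR -a, then restore objects from backup"
    return "No replicas listed: recheck the replica ring"


if __name__ == "__main__":
    for ring in (["R/W", "SR"], ["SR", "SR"]):
        print(ring, "->", next_step_for_626(ring))
```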

The -632 Error

A -632 error (ERR_SYSTEM_FAILURE) does not mean a complete system failure. Rather, it indicates that NDS found an unexpected error on the source server. This may be due to a problem with the server's NDS database, either internal to the database or the result of unexpected data received from it. One possible cause of the -632 error is incorrect remote server IDs, as described under Remote Server IDs above. Again, you can use NDS Manager's Partition Continuity dialog to verify and repair those IDs on the affected replica ring.

Updating a Replica

Suppose that a communication problem has caused a server's replica to become inaccurate. The replica can be corrected by retrieving the information from another server in the same ring that holds an accurate copy. NDS Manager offers several means to accomplish this.

Removing and Restoring the Replica. One way to correct this is to remove and then restore a replica to that server. To do this, follow these steps.

  1. Run NDS Manager and highlight the affected server with its partition, as shown in Figure 3.

     Figure 3: This screen lets you remove and then restore a replica to a server.

  2. Right-click on the replica to display the replica options menu shown in Figure 4.

     Figure 4: The Replica Options menu in NDS Manager.

  3. Choose Delete. NDS Manager asks you if you are sure you want to delete this replica. Answer Yes to remove it.

  4. Highlight the Master replica in the NDS Manager main screen.

  5. Right-click on the Master replica to display the Add Replica window shown in Figure 5.

     Figure 5: The Add Replica window in NDS Manager.

  6. In the Add Replica window, you can choose the type of replica you want (Read/Write or Read-Only) and on which server you want to place the replica in the tree. Choose the affected server and choose OK.

Note: A drawback to repairing a replica in this way is that it requires the replica ring to include the server with the Master replica. If the copy of that replica ring does not have the Master replica listed, this remove/restore operation will not work.

Receiving Updates. This operation allows you to make a real-time update of this replica. To do this, follow these steps:

  1. Highlight the partition you need to correct in NDS Manager's main screen.

  2. Right-click on the partition and select Receive Updates from the menu. NDS will then perform an immediate synchronization on this replica, as shown in Figure 6.

     Figure 6: The Receive Updates window in NDS Manager.

Like the remove/restore operation, this update operation also requires that the Master replica be in the server's replica ring or it will not work.

Sending Updates. This operation allows you to immediately send a real-time update to the Master replica in the ring from the affected server. The drawback to this operation is the redundancy of information sent to all the servers in the replica ring which are functioning normally.

Note: NDS checks each receiving server to verify whether it has already received the information in the update. If it has, the receiving server discards the update.

If your affected server has a Master replica in its ring, choose Send Updates. (Even though this may burden the network with additional traffic, it has the greatest chance of fixing the replica.) If your affected server does not have a Master replica in its ring, but does have Read/Write replicas in the ring, again it is best to Send Updates. If your affected server does not have a Master replica or Read/Write replicas in the ring, it is best to create a Master replica (or change a Read/Write replica to Master) in NDS Manager.
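The prerequisites for the three operations can be summarized in a short function. The Python sketch below only restates the guidance in this section; it assumes you have already read the replica types from the affected server's copy of the ring. The function name and the wording of the option strings are hypothetical.

```python
def update_options(ring_types):
    """List the operations worth trying, in the order the text recommends.

    ring_types: replica types listed in the affected server's copy of the
                ring, using "M", "R/W", "RO" and "SR".
    """
    has_master = "M" in ring_types
    has_read_write = "R/W" in ring_types
    options = []
    if has_master or has_read_write:
        # Recommended first choice when a Master or R/W replica is listed.
        options.append("Send Updates")
    if has_master:
        # Both of these need the Master replica to be listed in the ring.
        options.append("Receive Updates")
        options.append("Remove and restore the replica")
    if not options:
        options.append("Create a Master replica first")
    return options


if __name__ == "__main__":
    for ring in (["M", "R/W"], ["R/W"], ["SR"]):
        print(ring, "->", update_options(ring))
```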

Conclusion

NDS Manager offers extensive control over the partitions and replicas in a tree, working right from the Windows desktop alongside NWAdmin. It provides a clearer, graphical picture of NDS health than the text-based, server-based tools do. In typical day-to-day operations, keeping NDS Manager up and running on the desktop allows quick access to the status of remote site replicas and their synchronization. This is invaluable to networks providing services over WAN links.

Appendix: DSDiag, A New Diagnostic and Report Tool for NDS

NetWare's NDS Manager includes a variety of functions for real-time management of partitions and replicas. It does not, however, provide much in the way of reporting, either as a way of documenting the system or as a way of pinpointing issues relating to potential or current error conditions. Novell will introduce DSDiag with NetWare 5.0 to provide just these kinds of services. (An earlier version made available online supported NetWare 4.x only. The new version is backward-compatible with 4.10 and 4.11.) DSDiag is server-based, but takes a client-based approach on the server. It therefore supports multiple trees and identities from a single interface. This means that administrators can view and report on more than just the tree to which the server belongs, and can obtain information from different administrators' perspectives, not just that of the administrator who logged in at a workstation, as with NDS Manager.

DSDiag also can run multiple reports in real time without waiting for one report to finish. This is advantageous when there are many partitions and replicas in the tree. (For example, reporting on the Novell_Inc tree can take from 20 minutes to two hours, depending on the traffic or on the type of information requested.) It can also provide a contrasting view of similar information taken from different sources. (For example, compare and contrast a report of what a server says its replica ring is and a report of what the NDS tree says the server has.)

Unlike NDS Manager, DSDiag is a C-Worthy, text-based utility (see Figure 7) rather than a ConsoleOne Java tool.

Figure 7: DSDiag has a C-Worthy menu-based interface.

DSDiag can also run a report on the tree to display the NDS versions of each server. It can also modify this report to display only those servers with versions greater, less than, or equal to a given value. This allows for a quick diagnosis of NDS errors stemming from a single server having an older or newer version of DS.NLM.
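The kind of filtering described here is easy to picture with a small example. The Python sketch below is not DSDiag and does not read versions from a tree. It filters a hand-entered table of DS.NLM versions by a comparison. The server names, the version numbers, and the function name are hypothetical.

```python
import operator

# Map the three comparisons the report supports to Python operators.
COMPARISONS = {"<": operator.lt, "=": operator.eq, ">": operator.gt}


def filter_by_version(versions, relation, value):
    """Return the sorted server names whose version satisfies the relation.

    versions: {server name: DS.NLM version as a number}
    relation: one of "<", "=", ">"
    """
    compare = COMPARISONS[relation]
    return sorted(s for s, v in versions.items() if compare(v, value))


# Hypothetical version numbers written as plain integers.
DS_VERSIONS = {"HOME": 599, "FARSIDE": 595, "NEARSIDE": 599}

if __name__ == "__main__":
    print("Older than 599:", filter_by_version(DS_VERSIONS, "<", 599))
```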

Finally, like NDS Manager, DSDiag can also report on replica rings, check a partition's status (such as when the partition was last completely synchronized or when it last attempted to synchronize), and compare two replica rings to see if all the replicas have the same view of the replica ring.

* Originally published in Novell AppNotes

