Using the Directory Services Trace (DSTRACE) Screen

MICHELLE HENDRY-CAVER
NDS Worldwide Support
Novell Technical Services

KEN NEFF
Principal Technical Writer
Novell Developer Information

01 Feb 1997

The DSTRACE debug screen has been available on NetWare servers for several years now, but few people really know what it's good for. This AppNote gets you up to speed on the basic concepts, then provides a DSTRACE command reference to help you troubleshoot NDS synchronization problems.

Introduction
Basic NDS Synchronization Concepts
Starting and Stopping DSTRACE
Interpreting the DSTRACE Display
DSTRACE Command Reference

Introduction

The Directory Services Trace (DSTRACE) screen is a troubleshooting aid to help debug problems with Novell Directory Services (NDS) in NetWare 4.x and IntranetWare networks. Originally used by the NDS engineers to help in the development process, DSTRACE has since been made available to all network technicians and system administrators for diagnosing NDS synchronization errors. Because of its roots as an engineering tool, the DSTRACE screen can display a lot of obscure information that may be difficult to interpret.

This AppNote aims to clear up some of the mystery behind what you see in the DSTRACE screen. It first presents a quick overview of basic NDS synchronization concepts, and then provides some DSTRACE usage information. It also provides a reference of DSTRACE commands, filters, and processes.

This AppNote assumes you are familiar with the fundamentals of Novell Directory Services, especially partitioning and replication. If you need more information about these and other aspects of NDS, refer to the following sources:

Novell Concepts manual and other Directory Services documentation
Previous AppNotes on Novell Directory Services
Novell's Guide to NetWare 4.1 Networks by Jeffrey F. Hughes and Blair W. Thomas, Novell Press (ISBN #1-56884-736-X; to order, call 800-762-2974 or 317-596-5200)

This information is based on Technical Information Documents (TIDs) 2908733, 2909019, and 2909026. Novell TIDs can be found on the NetWire forum on CompuServe, on the Novell Support Connection CD-ROM (formerly NSEPro), and on Novell's Support Connection web site at http://support.novell.com. For more detailed information about specific errors you might see in DSTRACE, do a search of the Novell Support Connection web site with "DSTRACE" as a keyword.

Basic NDS Synchronization Concepts

In NetWare 4.x and IntranetWare networks, servers are installed into a logical structure called the Directory or NDS tree. The NDS tree appears to users as a single database of objects representing network resources (servers, printers, users, and so forth). But this database can be physically distributed across the network to optimize network traffic, provide fault tolerance, and place resources where they are needed. Behind the scenes, partitioning and replication are used to customize NDS and optimize communication between servers in the tree.

To maintain data consistency in this distributed database, servers exchange update information on a regular basis. This is known as synchronization, and it occurs between all servers that hold a replica of a given NDS partition. NDS is a "loosely consistent" database, meaning that changes or updates take time to be propagated throughout the network, but eventually all copies of a partition will receive the updates and be in synch again.

The set of servers that have a copy of the same partition is known as a replica ring or replica list. You can view a server's replica ring by using the DSREPAIR utility. At the server console, type "LOAD DSREPAIR <Enter>" and select Advanced Options from the menu. Then select Replica and Partition Operations, choose a replica from the list, and select View Replica Ring Information. This will show the other servers in the replica ring for that partition.

There are two kinds of replica synchronization: fast (high convergence) and slow. Most object modifications are scheduled for fast synchronization, which automatically begins 10 seconds after a client update event occurs on the server. Slow synchronization is used for modifications to more static database entries such as login time, login network address updates, bindery, property attributes, extended schema, and backlinks. By default, slow synchronization is scheduled to occur 30 minutes after the triggering event.

Skulking

Skulking is another NDS synchronization process that works hand-in-hand with regular replica synchronization. Whereas replica synchronization is used to propagate changes shortly after they occur, the skulking operation makes sure all replicas have the same information even if no changes have been made recently. The skulker is also known as the "heartbeat" process because it periodically checks every server in a replica ring to make sure they are all still "alive" and in synch.

The skulker process checks the Sync Up To status of every server that has a replica of a given partition. Then, if necessary, the synchronization process goes through the replica list and synchronizes changes to all replicas, one replica at a time. After successfully sending all updates to one replica, a server sends updates to the next server, and so on, until all the replicas have received the updates. If any replicas are not updated in one round of heartbeat synchronization, the server calls the skulker to schedule synchronization again.

By default, the skulker process is scheduled to begin every 30 minutes. However, it can be scheduled for a sooner time (via the SET DSTRACE=*S command) or it can be forced to begin synchronization immediately (via the SET DSTRACE=*H command).
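
The skulker round described in this section can be sketched in Python. This is a toy model for illustration only, not NDS code: the function name skulk_round, the send_updates callback, and the server names are all hypothetical.

```python
def skulk_round(replica_ring, send_updates):
    """Send updates to each replica in turn; return (all_processed, failed).

    replica_ring : list of server names holding a replica (hypothetical names)
    send_updates : callable taking a server name, returning True on success
    """
    failed = []
    for server in replica_ring:          # one replica at a time
        if not send_updates(server):
            failed.append(server)        # this replica missed the round
    all_processed = not failed
    return all_processed, failed


# Example: DS3 is unreachable, so the round is incomplete and would be rescheduled.
reachable = {"DS2": True, "DS3": False}
all_processed, failed = skulk_round(["DS3", "DS2"], lambda s: reachable[s])
print("All processed =", "YES" if all_processed else "NO", "| retry:", failed)
```

In this model a single unreachable server is enough to make the whole round report NO, which mirrors how one failed replica keeps the partition from being fully synchronized.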

Starting and Stopping DSTRACE

DSTRACE is often referred to as a utility, but it is actually a group of SET commands that you can run at the server console. When you type the appropriate SET command at a server's console, it activates a separate Directory Services screen that shows you an inside view of NDS activity. In particular, this screen shows you the synchronization processes of the replicas on a particular server, including its synchronization with the other servers in the replica ring. You can use DSTRACE to monitor the status of NDS synchronization processes and to view any errors that might occur while these processes are running.

Starting DSTRACE

To start the Directory Services trace screen, type the following at the console prompt (these commands are not case-sensitive):

SET DSTRACE=ON

The server will display the following message to let you know DSTRACE is active:

DSTrace is set to: ON

So you'll have something to see, type a DSTRACE process command such as:

SET DSTRACE=*H

This command forces an immediate "heartbeat" synchronization to occur.

To see the actual trace screen, switch to the screen titled "Directory Services" (if you are at the server console, type <Alt>+<Esc> to toggle between screens; if you are using RCONSOLE, press <Alt>+F3). You should see information such as the following being displayed (this simple example is from a single-server NDS tree):

(97/01/05 16:12:38)

SYNC: Start sync of partition <[Root]> state:[0] type:[0]
SYNC: End sync of partition <[Root]> All processed = YES

The first line is the time and date when the master replica was last synchronized. The second line shows the synchronization process starting and indicates which partition is about to be synchronized ([Root] in this example). State:[0] means that the server's partition operation state is ON, and Type: [0] means this is the master replica of the partition. (More details on the various states and types are given later in this AppNote.) The third line shows the end of the synchronization process, with the message "All processed=YES" indicating that the process completed successfully on this server.

Logging DSTRACE Output to a File

On a multiserver network, the NDS activity can quickly fill up the trace screen and the information you are looking for may scroll up before you have a chance to read it. You can save the DSTRACE screen output to the DSTRACE.DBG file located in the SYS:SYSTEM directory. DSTRACE.DBG is an ASCII text file that can be viewed later with the NetWare INSTALL utility's Edit option or with a text editor. Saving the trace file is useful if the DSTRACE screen scrolls by too fast or if you need to send a trace to Novell Technical Support.

Note: You can specify a different path and filename for the DSTRACE file by using the SERVMAN utility. Type LOAD SERVMAN and select the Server Parameters option. Then select Directory Services, followed by NDS Trace Filename.

To start logging the DSTRACE information to the DSTRACE.DBG file, type the following commands:

SET DSTRACE=ON    (Turns the trace screen on)
SET TTF=ON        (Causes output to be recorded to the DSTRACE.DBG file)
SET DSTRACE=*R    (Resets the trace file, overwrites old information)

Now either wait for a synchronization cycle to complete, or cause something to happen by typing commands such as:

SET DSTRACE=+S    (Displays objects that are synchronizing)
SET DSTRACE=*H    (Forces "heartbeat" synchronization)

After the information you wanted to log has been displayed on the screen, type the following command to stop logging the DSTRACE output to a file:

SET TTF=OFF

If you don't turn off the Trace To File (TTF) function, the DSTRACE.DBG file can become very large and could fill up your SYS volume in a few days.

Stopping DSTRACE

DSTRACE generates a small amount of overhead on the server, especially when filters such as the +sync option are enabled. Do not turn DSTRACE on and leave it running indefinitely. It should only be enabled for a short time to troubleshoot problems with NDS synchronization and to diagnose NDS errors. (You can also use DSTRACE for a quick check of synchronization status before performing partition operations such as splits and joins.)

To deactivate the DSTRACE screen, type the following command:

SET DSTRACE=OFF

Interpreting the DSTRACE Display

In general, the DSTRACE screen shows you four main things about each partition that exists on the server where you run the command:

The partition name
The state and type of the partition
The NDS process currently taking place
Whether or not the process completed successfully

As an example, look at the following DSTRACE screen:

  
  SYNC: Start sync of partition<[Root]> state:[0] type:[1]
  31355319:577:FB010000  SYNC: Start outbound sync with (2) 
  [B5000922]<SCHONE_OSFLR.SERVERS_OSFLR.Novell>
    SENDING TO ------> CN=SCHONE_OSFLR
  31355319:604:FB010000  SYNC: sending updates to server <CN=SCHONE_OSFLR>
  31355319:763:FB010000  SYNC: update to server <CN=SCHONE_OSFLR> 
  successfully completed
  31355319:777:FB010000	 SYNC: Start outbound sync with (3) 
  [010000B8]<4SFT3_OSLAB.SERVERS_LAB.Novell>
    SENDING TO ------> CN=4SFT3_OSLAB
  31355319:798:FB010000	 SYNC: sending updates to server <CN=4SFT3_OSLAB>
  31355319:910:FB010000	 SYNC: update to server <CN=4SFT3_OSLAB> 
  successfully completed
  3135531A:002:FB010000	 SYNC: SkulkPartition for <[Root]> succeeded
  3135531A:004:FB010000	 SYNC: End sync of partition <[Root]> All 
  processed = YES.

The key information in this screen is that synchronization occurred for the [Root] partition which is ON (state 0). This server has a read/write (type 1) replica of this partition. At the end of the display, we see that the synchronization process completed successfully (All processed = YES).
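
When a trace has been saved to a file, the same four items can be pulled out mechanically. The following Python sketch is one possible way to do that; the regular expressions are assumptions based only on the sample lines shown above, wrapped lines are assumed to have been rejoined, and summarize_sync is a hypothetical helper name.

```python
import re

START_RE = re.compile(
    r"SYNC: Start sync of partition\s*<(?P<name>.+?)>\s*"
    r"state:\[(?P<state>\d+)\]\s*type:\[(?P<type>\d+)\]"
)
END_RE = re.compile(
    r"SYNC: End sync of partition\s*<(?P<name>.+?)>\s*"
    r"All\s+processed\s*=\s*(?P<result>YES|NO)"
)

def summarize_sync(lines):
    """Return {partition: {"state": int, "type": int, "all_processed": bool or None}}."""
    summary = {}
    for line in lines:
        m = START_RE.search(line)
        if m:
            summary[m["name"]] = {
                "state": int(m["state"]),
                "type": int(m["type"]),
                "all_processed": None,   # unknown until the End line is seen
            }
            continue
        m = END_RE.search(line)
        if m:
            entry = summary.setdefault(
                m["name"], {"state": None, "type": None, "all_processed": None}
            )
            entry["all_processed"] = (m["result"] == "YES")
    return summary

trace = [
    "SYNC: Start sync of partition<[Root]> state:[0] type:[1]",
    "3135531A:004:FB010000 SYNC: End sync of partition <[Root]> All processed = YES.",
]
print(summarize_sync(trace))
```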

All Processed = YES / NO

The "All processed = YES" message is one of the main things you want to see when running DSTRACE. It means that, from this server's point of view, the synchronization seemed to be complete. "All processed = NO" means that there was an error or incomplete synchronization between this server and at least one of the other servers in the replica ring. Here is an example of a DSTRACE display containing errors (the lines are numbered for ease of reference):

  2   SYNC: Start outbound sync with (#=3, state=0, type=3) [01000ad2]<DS3.Root>
  3   (16:12:38) SYNC: failed to communicate with server <CN=DS3> ERROR: -625
  4   SYNC: Start outbound sync with (#=2, state=0, type=1) [01000ad3]<DS2.Root>
  5   SENDING TO ------> CN=DS2
  6   SYNC: sending updates to server <CN=DS2>
  7   SYNC: [01000ad2][(14:42:09),1,1] DS3.Root (NCP Server)
  8   SYNC: Objects: 1, total changes: 3, sent to server <CN=DS2>
  9   SYNC: update to server <CN=DS2> successfully completed
 10   SYNC: Skulkpartition for <Root> Succeeded
 11   SYNC: End sync of partition <Root> All processed = NO

Line 1 shows the start of the partition synchronization process.
Line 2 shows the first outbound synch information for the partition being sent to server DS3.
Line 3 shows a synch error; in this case, a "failed to communicate" error (-625).
Line 4 shows the second outbound synch for the partition being sent to server DS2.
Lines 5 and 6 show the transferring of information to server DS2.
Line 7 shows the synchronization of the NCP Server object.
Line 8 is a summary of information sent to server DS2.
Line 9 shows that the update to server DS2 has been successfully completed. There were no errors for this server.
Line 10 indicates that the skulker synchronization of the Root partition has finished.
Line 11 shows the end of the partition synch process. The message "All processed = NO" is displayed because the partition was not updated due to the -625 error on DS3.

The ultimate goal is to clean up all 600-series errors so that you see "All processed = YES" displayed for all partitions on the server you are currently observing.

Note: Just because one server shows "All processed = YES", don't assume that the partition is okay. It is only one server's point of view. If you are having a problem with a partition, you should get the point of view from each server involved in the problem partition. If every server shows "All processed = YES" it means the entire partition is okay.
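
To find out which servers are holding up a partition, it can help to list the per-server outcome from a saved trace. The Python sketch below does this for the two message forms seen in the example; the patterns are assumptions derived from those sample lines (with the server name enclosed in angle brackets), and server_outcomes is a hypothetical helper name.

```python
import re

FAILED_RE = re.compile(
    r"SYNC: failed to communicate with server <(?P<server>[^>]+)>\s*ERROR: (?P<code>-\d+)"
)
OK_RE = re.compile(r"SYNC: update to server <(?P<server>[^>]+)>\s*successfully completed")

def server_outcomes(lines):
    """Map each server name to "OK" or to the negative NDS error code reported."""
    outcomes = {}
    for line in lines:
        m = FAILED_RE.search(line)
        if m:
            outcomes[m["server"]] = int(m["code"])
            continue
        m = OK_RE.search(line)
        if m:
            outcomes[m["server"]] = "OK"
    return outcomes

trace = [
    "(16:12:38) SYNC: failed to communicate with server <CN=DS3> ERROR: -625",
    "SYNC: update to server <CN=DS2> successfully completed",
]
outcomes = server_outcomes(trace)
problem_servers = [s for s, result in outcomes.items() if result != "OK"]
print(outcomes, problem_servers)
```

Running the same check against the trace from every server in the replica ring gives the per-server point of view that the note above recommends.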

Color Coded Information

Starting with NetWare 4.1, DSTRACE displays key information and errors in color to help them stand out from the other information. For example, partition names are displayed in blue, while the "All processed=YES" message is in yellow and green. Not all problems show up as color coded, but in most cases the colors do help you distinguish the important from the non-important.

Understanding Replica States

Another key piece of information in the DSTRACE display is the replica state. When a partition operation occurs, such as adding or removing a replica, creating a new partition, or merging an existing child partition back with its parent, all of the servers in that partition's replica ring are involved. During these operations, replicas pass through various transition states.

Note: The state of a replica is server-centric; in other words, the state of the same replica can be different on different servers. For example, Server1 might show Server2 having a state of ON in Server1's replica ring, while Server2's replica ring shows the same replica on this replica ring to be in Transition State On.

State [0]: On This is the final stage of all partition operations. The state of "On" is what you want to see in the DSTRACE screen for each replica. It means all entries have been copied for the partition.

State [1]: New Replica This is the state a replica is put into when it begins the operation of adding itself to the replica list. The server receiving the new replica establishes communication with the server holding the master replica. The new replica is assigned a replica timestamp and is set to a state of New Replica. The NDS synchronization process begins copying the partition contents to this server, while updating the replica lists of the other servers in the replica ring.

Note: During this step, the server accepts information only from the master replica; clients cannot access this server's new replica information until the process is complete.

After the synchronization process has updated this server with the needed information, the server holding the master replica propagates a state change to Transition State On for this server's replica. The other servers in the replica ring are notified and set to a Transition On state as well. The server with the new replica must contact all other servers in the ring to be able to change to Transition State On. If a server is not contacted, that server will stay in a New Replica state.

Note: Before the replicas are changed to Transition State On, any needed subordinate reference replicas are added to the servers and turned ON. Subordinate references are needed and should never be deleted from the replica ring.

State [2]: Dying Replica When you remove a replica, there are two possible scenarios: (1) removing a replica from a server holding the parent partition; and (2) removing a replica from a server that does not hold the parent.

Removing a Replica with the Parent Present. In this first scenario, a request is made to the server holding the master replica to remove a replica. The state is changed to Dying Replica and the server propagates the changes. It first checks to see if there is a replica of the parent partition present on the server. If there is, the master changes the type to a subordinate reference and propagates the change out to all the replicas. Once all the updates are done, the state changes to ON. It then converts the objects to external references and backlinks them to the real objects.

Removing a Replica without a Parent Present. Removing a replica without a parent present follows the same procedures as with a parent. The only difference is that, when the master checks to see if the parent is present and finds it is not, it deletes the replica and propagates the change to all the partition's replicas. It still creates an external reference and backlinks it to the real entries.

State [3]: Locked Certain partition operations need replicas to be locked so that only one change on that partition occurs at one time (such as in the Move State 0). In this case, the replica state is changed to Locked.

State [4]: Change Replica Type State 0 There are two scenarios when changing a replica's type: one is changing a replica to be a non-master replica, and the other is to change its type to be the master replica.

Non-Master Replica Type Change. Changing a replica to be a non-master is the simplest case. A change request is given and the server holding the master replica makes the change and propagates it to the other servers in the replica ring.

Master Replica Type Change. Changing a replica to a master involves making a request to the server holding the master replica. The master replica sets that replica as a Change Replica Type State 0 and propagates that change to the other servers in the replica ring. Then the current master server tells the new server that it is now the master. The new master waits until its clock is ahead of the timestamps issued by the old master, and then the new master places new timestamps on all replicas for that partition. The new master then makes the state change to Change Replica Type State 1.

State [5]: Change Replica Type State 1 (Continuation of Master Replica Type Change, above) Once the old master switches into Change Replica Type State 1, it appoints itself as a secondary (read/write) replica. This change is propagated to all the partition's replicas. Once all the changes are received and updated, the replica state changes once again to ON, which is also propagated through the partition's replicas. Once ON, the old master is officially a read/write replica and the target replica is now the master.

State [6]: Transition State On This state occurs after the New Replica State. During this state, the replica that was added requests any newer information than what the master replica had. Once the replica has communicated with every server in the ring and is updated with their new information, it requests to be turned ON via the server holding the master replica. The server holding the master replica is responsible to send the state ON to the other replicas when all are updated and synchronized.

State [48]: Split State 0 This state occurs when a new partition is created, or split from an existing partition. A request to create a new partition is made to the server holding the master replica of the partition that is to be split. The master sets its state to Split State 0, and then communicates with the rest of the servers in the replica ring to inform them of the request. The master then sets all the partition's replicas to Split State 1.

State [49]: Split State 1 Once in Split State 1, the server holding the master replica of the parent partition makes sure that all updates and synchronization are complete. If they are, the Master turns ON the parent partition and the new partition. This change is propagated to the other replicas.

State [64]: Join State 0 This is the state when two partitions are combined (joined) back into one. A request is sent to the server holding the master of the child partition. The child master locates the parent's master replica and sends a request to be joined to the parent. The parent master sets the state to Join State 0 and propagates it to the partition's replicas. The child master does the same for its partition's replicas. Both servers then check to see if any servers need each other's replica information. Servers that do not have both the child and parent partition get a copy of the replica. This sets the replica state to Join State 1.

State [65]: Join State 1 Once the process of combining partitions has begun, the next step is to erase the boundary between the partitions. This is what occurs in Join State 1. All object entries from both partitions are put into one partition. The new replica that was created is now in Join State 2.

State [66]: Join State 2 Join State 2 propagates the changes of the combined partition. The replica state then turns to ON. Once ON, the parent master is the master replica of the new partition, and the child master becomes a read/write replica.

State [80]: Move State 0 NDS allows any subtree to be moved as long as it has no child partitions and has consistent schema rules. The client sends a move request to the destination server (the server holding the master replica of the parent partition under which the child is moving). Rights are checked and the move begins. A time limit for the move is set in the server's memory. If the time limit expires before the move finishes, the move is aborted. Otherwise, the move continues until finished and the finished move request is sent to the source server (the server holding the master replica of the partition being moved).

The source server identifies the destination server and sends it the object entries in the partition. The destination server checks to make sure all the servers in the destination's backlinks will support the move. If all is okay, the destination server initiates the move and sets the partition's replicas state to Move State 0. The source server does the same with its replicas. The propagation begins and the changes are sent to all the replicas. The source server sends a request to lock the source server's replica (it being the master replica). The changes are then made and the external references and backlinks are created accordingly. Once all entries have moved and all is synchronized successfully, the replica is unlocked and turned ON.
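
As a quick reference, the state numbers and the order in which a replica normally passes through them can be written down as a small lookup table. The Python sketch below is a reading of the descriptions in this section, not an NDS data structure. The NEXT_STATE table covers only the successful path, and the Dying Replica and Locked states are deliberately left unmodelled because their outcome depends on the operation in progress.

```python
REPLICA_STATES = {
    0: "On", 1: "New Replica", 2: "Dying Replica", 3: "Locked",
    4: "Change Replica Type State 0", 5: "Change Replica Type State 1",
    6: "Transition State On", 48: "Split State 0", 49: "Split State 1",
    64: "Join State 0", 65: "Join State 1", 66: "Join State 2", 80: "Move State 0",
}

# Next state on the successful path, as read from the descriptions above.
NEXT_STATE = {1: 6, 6: 0, 4: 5, 5: 0, 48: 49, 49: 0, 64: 65, 65: 66, 66: 0, 80: 0}

def path_to_on(state):
    """Return the list of state names from `state` until the replica is On."""
    if state not in REPLICA_STATES:
        raise ValueError(f"unknown replica state {state}")
    path = [REPLICA_STATES[state]]
    while state != 0:
        if state not in NEXT_STATE:      # e.g. Dying Replica or Locked: not modelled
            path.append("(not modelled)")
            break
        state = NEXT_STATE[state]
        path.append(REPLICA_STATES[state])
    return path

print(path_to_on(64))
```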

Replica Types

The type value in the DSTRACE screen identifies the replica type on this server. There are four different kinds of replicas:

Type [0]: Master In a master replica, clients can create, modify and delete entries and perform operations that deal with partitions.

Type [1]: Read/Write In a read/write replica, clients can create, modify and delete entries.

Type [2]: Read Only In a read only replica, clients cannot make changes. They can only read information.

Type [3]: Subordinate Reference Subordinate reference replicas provide the connectivity necessary for "tree walking" within the NDS tree. A subordinate reference is automatically created whenever a replica of the parent partition exists on a server, but no replica of the child exists on that server. The subordinate reference points to the child reference. Again, subordinate references should not be deleted from a server.
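
The type numbers and the client operations each replica type accepts can be summarized in the same way. This Python sketch is only a restatement of the list above. The operation names are hypothetical labels, and treating a subordinate reference as accepting no client changes is an assumption, since the text describes it only as a tree-walking aid.

```python
# Changes a client may make against each replica type, per the list above.
REPLICA_TYPES = {
    0: ("Master", {"create", "modify", "delete", "partition"}),
    1: ("Read/Write", {"create", "modify", "delete"}),
    2: ("Read Only", set()),               # clients can only read
    3: ("Subordinate Reference", set()),   # assumption: no client changes modelled
}

def client_can(type_code, operation):
    """Return True if a client may perform `operation` on this replica type."""
    name, allowed = REPLICA_TYPES[type_code]
    return operation in allowed

print(REPLICA_TYPES[1][0], client_can(1, "modify"), client_can(1, "partition"))
```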

600 and 700 Series Errors

When problems are encountered during a partition operation or synchronization process, DSTRACE displays various NDS errors in the 600 and 700 series. One that you might see often is a -625 error, as in the following line:

(16:12:38) SYNC: failed to communicate with server <CN=DS3> ERROR: -625

This particular error indicates that there was a problem communicating with server DS3, possibly because the server was down or the WAN link to it was unavailable. The causes of other errors, along with suggested solutions, are described in the Directory Services Errors section of the DSDOCx.EXE document (where x is 2 or greater). This Envoy document can be downloaded from the Web at http://support.novell.com.

"Normal" Errors

In NDS, not all errors are necessarily bad; some are considered to be "normal" errors. A normal error is caused by something that is inevitably going to happen in a distributed database such as NDS. One example is errors that appear along with DSACommonRequest messages. These help make process decisions so the login process, changes in the Directory, and similar events are handled correctly.

Here is an example of a "normal" error:

31355116:254:FB00F000 DSA REQUEST BUFFER:
31355116:255:FB00F000  00 00 00 00 62 20 00 00 00 00 00 00 2E 00 00 00 ....b...........
31355116:262:FB00F000  53 00 45 00 52 00 56 00 45 00 52 00 2E 00 53 00 S.E.R.V.E.R...S.
31355116:268:FB00F000  45 00 52 00 56 00 49 00 43 00 45 00 53 00 2E 00 E.R.V.I.C.E.S...
31355116:275:FB00F000  4E 00 4F 00 56 00 45 00 4C 00 4C 00 00 00 00 00 N.O.V.E.L.L.....
31355116:282:FB00F000  01 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 ................
31355116:293:FB00F000 DSA REPLY BUFFER:
31355116:294:FB00F000  00 00 00 00 10 00 00 00 ........
31355116:299:FB00F000 DSA: DSACommonRequest(1): returning ERROR -601
31355117:070:FB00F000 DSA: DSACommonRequest(1) conn:8 for client <Admin.Novell>

In this example, the user Admin.Novell tried to log in using the context SERVER.SERVICES.NOVELL, which does not exist in this tree. (Note that this context name is displayed in the buffer.) As a result, the ERROR -601 occurs. This is considered a "normal" error because it occurs along with the DSACommonRequest message.
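
The context name can be recovered from the request buffer programmatically. In the dump, each letter is followed by a 00 byte, which suggests 16-bit little-endian Unicode text; that encoding is an inference from this one sample rather than something the trace states. Under that assumption, a short Python sketch (with the hypothetical helper decode_buffer_text) decodes the three rows that hold the name:

```python
# Bytes copied from the three name-bearing rows of the DSA REQUEST BUFFER above.
HEX_ROWS = [
    "53 00 45 00 52 00 56 00 45 00 52 00 2E 00 53 00",
    "45 00 52 00 56 00 49 00 43 00 45 00 53 00 2E 00",
    "4E 00 4F 00 56 00 45 00 4C 00 4C 00 00 00 00 00",
]

def decode_buffer_text(hex_rows):
    """Decode hex rows as UTF-16LE text and drop trailing NUL padding."""
    raw = bytes.fromhex(" ".join(hex_rows))
    return raw.decode("utf-16-le").rstrip("\x00")

context_name = decode_buffer_text(HEX_ROWS)
print(context_name)
```

The decoded text matches the context name quoted in the explanation above.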

Another example of a "normal" error is a collision. Collisions occur when duplicate updates to the same object arrive at the server. They are displayed in the DSTRACE screen as in the following example:

310509EE:597:FB032000 COLLISION: Ignoring duplicate request to overwrite value.

  310509EE:600:FB032000 [01000155] [00037B40] V=(97/01/23 10:12:00, 0008, 0006)

      AVA=(97/01/23 10:12:00, 0008, 0006)

  310509EE:605:FB032000 COLLISION: Ignoring duplicate request to overwrite value.

  310509EE:608:FB032000 [01000155] [002CD600] V=(97/01/23 10:12:00, 0008, 0009)

      AVA=(97/01/23 10:12:00, 0008, 0009)

  310509EE:613:FB032000 COLLISION: Ignoring duplicate request to overwrite value.

  310509EE:616:FB032000 [01000155] [003CDD40] V=(97/01/23 10:12:00, 0008, 000C)

      AVA=(97/01/23 10:12:00, 0008, 000C)

A collision can occur when an update is made while the WAN link is down between two servers, but a neighboring server sends the changes to the target server through a different link. When the first link comes back up, the changes are sent again to the server. However, since the server already has received the changes, it ignores the duplicate request.
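
The scenario can be illustrated with a toy model in Python. This is not how NDS is implemented; it only shows the idea of ignoring an update whose timestamp matches one already held. The function apply_update, the store layout, and the object key are hypothetical, and the model does not attempt to order older and newer timestamps.

```python
def apply_update(store, key, value, timestamp):
    """Apply an update unless an identical one (same timestamp) is already held.

    store     : dict mapping key -> (value, timestamp)
    timestamp : tuple such as ("97/01/23 10:12:00", 0x0008, 0x0006)
    Returns "applied" or "collision".
    """
    held = store.get(key)
    if held is not None and held[1] == timestamp:
        return "collision"               # duplicate request: ignore it
    store[key] = (value, timestamp)
    return "applied"


store = {}
update = ("obj-01000155", "new value", ("97/01/23 10:12:00", 0x0008, 0x0006))
first = apply_update(store, *update)     # arrives through the neighboring server
second = apply_update(store, *update)    # resent when the original link comes back
print(first, second)
```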

Obituaries

NDS uses a special attribute called an obituary to track objects that are to be deleted, moved, or renamed. Since there are typically multiple instances of each object stored on various servers in the tree, NDS flags each instance of the object via the obituary attribute so that no further changes can be made until the delete, move, or rename has been propagated to all servers holding a replica of the object's partition.

When an object is flagged for deletion, it goes through several stages before actually being removed from the tree. In a loosely consistent database such as NDS, servers do not receive all updates simultaneously. Therefore servers may not all hold the same object information at a given time. For this reason, each server holds on to the old information until all other servers have received the updates. The usual replica synchronization process is used to synchronize changes to the obituary attribute. In the case of an object deletion, after the skulking process completes and each instance of the object is flagged accordingly, the object is purged by the janitor. Backlinks are also notified that the object is going away.

Note: The janitor process will not run unless the partition is in synch ("All processed = YES" appears on each server in the replica ring). If a server's backlink does not get notified that the object is gone and another object is created with the same name, a rename can occur resulting in an object named <number><number>.

Here is an example of a DSTRACE screen showing a normal obituary occurring for an object that is being deleted:

313553C9:688:FB010000 SYNC: Start outbound sync with (2) [010000B8]<OSLAB.SERVERS_LAB.Novell>
  SENDING TO ------> CN=OSLAB
313553CA:040:FB010000 SYNC: sending updates to server <CN=OSLAB>
313553CA:061:FB010000 SYNC: [0100014C][(10:19:10),1,31] Admin.Novell (User)
313553CA:165:FB010000 SYNC: [08000291] obituary for treial.SERVERS_LAB.Novell
313553CA:168:FB010000      valueTime=313553BF,1,1 type=1, flags=0, oldCTS=313553BB,1,1
313553CA:170:FB010000      valueTime=313553BF,1,2 type=6, flags=0, oldCTS=313553BB,1,1
313553CA:173:FB010000      valueTime=313553BF,1,3 type=6, flags=0, oldCTS=313553BB,1,1
313553CA:212:FB010000 SYNC: Objects: 1, total changes: 11, sent to server <CN=OSLAB>
313553CA:214:FB010000 SYNC: update to server <CN=OSLAB> successfully completed

If you see the same obituary over and over again but the object never gets purged, there may be a synchronization problem. Remember that the janitor process is in charge of cleaning up object deletions and it will not run until all servers in the replica ring display "All processed = YES".

Unknown Objects

Occasionally you might see references to "unknown" objects in a DSTRACE screen. Unknown objects can appear in NDS when a new replica is being added to a server, when a restore is in progress, or even when a server is being installed. The unknown objects created as a result of these activities are temporary. Once the process is completed, the unknown objects are updated with all the information they need and become real objects.

Unknown objects that appear during replica synchronization can cause problems and prevent the synchronization process from completing. In most cases, all you have to do is allow sufficient time and the unknown objects will go away by themselves. If you have not recently performed any of the above-mentioned activities and you have an unknown object that persists in the tree, you can generally just delete it (depending on the object).

Note: If an unknown object represents something that is easily replaceable, such as a print queue, you can just delete it and recreate the original object. If the unknown object is a server object or something else of that importance, call Novell Technical Support for assistance.

DSTRACE Command Reference

DSTRACE has a number of commands and filters that you can use to manipulate the display to show you more or less information about NDS activity. It also has commands that initiate certain synchronization processes, and others that allow you to change certain NDS parameters on the server. This section provides a quick reference to the various DSTRACE options, grouped under the following categories:

Basic functions ( SET DSTRACE=. . . )
Filters ( SET DSTRACE= + or - . . . )
Processes ( SET DSTRACE= *. . . )
Parameters ( SET DSTRACE= !. . . )

Although the DSTRACE commands are shown in all caps for clarity, the commands are not case-sensitive. All commands must be typed at the server console prompt.

Note: Not all DSTRACE options are included in this AppNote. The ones discussed here are considered "safe" for general use, while the ones omitted are reserved for experienced technicians who have been trained in their proper use.

Basic DSTRACE Functions

Use these commands for basic functions to control the DSTRACE screen.

SYNTAX

SET DSTRACE=<COMMAND>

Example: SET DSTRACE=ON

COMMAND	Description
ON	Enables the separate DSTRACE debug screen. The minimum debug level is turned on. (You should enable DSTRACE only when troubleshooting or monitoring a process. While DSTRACE is enabled, it affects the server's CPU utilization slightly.)
OFF	Disables the DSTRACE debug screen, but does not reset the filters.
ALL	Enables all debug trace message filters. This will show even "normal" errors such as DSACommonRequest errors and collisions. Buffers are not turned on with this command.
DEBUG	Turns on a predefined set of general debugging messages by enabling the following filters: ON, INIT, FRAGGER, MISC, STREAMS, LIMBER, JANITOR, BACKLINK, SKULKER, SCHEMA, INSPECTOR, ERRORS, PART, EMU, VCLIENT, RECMAN and REPAIR.
AGENT	Turns on a predefined set of DS Agent-related debugging messages by enabling the following filters: ON, JANITOR, BACKLINK, RESNAME, DSAGENT, and VCLIENT.
NODEBUG	Enables the DSTRACE screen but turns off all debugging messages. The screen will display just the title and version at the top. This command is useful when you want to start over with the filter commands. Filter commands are cumulative and are normally reset only when DS.NLM is unloaded and reloaded. The NODEBUG option, used in conjunction with the +MIN filter, resets the DSTRACE screen to the defaults without having to unload and reload DS.NLM.
CHECKSUM	(Available with DS.NLM version 4.89 and above) Enables Transport Dependent Checksumming (TDC). Useful in networks with routers that divide files into 64 KB packets and then rebuild the files. The process of making sure the packet does not get corrupted during the exchange or rebuild process is done through NetWare's IPX data packet checksumming, which this command enables. Packet checksumming actually checks the integrity of the data packets. (IPX checksumming is not supported by the Ethernet 802.3 "raw" frame type. You must be using the 802.2 frame type.) If this TDC checksumming option is enabled on one server, it is recommended that it be enabled on all servers. This command is permanent; even unloading DS.NLM and reloading it will not turn the checksumming off. The only way to turn it off is to issue the NOCHECKSUM command and down the server, as described below.
NOCHECKSUM	Disables transport level checksum. The only way to completely remove checksumming is to set NOCHECKSUM and down the server. This will clear all connections that may still be involved in checksumming. If checksumming is turned off on one server, it should be disabled on all servers.

CRC Checking

DS.NLM version 5.01 and higher has a new feature called CRC checking. CRC checking is Transport Independent Checking (TIC), whereas the CHECKSUM option in the earlier DS.NLM versions is Transport Dependent Checking.

There is no command to turn this checksumming on or off. Because CRC checking guarantees data integrity, it is always enabled in DS.NLM version 5.01 or higher.

With Transport Independent Checking, DS.NLM takes the data stream file and does a check on the data, putting flags accordingly on the message packet (Data + CRC). The files are then divided up into 64 KB packets by the fragger and sent across the network. Once the transmission is completed, the fragger rebuilds the packets. DS.NLM then makes sure the data was rebuilt correctly by completing the CRC Checking process.

Note: The CHECKSUM described above can also be applied to packets, if allowed by the frame type. This is not part of the DS.NLM CRC checking procedure. However, after upgrading to DS.NLM 5.01 or higher, it is recommended that you turn off the transport dependent CHECKSUM option.
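
The Data + CRC sequence described in this section can be illustrated with a small sketch. This is not the DS.NLM implementation: the CRC-32 routine from Python's zlib module, the 4-byte trailer layout, and all function names are assumptions chosen for illustration. Only the 64 KB fragment size comes from the text.

```python
import zlib
from typing import List

FRAGMENT_SIZE = 64 * 1024  # fragment size mentioned in the text


def append_crc(data: bytes) -> bytes:
    """Build the message as Data + CRC (CRC-32 trailer is an assumption)."""
    crc = zlib.crc32(data) & 0xFFFFFFFF
    return data + crc.to_bytes(4, "big")


def fragment(message: bytes, size: int = FRAGMENT_SIZE) -> List[bytes]:
    """Split the message into fragments of at most `size` bytes."""
    return [message[i:i + size] for i in range(0, len(message), size)]


def reassemble(fragments: List[bytes]) -> bytes:
    """Rebuild the message from its fragments, in order."""
    return b"".join(fragments)


def verify_crc(message: bytes) -> bytes:
    """Recompute the CRC over the rebuilt data and compare with the trailer."""
    data, received = message[:-4], int.from_bytes(message[-4:], "big")
    if zlib.crc32(data) & 0xFFFFFFFF != received:
        raise ValueError("CRC mismatch: data was not rebuilt correctly")
    return data
```

Because the check is computed over the whole data stream before fragmentation and verified after reassembly, it does not depend on the frame type or transport carrying the fragments, which is the sense in which the check is transport independent.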

DSTRACE Filters

A healthy partition should show State: [0] and "All processed = Yes" for all servers in the replica ring. If errors are appearing in the DSTRACE screen, you can use filters to change the screen's view of the synchronization process, thus providing additional information to see where the errors are occurring and to help identify what might be causing the problem.

Filters allow DSTRACE to display more or less of certain aspects of the NDS processes. Filters themselves do not perform any processes. They simply allow you to see a different view of the processes being performed. Filters are turned on by using a plus ( + ) sign, and off by using a minus ( - ) sign.

SYNTAX

SET DSTRACE = +<FILTER>
SET DSTRACE = -<FILTER>

Examples: SET DSTRACE=+SYNC SET DSTRACE=-SYNC
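
Because filter commands are cumulative, it can help to think of the enabled filters as a set that each plus or minus command adds to or removes from. The sketch below models only that bookkeeping. It is a hypothetical illustration, not how DS.NLM stores its filter state: the function name is invented, and filter aliases are not resolved.

```python
def apply_filter_commands(commands, active=None):
    """Return the set of enabled filters after applying +NAME / -NAME commands.

    Hypothetical model of cumulative filter settings; names are treated
    as case-insensitive, matching the text's note on DSTRACE commands.
    """
    enabled = set(active or ())
    for command in commands:
        sign, name = command[0], command[1:].strip().upper()
        if sign == "+":
            enabled.add(name)       # plus sign turns a filter on
        elif sign == "-":
            enabled.discard(name)   # minus sign turns a filter off
        else:
            raise ValueError("filter command must start with + or -: %r" % command)
    return enabled
```

For example, applying "+SYNC", "+IN", and then "-SYNC" in this model leaves only IN enabled, which mirrors the way earlier filter settings persist until they are explicitly turned off or reset.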

FILTER	Description	When to Use
AUTHEN	Enables debug error messages of authentication events such as the login process via the workstation or server.	When you are having authentication problems, -669, -699, or router errors.
BACKLINK, BLINK	Enables debug error messages pertaining to the backlinking process (connecting external references to real objects). Note: The backlinker process resolves external references to ensure they refer to real entities. It also ensures that real objects connect to placeholders of real objects (external reference).	After removing all replicas from a server, or when using the *B process to see if backlinking was successful.
BUFFERS	Enables debug error messages in the request and reply buffers used by the DS Agent (DSA).	To ensure that a request or reply was sent for an operation such as creating a print queue and so on. Rarely used.
COLLISION, COLL	Enables debug error messages when duplicate changes are attempted on the same object, causing a collision. (Collisions are non-critical or "normal" errors in NDS.) Note: A collision occurs when dual changes have been made on the same object. One change takes effect and the other one is discarded to prevent duplicate updates from being applied.	To see whether collisions are occurring. Rarely used.
DSAGENT, DSA, DSWIRE	Enables debug error messages of the low-level DS Agent (DSA) tracking. This will show DSACommonRequest errors, which are non-critical or "normal" errors in NDS. Note: The syntax of the error is DSACommonRequest verb. The verb is associated with a number of different types of activities, such as verifying a password. If a user logs in with the wrong password, you will see a DSACommonRequest error 601.	In conjunction with BUFFERS, to show the request and reply made for a certain operation and whether it was successful. Rarely used.
EMU	Enables debug error messages in the bindery partition. Note: Setting a bindery context on a server creates a "dynamic" bindery. In a dynamic bindery, objects and SAPs are created when opened, but deleted when closed.	To check a bindery SAP (for example, a print server SAP) or to find what is causing a -632 error.
ERRORS, ERR, E	Enables all debug error messages. Problems on any process will be displayed, along with any "normal" errors.	To display all errors occurring in NDS. Rarely used.
FRAGGER, FRAG	Enables debug error messages at the fragger level. Note: Fragmented packet handling is when NDS/NCP packets are divided up into 64 KB packets and rebuilt after being sent across the wire.	To check fragmentation of large packets. Rarely used.
IN	Enables debug error messages of inbound synchronization traffic (what is being received by this server). Note: In the DSTrace screen, an asterisk ( * ) indicates information that is being received.	To verify whether errors are coming from another server's synchronization.
INIT	Enables debug error messages relating to the initialization of NDS. Note: To see anything, you must send the output to a DSTRACE.DBG file (using the SET TTF=ON and OFF command).	When NDS will not open, or when an error occurs while initializing NDS.
INSPECTOR, I	Enables debug error messages of the inspector process. Note: The inspector prepares for the janitor process. It inspects the database to check objects' integrity, to see if anything appears to be broken based on the schema and NDS expectations. This is a server-based inspection. Checks to see if DSRepair should be run.	When a lot of changes have been initiated, but are not being propagated around the NDS tree.
JANITOR, J	Enables debug error messages of the janitor or clean-up process. Note: The janitor process checks connectivity to all servers in the database, and sets the UP and DOWN status of the NCP entries. The purger process then runs and modifies timestamps. The flatcleaner calls the janitor after the message "All processed=Yes" is displayed.	In conjunction with *J, to monitor the clean-up process for errors.
LIMBER	Enables debug error messages of the limber process (a server connectivity check). Note: The limber process verifies the server name, internal IPX address, and tree connectivity of all replicas.	In conjunction with *L, after changing the server name or address to watch for errors.
MERGE	Enables debug error messages of objects being merged. Note: A merge occurs when two objects combine to form one; for example, when a subordinate replica is changed into a read/write replica. In this case, the zero timestamp and a real timestamp merge together.	To monitor the merge activity for errors.
MIN	Enables debug error messages at the minimum debugging level. Note: To use this correctly, type SET DSTRACE=NODEBUG first, then SET DSTRACE=+MIN. This has the same effect as unloading DS.NLM and reloading it to restore DSTRACE to its defaults.	Used in conjunction with NODEBUG to reset DSTRACE to its defaults without unloading and reloading DS.NLM.
MISC	Enables debug error messages of all miscellaneous processes. Note: An example of a miscellaneous process is the bagging operation. The bagging process flags an object ready to be overwritten. If you change a subordinate reference to a read/write replica, for instance, it will mark or prepare the pointers to be overwritten by the real object. External reference objects are then overwritten by the real objects.	To watch for errors while changing a subordinate reference to a read/write replica, and so on.
OUT	Enables debug error messages of outbound synchronization traffic (what is being sent out from this server).	To verify whether errors are coming from outbound server synchronization.
PART	Enables debug error messages of partition operations.	Activity of joining or creating a partition, and so on.
RECMAN	Enables debug error messages of access to the record or database manager (low-level NDS database processes).	When you begin, end, or abort all database functions. Rarely used.
REPAIR	Enables debug error messages of the repair process. Note: The repair process clears the servers in the replica list and rebuilds the list with those servers that are found. This determines which servers DSREPAIR will call on for communication.	To monitor the repair process for errors. Rarely used.
RESNAME, RN	Enables debug error messages of resolve name requests. Note: The resolve name process maps the Entry Name to an Entry ID on a particular server.	To monitor the activity of logging in, walking a tree, or mapping to a volume object.
SAP	Enables debug error messages of the Service Advertising Protocol (SAP). Note: Messages are displayed when NDS uses SAP to broadcast tree names. NDS listens on tree name SAPs and returns information when they are broadcast.	To see SAP-related errors such as: "Could not send advertising packet: IPX number" or -632 errors.
SCHEMA	Enables debug error messages of schema modifications and schema synchronization. Note: Schema modification should take effect on all servers involved in the tree.	When the schema has been extended and to see the schema synchronize.
SKULKER, SYNC, S	Enables debug error messages of the synchronization traffic. Displays object updates involved in the synchronization process. Note: Skulking is the background synchronization process. It checks the synchronization status of every server in the replica ring.	To see more detail on all 600 series errors. It will show you what object is getting the error. If the object is not an important one such as a server, you can usually just delete and recreate it. Most used!
STREAMS	Enables debug error messages about streams. Note: Streams provide a way to make an NDS attribute out of a file on the server. It is used for such stream attributes as login scripts.	To troubleshoot login script and print job configuration problems. Rarely used.
TIMEVECTOR, TV	Enables debug error messages relating to local and remote Sync Up To vectors. Note: Time vectors have local and remote Sync Up To vectors which contain timestamps.	In conjunction with the +SYNC filter, to see the last Sync Up To time.
VCLIENT, VC	Enables debug error messages dealing with server-to-server connections and outgoing client messages.	In conjunction with *U, to see the communication activity between servers.

DSTRACE Processes

When troubleshooting errors, it is often useful to force a synchronization process to occur while you are watching rather than waiting for the regularly scheduled synchronization. Various NDS processes can be forced with the following DSTRACE commands that include an asterisk ( * ).

SYNTAX

SET DSTRACE = * <PROCESS>

Example: SET DSTRACE=*H

PROCESS	Effect
*.	Unloads and reloads DS.NLM in the SYS:SYSTEM directory. Useful when you are updating a version of DS.NLM because it can be done without disrupting users. This process renames the DS.NLM in memory to DS.OLD and then loads the new DS.NLM found in SYS:SYSTEM. Both old and new versions are loaded for a short period of time.
*B	Forces the backlink process to begin. When NDS creates an external reference for an entry not stored on the local server, NDS attempts to create a backlink or pointer to the real entry. This process occurs every 780 minutes by default. It can cause a significant amount of network traffic, so is best done during non-peak times.
*C	Displays connection table statistics for outbound connection caching or virtual clients (available only with DS.NLM 4.97 and above).
*CI	Displays connection table statistics for virtual clients, including idle time information (available only with DS.NLM 4.97 and above).
*CR	Displays connection table statistics for virtual clients, including information about routing table packets (available only with DS.NLM 4.97 and above).
*CT	Displays connection table statistics for virtual clients, including which servers this server is connected to (available only with DS.NLM 4.97 and above).
*C0 (zero)	Resets the display of connection table statistics for virtual clients; same effect as unloading and reloading DS.NLM (available only with DS.NLM 4.97 and above).
*E	Checks entry cache. Locks the NDS database, verifies that the entry cache is viable, then re-opens the database.
*F	Forces the flatcleaner and janitor process to begin. These processes purge and remove deleted, unnecessary, or expired items and entries. They also purge dynamic bindery objects and "not present" external references.
*G	Gives Sync Up To for the server and changes the server status to DOWN. When too many requests are in process and the server ID is unable to be specified, this process gives up on the server and flags it as down to stop other servers from trying to communicate with it.
*H	Forces the heartbeat, or skulker process to begin. This initiates immediate communication between servers to exchange timestamps with all servers in the replica ring.
*L	Forces the limber process or server connectivity check to begin. This process checks the server's object to make sure that it is in the right tree: correct tree name, server name, and IPX address.
*M	Sets a maximum file size for the DSTRACE.DBG file. Used in conjunction with the SET TTF= command. The size must be specified in hexadecimal while SET TTF=OFF. The range is 10,000 to 10,000,000 bytes.
*P	Displays the current settings on this server for the tunable NDS parameters, similar to the following: TUNEABLE PARAMETER VALUES: ServerStateUpThreshold = 30 minutes; External Reference Life Span = 192 hours*; JanitorInterval = 2 minutes; FlatCleaningInterval = 60 minutes; BacklinkInterval = 780 minutes; Heartbeat Data = 30 minutes; Heartbeat Schema = 240 minutes; Requests In Progress threshold = 1000; Request IPX checksums = DISABLED; IPX:RIPDelay = 20 ticks; IPX:Retries = 3; IPX:TimeOutScaleFactor = 2; IPX:TimeOutShiftFactor = 4; Disk accesses before yield = 10; Connection Expiration Timeout = 135 minutes; NDS Packet CRC checking = ENABLED; Maximum Sockets Threshold = 75%; Outbound Synchronization = ENABLED; Inbound Synchronization = ENABLED; Schema Outbound Synchronization = ENABLED; Schema Inbound Synchronization = ENABLED. *This parameter determines how long external references can stay unused before they are deleted. It is set with the SET NDS External Reference Life Span = n command, with a default value of 192 hours and a range from 1 to 384 hours (16 days). Some of these parameters can be changed via SERVMAN or through other SET commands, as explained in the "DSTRACE SET Parameters" section below.
*R	Resets the DSTRACE.DBG file to zero bytes. Used with the SET TTF= command.
*S	Schedules the skulker process. This is similar to the *H process, but instead of starting the heartbeat immediately, this process checks first to see if any of the replicas on the server need to be synchronized. If so, it schedules the synchronization process to run sooner than usual.
*SS	Starts an immediate synchronization of all schemas. Any time the NDS schema is modified by changing or creating new attribute definitions and/or new object class definitions, these changes will be replicated among all the servers in the tree.
*U	Forces the server status to UP and resets the server's communication status list. If no server ID is specified, it sets the status of all servers in the replica ring to UP. (With DS.NLM 4.94 and higher, this command is server-centric.) This provides the same function as the SET NDS SERVER STATUS=UP command.
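
The *M process expects the maximum DSTRACE.DBG file size in hexadecimal, which is easy to get wrong by hand. The following helper is a minimal sketch for converting a byte count before typing it at the console. It assumes the stated range of 10,000 to 10,000,000 bytes is expressed in decimal, and the function name is hypothetical.

```python
MIN_DEBUG_FILE_BYTES = 10_000        # lower bound from the text (assumed decimal)
MAX_DEBUG_FILE_BYTES = 10_000_000    # upper bound from the text (assumed decimal)


def debug_file_size_hex(size_bytes: int) -> str:
    """Return the hexadecimal form of a DSTRACE.DBG size limit.

    Raises ValueError when the size falls outside the documented range.
    """
    if not MIN_DEBUG_FILE_BYTES <= size_bytes <= MAX_DEBUG_FILE_BYTES:
        raise ValueError(
            "size must be between %d and %d bytes"
            % (MIN_DEBUG_FILE_BYTES, MAX_DEBUG_FILE_BYTES)
        )
    return format(size_bytes, "X")
```

Under these assumptions, a limit of 1,000,000 bytes corresponds to the hexadecimal value F4240.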

DSTRACE SET Parameters

The last category of DSTRACE commands is the SET parameters. It is not recommended that you change these settings unless you have a specific reason to do so, as they affect synchronization and other critical NDS processes. Most of the current parameter settings can be seen using the SET DSTRACE=*P command. (Settings so indicated can also be changed using SERVMAN or other NDS SET commands.)

SYNTAX

SET DSTRACE= ! <SET PARAMETER> <VALUE>

Example: SET DSTRACE=!X10

PARAMETER	Description	Default	Range
!Bn	Changes the time interval (in minutes) between backlink consistency checks. *P Display: BacklinkInterval = 780 minutes SERVMAN Option: backlink interval SET Command: SET NDS Backlink Interval = n	780	1 to 10,080 (7 days)
!Cn	Changes the maximum sockets threshold, which is the percentage of sockets to use on the server before they are recycled; works with the maximum number of sockets set on the server; a resource error is generated if not enough sockets are available. *P Display: Maximum Sockets Threshold = 75%	75	25 to 100 (percent)
!CEn	Changes the connection expiration timeout value (in minutes). *P Display: Connection Expiration Timeout = 135 minutes	135 (2 hrs 15 min)	10 to 1440 (24 hrs)
!Dn	Disables both inbound and outbound synchronization for the specified number of hours. *P Display: Outbound Synchronization = DISABLED Inbound Synchronization = DISABLED	24	Up to 10,080 (7 days)
!DIn	Disables inbound synchronization for the specified number of hours.	24	Up to 10,080 (7 days)
!DOn	Disables outbound synchronization for the specified number of hours.	24	Up to 10,080 (7 days)
!E	Enables both inbound and outbound synchronization. *P Display: Outbound Synchronization = ENABLED Inbound Synchronization = ENABLED	Enabled	N/A
!EI	Enables inbound synchronization.	Enabled	N/A
!EO	Enables outbound synchronization.	Enabled	N/A
!Fn	Changes the interval (in minutes) at which the flatcleaner process automatically begins purging and deleting entries from the database. *P Display: FlatCleaningInterval = 60 minutes	60	1 to 10,080 (7 days)
!Gn	Changes the amount of time (in ticks) to wait before giving up when outstanding requests (Requests In Process) are not answered. (A tick is approximately 1/18 of a second, or about 55 milliseconds.) *P Display: Requests In Process threshold = 1000	N/A	0 to 200,000
!Hn	Changes the "heartbeat" synchronization interval (in minutes). *P Display: Heartbeat Data = 30 minutes SERVMAN Option: NDS inactivity synchronization interval	30	2 to 1440 (24 hours)
!In	Changes the heartbeat base schema synchronization interval (in minutes). *P Display: Heartbeat Schema = 240 minutes	240	2 to 1440 (24 hours)
!Jn	Changes the interval (in minutes) at which the NDS janitor process executes. *P Display: JanitorInterval = 2 minutes SERVMAN Option: janitor interval	2	1 to 10,080 (7 days)
!Rn	Changes the maximum number of times the server's disk can be accessed before it yields. *P Display: Disk accesses before yield = 10	10	1 to 10,000
!Sn	Enables or disables schema synchronization. *P Display: Schema Outbound Synchronization = ENABLED Schema Inbound Synchronization = ENABLED	1	0 = OFF, 1 = ON
!SI	Enables schema inbound synchronization.	Enabled	N/A
!SO	Enables schema outbound synchronization.	Enabled	N/A
!Tn	Changes the Server State Up threshold, which is the time interval (in minutes) at which NDS checks the server state before returning -625 errors. *P Display: ServerStateUpThreshold = 30 minutes	30	1 to 720 (12 hours)
!V	Lists any restricted versions of DS.NLM that exist on the network.	N/A	N/A
!Wn	Changes the IPX Request In Process (RIP) delay, which is the length of time to wait after getting an IPX timeout before resending. *P Display: IPX:RIPDelay = 20 ticks	20	1 to 2000
!Xn	Changes the IPX Retry count for the DS (server-to-server) client. This determines the number of IPX retries before an NDS -625 error is displayed. *P Display: IPX:Retries = 3	3 retries	1 to 50 retries
!Yn	Changes the IPX Timeout Scale factor, a number used for the estimated trip delay in the equation: IPX Timeout = (T * Y) + Z where T=Ticks to get to the destination server, and Z is the additional delay specified by the !Z parameter. Note: The maximum possible IPX Timeout value is 540, regardless of the settings for Y and Z. *P Display: IPX:TimeOutScaleFactor = 2	2	0 to 530
!Zn	Changes the IPX Timeout Shift factor, a number which adds additional delay for the IPX timeout in the equation in the !Y description above. To increase the timeout, change this parameter first. *P Display: IPX:TimeOutShiftFactor = 4	4	0 to 500
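
The IPX timeout equation given in the !Y and !Z descriptions can be worked through with a short sketch. The function and constant names are hypothetical. The formula, the default scale and shift factors, and the ceiling of 540 are taken from the table above.

```python
MAX_IPX_TIMEOUT = 540  # ceiling stated in the !Y description


def ipx_timeout(ticks_to_server, scale_factor=2, shift_factor=4):
    """Compute IPX Timeout = (T * Y) + Z, capped at the documented maximum.

    ticks_to_server -- T, ticks to get to the destination server
    scale_factor    -- Y, the !Y parameter (default 2)
    shift_factor    -- Z, the !Z parameter (default 4)
    """
    return min(ticks_to_server * scale_factor + shift_factor, MAX_IPX_TIMEOUT)
```

With the defaults, a destination 10 ticks away gives (10 * 2) + 4 = 24. Raising the shift factor to 50 gives 70 for the same server, and any combination that would exceed 540 is held at 540.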

* Originally published in Novell AppNotes
