
Ten Proven Techniques to Increase NDS Performance and Reliability


RON LEE
Senior Research Engineer
Novell Systems Research

01 Apr 1996


This AppNote discusses ten techniques to improve the design and administration of a NetWare Directory Services (NDS) tree. Your use of these techniques will increase the performance of tree operations for both users and administrators, and improve the overall reliability of the tree.

RELATED APPNOTES

  • Apr 96 "Universal Guidelines for NDS Tree Design"

  • Apr 96 "Enabling Request-IPX-Checksums to Eliminate NDS Packet Corruption Problems"

Introduction

A NetWare Directory Services (NDS) tree is a comprehensive directory system that is hierarchical, distributed, and replicated. To provide this service, NDS maintains data integrity and consistency of the tree's contents across a distributed platform. Because you are storing intelligence in this distributed platform, performance and reliability are vital to your success. However, some circumstances can place undue stress on the availability and performance of an NDS tree, causing communications and performance to suffer. Novell has identified ten proven techniques that will help you remove these stresses:

  1. Partition the tree along WAN and geographic boundaries.

  2. Provide the tree with sufficient server horsepower.

  3. Never use DSREPAIR or DSMAINT for administrative tasks that can be performed with the NDS administrative utilities.

  4. Be patient. Monitor major operations to completion and check the status of the tree before initiating subsequent operations.

  5. Check the status of a partition's servers before performing major partition operations.

  6. Avoid operating a tree with a mix of NetWare 4 versions.

  7. Always use the latest NOS patches.

  8. Do not use DS.NLM version 4.63 that was shipped on the original NetWare 4.0 CD.

  9. Plan appropriately when you install remote servers configured at a central location.

  10. Consider enabling request-IPX-checksums to eliminate NDS packet corruption problems.

If you put these techniques into practice, your tree will be more reliable and efficient, and continue to scale as it changes and grows to meet the future needs of your organization.


Note: This AppNote assumes you are familiar with NDS concepts and terminology. For background information see:

  • Introduction to NetWare Directory Services manual

  • Novell's Guide to NetWare 4.1 Networks, Hughes and Thomas, Novell Press, 1996

  • "What's New in NetWare 4.1," Novell Application Notes, Jan. 1995

  • "Management Procedures for Directory Services," Novell Application Notes, March 1994

  • NetWare 4.0 Special Edition, Novell Application Notes, April 1993

This AppNote supersedes "Planning an NDS Tree" in the January 1995 Novell Application Notes.

Ten Proven Techniques

  1. Partition the tree along WAN and geographic boundaries.

    NetWare 4 servers are not islands--they are constantly exchanging information to keep NDS synchronized. Trees that are distributed over one or more WAN links and geographic sites must deal with the unique overhead created by reduced bandwidth and slower access times between sites. In these scenarios, improperly partitioned trees can lead to a breakdown in communication and synchronization between NDS partitions and replicas. Trees that allow partitions to span WAN links can perform poorly whenever the WAN's bandwidth or availability cannot keep up with synchronization traffic. Trees that are partitioned along WAN or geographic boundaries minimize this risk and reduce the costs of keeping an ever-changing tree synchronized (see Figure 1).

    Figure 1: To avoid inefficient communication for NDS synchronization, partition the tree along WAN boundaries.

    If the rate of growth and change within the tree is low and the partition is small, the partition may be able to span a WAN link. However, if the tree sees continuous change or if you anticipate significant growth, this recommendation to not allow partitions to span WAN links should become standard operating procedure.

    If one or more of your remote sites has multiple NetWare servers, you may be able to place all of the replicas for that partition at the remote site and still maintain sufficient fault tolerance. This can significantly reduce the amount of synchronization traffic necessary to maintain consistency in the tree.

    If you design one or more partitions that span WAN or geographic boundaries, keep a close eye on the reliability of the link. If someone else manages the WAN infrastructure and is unwilling or unable to guarantee adequate reliability and bandwidth, partition your tree to minimize the impact of those slow and unreliable links.
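
    As a rough aid for auditing a design against this technique, the following minimal sketch flags partitions whose replicas sit at more than one site. The partition names, server names, site names, and the two lookup tables are hypothetical; how you collect this inventory from your own tree is outside the scope of the sketch.

```python
# Illustrative sketch only: all names and both tables below are hypothetical.
def partitions_spanning_sites(replicas, server_site):
    """Return {partition: sorted list of sites} for every partition whose
    replicas are held by servers at more than one site."""
    spanning = {}
    for partition, servers in replicas.items():
        sites = {server_site[server] for server in servers}
        if len(sites) > 1:
            spanning[partition] = sorted(sites)
    return spanning


# Which servers hold a replica of each partition (hypothetical inventory).
replicas = {
    "OU=NYC.O=Acme": ["NYC1", "NYC2"],
    "OU=Sales.O=Acme": ["NYC1", "LON1"],
}
# Which site each server lives at (hypothetical inventory).
server_site = {"NYC1": "New York", "NYC2": "New York", "LON1": "London"}

for partition, sites in partitions_spanning_sites(replicas, server_site).items():
    print(partition, "spans", ", ".join(sites))
```

    A partition reported here is not necessarily a problem; as noted above, a small partition with a low rate of change may be able to span a WAN link.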

  2. Provide the tree with sufficient server horsepower.

    An NDS tree supports numerous types of activities--authentication of users, replication of partitions, object moves and changes, partition operations, synchronization, and background processes. These processes are primarily CPU-, cache-, and disk-intensive. Without sufficient server horsepower, a tree's performance may wane during both user operations and administrative tasks.

    More importantly, insufficient server horsepower can cause a server's CPU to remain at 100% utilization for long periods of time. Under these circumstances, communication with the server can become inconsistent and the server's partitions and replicas can drop out of the tree's normal synchronization process. By the time the server catches up on its backlog of NDS operations, it's out of sync with the rest of the tree and behind again. It's a vicious cycle that can be avoided by properly designing the servers with the horsepower required by the NDS workload.

    Poor performance can also lead to pilot error. This occurs when an administrator becomes frustrated with the slow performance of an operation and either aborts it or proceeds with additional operations, without knowing the outcome of the current operation. In some cases, this scenario can result in significant backlogs of operations, followed by performance degradation, followed by a breakdown in communications, followed by the potential for synchronization problems with data and references in the tree.

    A single low-end server in a replica ring can also pose problems for other high-end servers in the ring during major partition operations, such as splits and joins. The synchronization process can't progress or complete until all of the servers in the replica ring have completed.

    This "weakest-link" scenario, limited only to major partition operations, constrains the synchronization time of the replica ring to the speed of the slowest server in the ring. Day-to-day operations, such as password changes and other informational adds, moves and changes, are not affected by this "weakest-link" phenomenon.
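
    The "weakest-link" effect can be shown with simple arithmetic. The sketch below assumes hypothetical per-server completion times (the server names and timings are invented for illustration, not measurements): because a major partition operation finishes only when every server in the replica ring has finished, the time for the ring is the maximum of the individual times.

```python
# Illustrative sketch only: server names and timings are hypothetical.
def ring_completion_time(server_times):
    """Return how long a major partition operation takes for a replica ring.

    server_times maps a server name to the seconds that server needs to
    finish its part. The ring is done only when its slowest member is done.
    """
    if not server_times:
        raise ValueError("a replica ring needs at least one server")
    return max(server_times.values())


def slowest_server(server_times):
    """Return the name of the server that gates the operation."""
    if not server_times:
        raise ValueError("a replica ring needs at least one server")
    return max(server_times, key=server_times.get)


ring = {"FS-HIGH-1": 40, "FS-HIGH-2": 45, "FS-LOW-1": 600}
print(slowest_server(ring), "gates the ring at", ring_completion_time(ring), "seconds")
```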

    More information on appropriate server hardware for specific tree designs and characteristics will be forthcoming in a future AppNote.

  3. Never use DSREPAIR or DSMAINT for administrative tasks that can be performed with the NDS administrative utilities.

    DSREPAIR is a tool for diagnostics and repair when a part of the tree is damaged. It should not be used for administrative tasks. Use NWADMIN or NETADMIN for routine partition and replication operations.

    Trees can be damaged when well-intentioned administrators use DSREPAIR to perform operations that should normally be handled with another utility. For example, DSREPAIR can be used to manually remove a server from a replica list while leaving the replica intact on the server. But suppose an administrator mistakenly tries to use this option to remove the replica from the server. DSREPAIR will proceed to remove the server from the replica ring, leaving the unsynchronized replica on the server. The result is a broken replica ring.


    Caution: Don't use any DSREPAIR switches used by Novell Technical Support without fully understanding the consequences. Many switches have undocumented side effects that require Novell's direct involvement.

    DSMAINT is another very powerful utility that should only be used to upgrade the hardware of a NetWare server or reinstall a server into the tree during disaster recovery. Like any other type of installation, DSMAINT operations should be planned. No other major NDS operations or server installations should take place until the DSMAINT process is complete. DSMAINT should never be used for server cloning--copying the configuration of a server onto multiple server platforms.

  4. Be patient. Monitor major operations to completion and check the status of the tree before initiating subsequent operations.

    Some NDS operations take time. Even some seemingly trivial operations may take more time than you expect, depending on your tree configuration, network infrastructure, and server hardware. Intermittent availability of your network's resources can also slow the necessary replication and synchronization of changes you make to the tree.

    For instance, if a partition operation isn't performing as you think it should, the appropriate action is to monitor it as follows:

    1. Use RCONSOLE or go to the server console of the server holding the master replica (partition operations are managed by the master).

    2. Enter the following two SET commands to enable the display of synchronization status:

      SET DSTRACE = ON
      SET DSTRACE = +SYNC

    3. Press <Alt>+<Esc> to toggle to the Directory Services trace screen.

    4. Watch for the message "ALL PROCESSED = Yes" to be displayed for each partition on the server. This message indicates that all replicas in the partition ring are synchronized without any errors.

    Sometimes operations will complete before you can get to the screen to observe the DSTRACE results. On slower systems or during larger operations, you can watch the various steps complete. Always monitor large operations to verify completion.


    Note: Remember to turn off the +SYNC option by typing SET DSTRACE = -SYNC at the server console. Constant use of the +SYNC setting should not be incorporated into your standard operating procedures unless your server has ample CPU cycles. This option can use a significant amount of CPU bandwidth to interpret the results of NDS synchronization operations and keep the console updated. However, you can use it for limited periods of time.
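
    If you copy trace output into a text file for later review, a short script can check it for the completion message described in step 4. This is a minimal sketch under stated assumptions: only the "ALL PROCESSED = Yes" text comes from this AppNote; the surrounding line format in the sample, and the idea that an unsuccessful pass reports some value other than Yes, are assumptions made for illustration.

```python
import re

# Assumption: trace output was captured as plain text lines. Only the
# "ALL PROCESSED = Yes" message is taken from the text above; the rest of
# each sample line is a hypothetical format.
_STATUS = re.compile(r"ALL PROCESSED\s*=\s*(\w+)", re.IGNORECASE)


def all_processed(lines):
    """Return True only if at least one status line was found and every
    status line in the captured output reports Yes."""
    statuses = []
    for line in lines:
        match = _STATUS.search(line)
        if match:
            statuses.append(match.group(1).lower() == "yes")
    return bool(statuses) and all(statuses)


sample = [
    "SYNC: start outbound sync (hypothetical line)",
    "SYNC: end sync, ALL PROCESSED = Yes",
]
print("synchronized" if all_processed(sample) else "not yet synchronized")
```

    The function deliberately returns False when no status line is present, so an empty or truncated capture is never mistaken for a completed operation.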

  5. Check the status of a partition's servers before performing major partition operations.

    Partition operations such as splits and joins are major NDS operations that involve a lot of changes to be replicated and synchronized across the tree. Performing partition operations while one or more servers are down is not desirable because none of the affected partitions and replicas can synchronize until all of their host servers are operational. Host servers include all servers holding partitions, master replicas, read/write replicas, read-only replicas, and subordinate references for the affected partition or partitions.

    Normal day-to-day operations, such as object additions, moves, and other minor changes, can be performed without concern for the availability of all affected partitions and replicas. This is one of the benefits of having multiple replicas (copies of the partition) distributed across multiple servers.

    Guidelines for determining the status of servers in the tree are given in "Troubleshooting Tips for NetWare Directory Services" in the August 1995 Novell Application Notes.
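
    The rule in this technique reduces to a simple pre-check: start a major partition operation only when every host server for the affected partitions is operational. The sketch below assumes you have already gathered a status table by whatever means you use; the table, the server names, and the function names are hypothetical and do not correspond to any NDS utility.

```python
# Illustrative sketch only: the status table and names are hypothetical.
def servers_blocking_operation(ring_status):
    """Return the sorted names of host servers that are not operational.

    ring_status maps each host server (holders of master, read/write, and
    read-only replicas and subordinate references) to True if it is up.
    """
    return sorted(name for name, is_up in ring_status.items() if not is_up)


def safe_to_start(ring_status):
    """Return True only if the table is non-empty and every server is up."""
    return bool(ring_status) and not servers_blocking_operation(ring_status)


status = {"FS1": True, "FS2": True, "FS3": False}
if safe_to_start(status):
    print("All host servers are up; proceed with the partition operation.")
else:
    print("Wait for:", ", ".join(servers_blocking_operation(status)))
```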

  6. Avoid operating a tree with a mix of NetWare 4 versions.

    If a tree contains a variety of NetWare 4 versions (4.01, 4.02, and 4.1), the tree may encounter problems that have been fixed in more recent releases of NetWare and NDS. We recommend that you upgrade all NetWare 4 servers to NetWare 4.1 or higher. We also recommend an aggressive (timely) upgrade schedule. This reduces the chances of encountering any problems during the synchronization processes related to the upgrade process.

  7. Always use the latest NOS patches.

    Novell's network operating system (NOS) patches are tested individually and in concert with all other current patches. We recommend that you run the full patch configuration and not waste time trying to decide which patches you need and which ones you don't need.

    If you encounter problems with your tree and call Novell Technical Support, one of the first questions will be whether you are using the latest patches. Most of the patches are the result of many man-months spent troubleshooting a particular problem, isolating the offending code, developing and debugging the patch, and testing it before release. Due to this investment, Novell Engineering will not work with Technical Support on any problem that originates from a system with down-level patches.

    For the latest NOS patches, see NetWire on CompuServe, or Novell's FTP or World Wide Web site. At this writing, Novell uses two self-extracting files with the following naming conventions:


    410ITx.EXE

    Contains the latest patches that have passed field testing and are in the final stages of release.

    410PTx.EXE

    Contains previously released patches.

    The last character of the file names, represented by an "x," is a decimal number that identifies the different releases. For example, at this writing, 410IT6.EXE is the current 410ITx file. The next release will be 410IT7.EXE.
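
    The naming convention makes it easy to pick the newest file out of a directory listing. The following minimal sketch assumes only the pattern described above (a 410IT or 410PT prefix, a decimal release number, and an .EXE extension); the sample listing is hypothetical, apart from 410IT6.EXE, which is named in the text.

```python
import re

# Matches the two families described above: 410ITx.EXE and 410PTx.EXE.
_PATCH_NAME = re.compile(r"^(410IT|410PT)(\d+)\.EXE$", re.IGNORECASE)


def latest_release(file_names, family):
    """Return the highest-numbered file in the given family, or None.

    family is "410IT" or "410PT". Names that do not follow the
    convention are ignored.
    """
    best = None
    for name in file_names:
        match = _PATCH_NAME.match(name)
        if match and match.group(1).upper() == family.upper():
            number = int(match.group(2))
            if best is None or number > best[0]:
                best = (number, name)
    return best[1] if best else None


# Hypothetical directory listing.
listing = ["410IT5.EXE", "410IT6.EXE", "410PT3.EXE", "README.TXT"]
print(latest_release(listing, "410IT"))
```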

  8. Do not use DS.NLM version 4.63 that was shipped with the original NetWare 4.0 CD.

    The original version of DS.NLM (4.63) handles object name collisions incorrectly and should not be used with current versions of DS.NLM. As of this writing, you shouldn't be using any versions of DS.NLM below 4.89c. This version is a mature and reliable foundation upon which to build your tree. Trees that are completely based on NetWare 4.1 and DS.NLM version 4.89c or above are much more easily and quickly repaired if a problem is encountered.
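
    If you keep an inventory of the DS.NLM version on each server, the 4.89c minimum can be checked mechanically. This is a minimal sketch, assuming version strings of the form major.minor with an optional single-letter suffix, and assuming that a lettered revision sorts after the same number without a letter; the inventory and server names are hypothetical.

```python
import re

# Assumed version format: digits, a dot, digits, optional one-letter suffix.
_VERSION = re.compile(r"^(\d+)\.(\d+)([a-z]?)$")


def parse_version(text):
    """Turn a string such as "4.89c" into a comparable tuple (4, 89, "c")."""
    match = _VERSION.match(text.strip().lower())
    if not match:
        raise ValueError(f"unrecognized DS.NLM version: {text!r}")
    return (int(match.group(1)), int(match.group(2)), match.group(3))


MINIMUM = parse_version("4.89c")


def meets_minimum(text):
    """Return True if the version is 4.89c or later."""
    return parse_version(text) >= MINIMUM


# Hypothetical inventory of servers and their DS.NLM versions.
inventory = {"FS1": "4.63", "FS2": "4.89c", "FS3": "4.96"}
outdated = sorted(name for name, ver in inventory.items() if not meets_minimum(ver))
print("Servers below 4.89c:", outdated)
```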

    For the latest DS.NLM, see NetWire on CompuServe, or Novell's FTP or World Wide Web site. Novell uses a self-extracting file with the following naming convention:


    41NDSx.EXE

    Contains the latest versions of DS.NLM and NDS utilities.

    The last character of the file name, represented by an "x," is a decimal number that identifies the different releases. For example, at this writing, 41NDS7.EXE is the current 41NDSx file. The next release will be 41NDS8.EXE.


    Note: The one exception to this recommendation of running DS.NLM version 4.89c is found in technique 10 below.

  9. Plan appropriately when you install remote servers configured at a central location.

    Organizations without network technicians at remote sites often configure new remote servers at a central location, then ship them to remote sites. If your organization does this, there are several considerations you need to take into account.

    If the tree that you install the server into will see considerable change while the newly configured server is in transit, we recommend that you remove NDS from the server before shipping it. Using INSTALL.NLM you can remove NDS from the server and reinstall NDS once the server arrives at its destination. This approach eliminates the heavy synchronization traffic that would normally result from an aged server with its old partitions and replicas appearing in the tree over a slow WAN link.

    If the tree that you install the server into will only see minimal change, or if the transit time can be kept to a minimum, leaving NDS installed on the remote servers may be acceptable. If you choose this route, install all of the remote servers in concert so that their partitions and replicas reflect the existence of all the new servers. One common mistake is to install one remote server into the tree with a master or read/write replica, box it up for shipment, and then install another server, with its own replicas, into the tree. In this case, the partition on the second server is unable to synchronize due to the absence of the first server's partitions or replicas.

    If you leave NDS installed on the remote servers during transit, place on each remote server only those replicas that are necessary for that site. Partition operations involving the remote server's partitions and replicas will not complete until the remote server is up and available. Object changes affecting the server will also become backlogged. Finally, the purge time kept by NDS cannot move forward without all of the replicas successfully updated. So you need to keep tree changes and transit time to a minimum to avoid major backlogs of operations that can cause performance degradation, followed by a breakdown in communications, followed by problems with the synchronization of both data and references in the tree.

    If a link to the remote server will be a long time coming, we recommend that you create a separate tree for the remote server. Then merge the remote tree and main organization tree when the link becomes available.

  10. Consider enabling request-IPX-checksums to eliminate NDS packet corruption problems.

    If your tree traverses a routed network or if you have faulty hardware that has corrupted data in the past, the potential for NDS packet data corruption exists. On rare occasions, Novell technicians have repaired trees that are healthy with the exception of one or two object names and time stamps being corrupted. After considerable investigation, we learned that these attributes were corrupted in transit from one server to another as they passed through a router.

    In response, an option has been included in DS.NLM version 4.96 and above to enable a request-IPX-checksums feature in NDS. This feature uses IPX checksumming to calculate the sum of each packet's contents at both ends of the communications link. If the receiving server's packet checksum agrees with the transmitting server's checksum, the packet is considered by the receiver to be consistent and correct. Otherwise the packet is discarded and a retry occurs.
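
    The verify-or-discard logic can be sketched as follows. Note the assumption: this AppNote does not describe how the IPX checksum is computed, so the sketch substitutes a generic 16-bit ones'-complement sum purely to illustrate the comparison made by the receiver. The function names and sample payload are hypothetical.

```python
def checksum16(data: bytes) -> int:
    """Generic 16-bit ones'-complement checksum.

    Stand-in for illustration only; not presented as the IPX algorithm.
    """
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def accept_packet(payload: bytes, sent_checksum: int) -> bool:
    """Receiver side: accept only if the recomputed checksum matches.

    A False result corresponds to discarding the packet so that the
    sender retries.
    """
    return checksum16(payload) == sent_checksum


payload = b"CN=Admin.O=Acme"          # hypothetical packet contents
sent = checksum16(payload)            # computed by the transmitting server
corrupted = b"CN=Bdmin.O=Acme"        # one byte altered in transit

print(accept_packet(payload, sent))
print(accept_packet(corrupted, sent))
```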

    IPX checksumming is initially disabled because it isn't compatible with the Ethernet 802.3 "raw" frame type. If your network currently uses the 802.3 frame type or if your tree traverses links that are configured for 802.3 frames, enabling NDS checksumming will effectively disable NDS communications over those links. For this and other reasons, Novell recommends that you migrate your servers to the 802.2 frame type wherever possible.

    For more information on Ethernet frame types and issues involved in migrating to 802.2, see "Migrating Ethernet Frame Types from 802.3 Raw to IEEE 802.2" in the September 1993 Novell Application Notes.

    To find out more about enabling and monitoring request-IPX-checksums inside NDS, see "Enabling Request-IPX-Checksums to Eliminate NDS Packet Corruption Problems" in this issue of Novell Application Notes.

Conclusion

These ten techniques are proven to increase the reliability and performance of NDS trees of all sizes. If you can incorporate these techniques into your standard operating procedures, your tree will be more stable, provide improved performance for both users and administrators, and allow for future change and growth.

* Originally published in Novell AppNotes


Disclaimer

The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.

© Copyright Micro Focus or one of its affiliates