Overview of NDS Scale
Internet Infrastructure Division
01 Apr 1997
Discusses Novell's NDS SCALE product, which provides the features necessary to distribute NDS across multiple trees.
The NDS SCALE product provides the features necessary to distribute NDS across multiple trees. These features include:
Distributed Relationship Management
DNS NDS personality
LDAP NDS personality
The SCALE version of NDS divides the Directory tree into logical subtrees called partitions. Although any part of the Directory can be considered a subtree, a partition forms a distinct unit of data for storing and replicating Directory information. Partition boundaries cannot overlap, so each entry in the Directory appears in only one partition.
A partition subordinate to another partition is called a child partition, while its immediate superior is called a parent partition. Partitions must obey the following rules:
They must contain a connected subtree.
They must contain only one container object as the root of the subtree.
They cannot overlap with any other partition.
They take their name from the root-most container object (the container at the root of the subtree), which is called the partition root.
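These rules imply a simple ownership test: because partitions cannot overlap, each entry belongs to the nearest partition root on its path toward the tree root. A minimal Python sketch (the function and tree data are invented for illustration):

```python
# Hypothetical sketch: assigning each Directory entry to exactly one
# partition, per the rules above.  A partition takes its name from its
# root-most container, so every entry belongs to the first partition
# root found while walking from the entry toward the tree root.

def partition_of(entry, partition_roots):
    # NDS-style names are written leaf-first, e.g. "Dev.Engineering.Novell"
    parts = entry.split(".")
    for i in range(len(parts)):
        candidate = ".".join(parts[i:])
        if candidate in partition_roots:
            return candidate
    raise ValueError("entry is not under any partition root")

# The partitioned tree from Figure 1:
roots = {"Novell", "Engineering.Novell", "Dev.Engineering.Novell"}

print(partition_of("CAD.Dev.Engineering.Novell", roots))  # Dev.Engineering.Novell
print(partition_of("QA.Engineering.Novell", roots))       # Engineering.Novell
print(partition_of("Sales.Novell", roots))                # Novell
```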
Figure 1 shows a partitioned tree with Engineering.Novell as the parent and Dev.Engineering.Novell as a child partition. Novell is the Directory tree's root partition.
Figure 1: A partitioned tree.
NDS allows administrators to create and manage partitions and their replicas. These operations, called partition operations, allow great flexibility in maintaining and modifying the Directory tree. Partition operations include the following:
Adding a replica of a partition. This operation involves placing a replica of a given partition on a specific server.
Changing a replica's type. This operation changes a replica's type, including creating a new master replica. For example, an administrator may want to change a read/write replica to a read-only replica to restrict changes to that partition's data.
Removing a replica from a set of replicas. This operation removes one or more replicas of a given partition.
Splitting a partition. This operation creates a new partition from a container in an existing partition.
Joining two partitions. This operation joins a parent and child partition, making one partition from the two.
Moving a partition. This allows administrators to move an entire partition and its contents to another part of the Directory tree without affecting connectivity or access control privileges.
All these operations involve two major stages: the initial operation involving the client and the master replica, and a second stage during which the partition changes are sent to each replica of the partition.
A single instance of a partition is called a replica. Partitions can have multiple replicas, but only one replica of a particular partition can exist on each server. (Servers can hold more than one replica, as long as each replica is of a different partition.) One of the replicas (usually the first created) of a given partition must be designated the master replica. Each partition can have only one master replica; the other replicas are designated as either read/write or read-only replicas. (You can use the read-only replica only to read the information in the partition replica. You cannot write to a read-only replica.)
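As a sketch of these constraints, the following Python fragment (an invented helper, not Novell's API) rejects a second master replica for a partition and a second replica of the same partition on one server:

```python
# Illustrative sketch of the replica rules above: exactly one master per
# partition, and at most one replica of a given partition per server.

class ReplicaTable:
    def __init__(self):
        self.replicas = {}   # (partition, server) -> replica type

    def add(self, partition, server, rtype):
        assert rtype in ("master", "read/write", "read-only")
        if (partition, server) in self.replicas:
            raise ValueError("server already holds a replica of this partition")
        if rtype == "master" and any(
            t == "master" for (p, s), t in self.replicas.items() if p == partition
        ):
            raise ValueError("partition already has a master replica")
        self.replicas[(partition, server)] = rtype

t = ReplicaTable()
t.add("A", "NS1", "master")
t.add("A", "NS2", "read/write")
try:
    t.add("A", "NS2", "read-only")   # second replica of A on NS2: rejected
except ValueError as e:
    print(e)                         # prints the ValueError message
```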
Replication adds fault tolerance to the database because the database has more than one copy of its information.
Replicas must be designated as one of four types:
One (and only one) replica of a partition must be designated as the master replica; the other replicas must be designated as either a read/write or read-only replica, or a subordinate reference. The replicas are invisible to the end user; that is, the user does not know which replica contains the entries being accessed.
Master, Read/Write, and Read-Only Replicas
Clients can create, modify, and delete entries on either master or read/write replicas. However, clients can perform operations that deal with partitions only on the master replica. Clients cannot make any changes to read-only replicas.
Figure 2 shows three partitions (A, B, and C) replicated across three name servers (NS1, NS2, and NS3).
Figure 2: Partitioning and replication.
NS1 stores the master replicas of partitions A and B and a read-only replica of partition C.
NS2 stores the master replica of partition C and read/write replicas of A and B.
NS3 stores read/write replicas of A and C.
Given this arrangement, any of the servers could handle a request to add an entry to partition A. Only NS1 and NS2 could handle a similar request for partition B, and only NS2 and NS3 could handle such a request for partition C.
Only NS1 can create a new partition that is subordinate to partition A or B, and only NS2 can create a new partition that is subordinate to partition C.
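The Figure 2 reasoning can be sketched in Python, using the server and partition names from the figure (the table layout itself is an assumption for illustration):

```python
# Replica placement from Figure 2: which servers can service which requests.
replicas = {
    ("A", "NS1"): "master",     ("B", "NS1"): "master",     ("C", "NS1"): "read-only",
    ("A", "NS2"): "read/write", ("B", "NS2"): "read/write", ("C", "NS2"): "master",
    ("A", "NS3"): "read/write", ("C", "NS3"): "read/write",
}

def can_add_entry(partition, server):
    # Entry creation needs a master or read/write replica.
    return replicas.get((partition, server)) in ("master", "read/write")

def can_partition_op(partition, server):
    # Partition operations are allowed only on the master replica.
    return replicas.get((partition, server)) == "master"

servers = ["NS1", "NS2", "NS3"]
print([s for s in servers if can_add_entry("C", s)])      # ['NS2', 'NS3']
print([s for s in servers if can_partition_op("B", s)])   # ['NS1']
```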
Subordinate references, which are not visible to users, provide tree connectivity. Each subordinate reference is a complete copy of a given partition's root object but is not a copy of the whole partition. As a general rule, subordinate references are placed on servers that contain a replica of a parent partition but not the relevant child partitions. In other words, a subordinate reference points to an absent subordinate partition. In this case, the server contains a subordinate reference for each child partition it does not store.
Subordinate references provide tree connectivity by referring to replicas the server may need to find. Because the subordinate reference is a copy of a partition root object, it holds the Replica attribute, which lists all the servers on which replicas of the child partition can be found. NDS can then use this list to locate replicas of the subordinate partition.
Figure 3 shows a partitioned tree and its subsequent replica placement on the servers that are holding the tree.
Figure 3: Replica placement in a partitioned tree.
Some of the servers in Figure 3 hold replicas of a parent partition but not replicas of the corresponding child partitions. These servers must also hold subordinate references to the child partitions they do not hold, as shown in Figure 4. For example, because server Srv4 holds a replica of the Eng partition but not of Test, it must hold a subordinate reference to Test.
Figure 4: Replica placement and subordinate references.
On server Srv1 in Figure 4, the subordinate reference of Mktg.ABC is a complete copy of the root object, Mktg.ABC, but not of its subordinate objects; the subordinate reference of Test.Eng.ABC is a complete copy of the entire partition, since Test.Eng.ABC is the only object in the partition. Users cannot change a subordinate reference's replica type.
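The placement rule for subordinate references can be sketched as follows (the data structures are invented): a server needs a subordinate reference to each child of a partition it replicates but does not itself store, which matches the Srv4 example above.

```python
# Hedged sketch: deciding which subordinate references a server must hold.

def needed_subordinate_refs(server_partitions, children):
    """children maps a parent partition to its child partitions."""
    refs = set()
    for parent in server_partitions:
        for child in children.get(parent, []):
            if child not in server_partitions:
                refs.add(child)   # parent held, child absent: reference needed
    return refs

# Parent/child layout from the Figure 3/4 example:
children = {"ABC": ["Eng", "Mktg"], "Eng": ["Test"]}

# Srv4 holds Eng but not Test, so it needs a subordinate reference to Test.
print(needed_subordinate_refs({"Eng"}, children))  # {'Test'}
```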
Besides providing tree connectivity, subordinate references also help determine rights. Because a subordinate reference holds a copy of the partition root object, it holds that object's Inherited ACL attribute, which summarizes the Access Control Lists up to that point in the tree.
Each replica contains a list of servers that support the partition it represents. The replica list is stored in each replica as a Replica attribute of the partition's root-most container entry. This list provides information needed for navigating the NDS tree and synchronizing the replicas. The replica list contains the following elements for each replica:
Server Name. The name of the server where the replica is located.
Replica Type. The type of the replica stored on the server designated in the Server Name field. (The type is either Master, R/W, RO, or SR.)
Replica State. The status of the replica. (The statuses include On, New, Replica dying, among others.)
Replica Number. The number that the master assigned to this replica at the time the replica was created.
Network Address. The server's address.
Remote ID. The Entry ID of the replica's partition root entry.
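For illustration, one replica-list element could be modeled as a small Python record; the field names follow the list above, while the dataclass itself and the sample values are assumptions:

```python
# Sketch of one replica-list element as described above (illustrative,
# not Novell's actual on-disk layout).
from dataclasses import dataclass

@dataclass
class ReplicaListEntry:
    server_name: str      # server where the replica is located
    replica_type: str     # "Master", "R/W", "RO", or "SR"
    replica_state: str    # "On", "New", "Replica dying", ...
    replica_number: int   # assigned by the master when the replica was created
    network_address: str  # the server's address
    remote_id: int        # Entry ID of the replica's partition root entry

entry = ReplicaListEntry("NS1", "Master", "On", 1, "IPX:0x01", 0x1042)
print(entry.replica_type)  # Master
```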
Synchronization is the process of ensuring that all changes to a particular partition are made to every replica of that partition. The X.500 standard defines two synchronization mechanisms: master/ slave synchronization and peer-to-peer synchronization.
The master/slave mechanism requires that all changes be made on the master replica. That replica is then responsible for updating all the other replicas (slave replicas).
In a peer-to-peer synchronization system, updates can be made to any read/write or master replica. At a predetermined interval, all servers holding copies of the same partition communicate with each other to determine who holds the latest information for each object. The servers then update their replicas with the latest information. In NetWare, the synchronization interval ranges from 10 seconds to 30 minutes, depending upon the type of information updated.
NDS uses both the master/slave and peer-to-peer synchronization processes, depending upon the type of change being made. The master/slave mechanism synchronizes operations such as partition operations that require a single point of control. The peer-to-peer mechanism synchronizes all other system changes. Most operations use peer-to-peer synchronization.
Because the NDS database must synchronize replicas, not all replicas hold the latest changes at any given time. This concept is referred to as loose consistency (called transient consistency in the X.500 standard), which simply means that partition replicas are not instantaneously updated. In other words, as long as the database is being updated, the network Directory is not guaranteed to be completely synchronized at any instant in time. However, during periods in which the database is not updated, it will completely synchronize.
Loose consistency has the advantage of allowing Directory servers to be connected to the network with different types of media. For example, you could connect one portion of your company's network to another by using a satellite link. Data traveling over a satellite link experiences transmission delays, so any update to the database on one side of the satellite link is delayed in reaching the database on the other side of the satellite link. However, because the database is loosely consistent, these transmission delays do not interfere with the normal operation of the network. The new information arrives over the satellite link and is propagated through the network at the next synchronization interval.
Another advantage to loose consistency is that if part of the network is down, the changes will synchronize to available servers. When the problem is resolved, the replicas on the affected servers will receive updates.
Replica Synchronization Process
Because an NDS partition can be replicated and distributed across a network, any changes made to one replica must be sent to, or synchronized with, the other replicas. The synchronization process keeps data consistent across the network.
Before you can understand how NDS synchronizes replicas, you must understand some of the components necessary for synchronization.
Time Stamps. One critical component in synchronization is the time stamp, which records information about when and where a given value in a given attribute was modified. When NDS updates a replica, it sends a modification time stamp with the data to be updated. The replica compares time stamps and replaces the old information with the new. A replica is considered synchronized when it has received the latest updates from all other replicas of its partition.
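A minimal sketch of the time-stamp comparison during synchronization (the tuple layout, with seconds plus a replica number and event counter to break ties, is an assumption; the point is only that the newer time stamp wins):

```python
# Hedged sketch of time-stamp-based peer-to-peer merging.

def merge_value(local, incoming):
    """Each value is (timestamp, data); the newer time stamp wins.
    Time stamps compare lexicographically: (seconds, replica_no, event)."""
    return incoming if incoming[0] > local[0] else local

local    = ((100, 1, 0), "Provo")
incoming = ((105, 2, 0), "San Jose")
print(merge_value(local, incoming)[1])  # San Jose
```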
Partition Information. For normal operations, including synchronization, to be successful, the partition root object on each server must store several important attributes and their values:
Partition Control attribute
Object Control attribute
In the source code, the synchronization operation is known as skulking. Its purpose is to check the synchronization status of every server that has a replica of a given partition. Factors that determine whether synchronization is necessary include the replica's convergence attribute, its replica type, and the time that has elapsed since the replica was last synchronized or updated. The system scans the partition records locally to decide which partitions need to be synchronized.
NDS provides a trigger, or heartbeat, every thirty minutes to schedule synchronization. The network administrator can adjust the trigger's time interval with the DSTrace console SET command.
The synchronization process involves updating all replicas with all the changes made to a partition since the last synchronization cycle. The synchronization process takes the replica list and synchronizes the replicas one at a time to the replica that has changed.
Since NDS is a loosely synchronized database, an update made at one replica propagates to other replicas of the partition over time. Any modification to the NDS database activates the replica synchronization process. When a change is made locally to an NDS entry on one server, the synchronization process wakes up to propagate the change to other replicas of the partition. There is a ten-second hold-down time to allow several updates to be propagated in one update session. Replica synchronization proceeds one replica at a time throughout the replica ring of a partition.
After a server successfully sends all pending updates to one replica, it goes on to the next replica until all replicas have been updated. If the operation fails for one or more replicas and they are not updated in one round of the synchronization process, it reschedules them for a later synchronization cycle.
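The skulk loop just described can be sketched as follows, with invented names; replicas that fail to accept updates are returned so they can be rescheduled for a later cycle:

```python
# Hedged sketch of the skulk loop: push pending updates to each replica
# in the ring, one at a time, and reschedule any replica that fails.

def skulk(replica_ring, send_updates):
    failed = []
    for replica in replica_ring:
        if not send_updates(replica):
            failed.append(replica)   # retry in a later synchronization cycle
    return failed

ring = ["NS1", "NS2", "NS3"]
print(skulk(ring, lambda r: r != "NS2"))  # ['NS2']
```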
Distributed Relationship Management
Distributed relationship management consists of three components that help keep NDS trees connected:
A server usually stores replicas of only some of an NDS Directory's partitions. Sometimes a server must hold information about entries in partitions that the server does not store. Often, the server requires information about an entry in a parent partition. Having this information helps the local server maintain connectivity with the upper part of the tree. At other times, the server requires information about entries in partitions that are not parents or children of partitions it stores; for example, the file system may need to refer to these entries, or an entry stored on the local server may need to refer to them.
NDS stores these types of information in external references, which are placeholders containing information about entries that the server does not hold. External references are not "real" entries because they do not contain complete entry information.
Besides providing connectivity, external references improve system performance by caching frequently accessed information. Currently, NDS caches only an entry's public key. The Modify Entry routine is used to store the public key as an attribute on the external reference.
Creating External References. NDS creates external references when an entry that is not stored on the local server:
Authenticates and attaches to the server.
Is added as a trustee to a locally stored file system or entry.
Becomes a member of a locally stored group.
In addition, NDS creates external references when a replica is removed from the server. NDS changes all of the entries in the removed replica into external references and marks them as expired.
Keep in mind the following two rules about creating external references:
NDS never creates an external reference below a real entry in the tree.
NDS never creates a subordinate reference below an external reference in the tree. Any subordinate references below an external reference will be removed during synchronization.
Deleting External References. On each server, NDS deletes expired external references if they have not been used within a specified time period. The system administrator can use a SET parameter to set the number of days after which NDS deletes external references that have not been used, are not needed for another entry's context, or do not contain information that the operating system needs.
To remove expired external references, NDS builds a list of unused external references by checking the life-span interval of each external reference. This interval defaults to eight days and thirty minutes.
NDS checks to see if the file system must access any of the external references. This process then deletes any external references not accessed by the file system within the life-span interval. The Janitor process then purges the deleted external references.
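A hedged sketch of this purge, using the default life-span interval given above (the function and data layout are invented):

```python
# Illustrative sketch of the expired-external-reference purge: delete any
# external reference not used within the life-span interval and not
# needed by the file system.

LIFE_SPAN = 8 * 24 * 3600 + 30 * 60   # default: eight days, thirty minutes

def purge_external_refs(refs, now, file_system_needs):
    """refs maps a reference name to its last-used time (seconds).
    Returns only the references that survive the purge."""
    return {
        name: last_used
        for name, last_used in refs.items()
        if now - last_used <= LIFE_SPAN or name in file_system_needs
    }

refs = {"Provo.Novell": 0, "Eng.Novell": 1_000_000}
kept = purge_external_refs(refs, now=1_000_100, file_system_needs=set())
print(sorted(kept))  # ['Eng.Novell']
```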
Synchronizing External References. When NDS updates entries and partitions, it also must update external references created for those entries.
After successfully synchronizing all the replicas of a partition, NDS checks any entry that has been renamed, moved, or deleted. All of these processes involve the Remove Entry process, which adds an obituary on the object being removed. If NDS finds any back link obituaries, it notifies the server that contains the entry's external reference to update that external reference.
When NDS creates a new external reference, it also attempts to create a pointer to the server holding the external reference as one of the attributes of the nonlocal entry. This pointer is called a back link and is stored as a Back Link attribute. If NDS is unable to create the back link, it retries up to nine times, at a default interval of three minutes. If it still cannot create the back link, the Back Link process creates it later.
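The retry behavior can be sketched as follows; the retry count and fallback to the Back Link process come from the text, while the callbacks are invented:

```python
# Hedged sketch of back-link creation with retries (values from the text;
# the callback-based API is an assumption for illustration).

MAX_TRIES = 9
RETRY_INTERVAL = 3 * 60   # default retry interval, in seconds

def create_back_link(attempt_create, schedule_background):
    for _ in range(MAX_TRIES):
        if attempt_create():
            return True
    # Give up for now; the periodic Back Link process will create it.
    schedule_background()
    return False

attempts = iter([False, False, True])
print(create_back_link(lambda: next(attempts), lambda: None))  # True
```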
NDS uses back links to update external references in the cases where the real object has been renamed or deleted.
The Back Link process executes on a time interval set by the network administrator. Currently, the default interval is 13 hours. The Back Link process has two basic functions:
Remove any expired and unneeded external references from the system.
Create and maintain any back links not created at the same time as the external reference.
Creating a Back Link. When NDS creates a new external reference for an entry not stored on the local server, NDS attempts to place a back link on the real entry. The back link points to the server that holds the external reference. For example, in the tree in Figure 5, partition Provo is stored on server NS1. Because partition Admin is stored on server NS2, which doesn't store a copy of Provo, server NS2 needs an external reference for partition Provo to connect partition Admin with [Root]. When NDS creates the external reference, NDS places a back link on server NS1's copy of entry Provo. This back link points to NS2.
In this case, server NS2 sends a Create Back Link request to NS1, which places the back link as an attribute value for the entry Provo.
Deleting a Back Link. When NDS removes an external reference, the back link to that external reference must be deleted. The server holding the external reference requests that the server holding the real entry delete the back link, and that server then deletes the back-link value.
Figure 5: Backlinks.
In a distributed database, each server receives updated information through synchronization. Because the servers do not receive updates simultaneously, the servers may not hold the same information at a given time. For this reason, each server holds on to the old information until all the other servers receive updates. NDS uses obituaries to keep track of such information.
For example, Figure 6 shows how obituaries are used when an entry is renamed. On server 1, the entry C is renamed to D. When server 2, which holds a replica of C, receives the update during synchronization, it keeps the copy of C and attaches a New RDN obituary to it. This obituary ensures that all servers can access C, even if they have not been notified of the name change. When server 2 creates entry D, it attaches an Old RDN obituary pointing back to the original object. After all replicas have been synchronized, server 2 can delete its copy of C and remove the obituary from entry D.
Figure 6: Obituaries.
Obituaries are attribute values that are not visible to clients and are used in server-to-server exchanges. Because obituaries are attribute values, NDS synchronizes them the same way it synchronizes other values. NDS synchronizes all obituaries across partition replicas.
Primary and Secondary Obituaries. The Back Link obituary is considered a secondary obituary. It keeps external references synchronized with the real entries. All other obituaries are primary obituaries, which keep track of entry-level modifications, including:
Renaming an entry
Deleting an entry
Moving an entry
Moving a subtree
Generally, when data is changed, primary obituaries convey the change(s) to servers holding the affected entry. Secondary obituaries convey the change to servers holding external references to the changed entry.
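The rename scenario from Figure 6 can be sketched with invented structures; the New RDN and Old RDN obituaries tie the old and new entries together until synchronization completes:

```python
# Hedged sketch of rename obituaries: the renamed entry keeps a New RDN
# obituary pointing forward, and the new entry carries an Old RDN
# obituary pointing back, until every replica has seen the change.

def rename_entry(entries, old_name, new_name):
    entries[old_name]["obituary"] = ("New RDN", new_name)    # forward pointer
    entries[new_name] = {"obituary": ("Old RDN", old_name)}  # back pointer

def purge_after_sync(entries, old_name, new_name):
    # All replicas synchronized: drop the old entry and its obituary.
    del entries[old_name]
    entries[new_name]["obituary"] = None

db = {"C": {"obituary": None}}
rename_entry(db, "C", "D")
print(db["C"]["obituary"])   # ('New RDN', 'D')
purge_after_sync(db, "C", "D")
print(sorted(db))            # ['D']
```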
DNS NDS Personality
Because DNS is a name service, DNS-NDS is also a name service: it uses the NDS database and can resolve DNS names. For any other services, such as authentication, a DNS-NDS server must rely on NDS.
The DNS-NDS product allows two levels of integration with NDS:
Autonomous DNS uses the NDS database, but is otherwise independent of the NDS hierarchy. It can use some NDS features, such as aliasing, but it does not have to.
Partition-based DNS merges the DNS and NDS name spaces by mapping DNS zones onto NDS partitions.
DNS names basically consist of three different "zone" levels. DNS-NDS uses NDS's hierarchical nature by modeling its NDS database structure around the three DNS zone levels. To do this, DNS-NDS defines several NDS objects and the relationships between these objects.
The primary difference between NDS names and DNS names is the fact that NDS objects making up a DNS domain are not named by the DNS entities they represent. Instead, the DNS domain names are stored in the corresponding NDS objects as attribute values.
The following four NDS object types are required to support an autonomous DNS zone:
DNS Container Object
DNS SOA Object
DNS Unlabeled RR Object
DNS Labeled RR Object
DNS Container Object. The NDS Container object implements the first level of the DNS zone description. Basically, a DNS Container object correlates to a DNS SOA RR. The DNS Container object also includes a member list of all the DNS-NDS servers that support its zone.
Each DNS Container object is a standalone entity within the NDS tree. That is, the DNS name space hierarchy is not represented via the NDS hierarchy. This way, multiple DNS domains can be represented within NDS because each domain is represented by a separate, independent DNS Container object. Also, a single NetWare server can support multiple DNS domains by concurrently supporting multiple DNS Container objects.
DNS RRset Object. Each DNS RRset is represented by a DNS RRset container object. An RRset has subordinate RRs, which NDS represents as leaf objects inside the RRset container object. The SOA RRset includes NS RRs that identify peer name servers that are authoritative for the zone.
DNS RR Leaf Object. The DNS RR object contains the RR type and RR data of a single RR. As the name states, this is always an NDS leaf object. A collection of these objects, related to their superior RRset object, makes up an RRset.
Instead of using DNS zone transfers to update its servers, DNS-NDS exploits NDS replica synchronization.
The DNS zone-transfer method relies on the concept of primary and secondary servers, where secondary servers request zone transfers (or updates) from the primary server. Because DNS-NDS servers use NDS replica synchronization, they have no concept of primary and secondary servers among themselves. Therefore, when DNS-NDS is secondary to a non-DNS-NDS server (as is the case when a UNIX host maintains the DNS master server), the DNS-NDS server will request zone transfers from the primary server. If a collection of DNS-NDS servers is secondary to a primary server, they will use replica synchronization among themselves, and one server will be identified as the designated secondary. This server will handle any zone-transfer requests between the DNS-NDS collection and the zone master.
To coordinate the zone transfers between DNS and DNS-NDS servers, each change will have a serial number associated with it. The servers can check the serial numbers to see if their database is out of date and, thus, request a zone transfer.
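The serial-number check can be sketched in a few lines (function name invented; real DNS serial numbers use wrap-around arithmetic per RFC 1982, which this sketch ignores):

```python
# Hedged sketch: a secondary compares its zone serial with the primary's
# and requests a zone transfer only when it is behind.

def needs_zone_transfer(secondary_serial, primary_serial):
    # Plain comparison is enough for this sketch; production DNS uses
    # RFC 1982 serial-number arithmetic to handle wrap-around.
    return secondary_serial < primary_serial

print(needs_zone_transfer(1997040101, 1997040102))  # True
print(needs_zone_transfer(1997040102, 1997040102))  # False
```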
When DNS-NDS is primary to a non-DNS-NDS server, the DNS-NDS server will report the zone serial number to its secondary servers. If several DNS-NDS servers are primary to non-DNS-NDS servers, one DNS-NDS server will be identified as the primary server and will maintain the serial number.
Aliases. DNS can also use the NDS alias feature when resolving a domain name to an IP address. When a client requests the address of a domain name, the DNS database can check whether the correlating NDS entry is aliased to another location in the NDS tree. If it is, DNS can dereference the alias and retrieve the IP address of the target.
This integration level merges the DNS and NDS name spaces. In this level, DNS reaches directly into the NDS tree in order to resolve DNS queries.
In partition-based DNS, a zone level corresponds to an NDS partition instead of an NDS object as it does in the autonomous DNS. The NDS partition root equates to DNS zone's root. This allows partition-based DNS to map DNS requests into the NDS tree within the designated partition.
NDS keeps any overhead information DNS requires in a DNS container object. This object does not represent a zone.
Partition-based DNS will not take the role of a secondary, because a secondary server using partition-based DNS would have to be administered both by the DNS primary server and through NDS, eliminating NDS's single point of administration.
A migration utility will allow the administrator to easily migrate an autonomous DNS installation to a partition-based DNS.
Merging the Name Spaces
Integrating DNS and NDS simplifies administration between the two name spaces. This benefit is greatly enhanced when the two name spaces are merged. Figure 7 illustrates how NDS and DNS name spaces correspond to each other when the two name spaces match.
Figure 7 illustrates how the partition based approach makes use of the NDS tree. In this example, the NDS tree is subdivided into two partitions. Consequently, there are two DNS zones, one for each partition.
Figure 7: DNS Name Space.
The root of Partition A is the node sjf.novell. The root of Partition B is the node novell. The NDS tree and the DNS tree diverge only at their respective roots, [novell_inc] and .com. A federated partition resolves this divergence. The new federated partition external reference (essentially, its tree root) is named [com]. Below this is an Alias object named novell. This effectively allows DNS naming without having to provide any NDS to DNS name mapping within the zone descriptions.
Since the DNS container object can exist anywhere within its corresponding partition, dns2 could be a child of "provo" instead of "Novell", if the tree administrator so desired.
Figure 8 illustrates how tightly coupled DNS-NDS makes use of the NDS tree when the DNS and NDS name spaces do not match up well.
This example starts with the same NDS tree as was used for the first example. However, since the name spaces do not match up, a federated partition must be created that reproduces the DNS name space, not just supplies DNS root name mapping.
Figure 8: DNS and NDS name spaces.
The new federated partition may be further partitioned according to the existing zone layout of the DNS implementation for the company.
In the example, the original NDS tree consisted of two partitions. Note that there is no requirement to have the new DNS zone partitions match one-for-one with the original NDS partitions. For example, the original NDS tree could consist of a single partition, while the DNS subtree consists of two partitions.
Once the new federated partition is created, aliases may be created between the original NDS tree and the DNS federated partition. In the example, mail-1 aliases to sjf-mail, and mail-srvr aliases to provo-mail.
Over time, the NDS administrator can move objects from the original tree into the new DNS federated partition. Eventually, the original NDS tree could even be decommissioned. At that point, a single NDS tree exists which represents a merged name space.
Novell's LDAP server agent is based on the University of Michigan's SLAPD version 3.3 implementation and provides LDAP version 2 access to NDS as specified in RFC 1777, including extensions for referrals. This means Novell's LDAP supports simple bind (both unauthenticated access and access where the LDAP client provides a name and password), search, modify, add, delete, modify RDN, compare, abandon, and unbind requests. Novell's LDAP does not support clients that are not running LDAP version 2. Because Novell's LDAP is an NDS personality, it must run on a server where NDS is also running.
Connectionless LDAP (CLDAP), as specified in RFC 1798, will be supported for unauthenticated access over UDP. Initially, Novell's LDAP supports at least 100 clients.
The LDAP schema (based on X.500) and the NDS schema (X.500-compatible) are mapped to correspond to each other, allowing an LDAP client to browse the NDS directory. LDAP administrators can configure the mapping between these two schemas.
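Such a configurable mapping could be modeled as a simple lookup table; the specific attribute pairs below are assumptions for illustration, not Novell's shipped defaults:

```python
# Hedged sketch of a configurable LDAP-to-NDS attribute mapping of the
# kind described above.  The name pairs are illustrative assumptions.

ldap_to_nds = {
    "cn": "CN",
    "sn": "Surname",
    "telephoneNumber": "Telephone Number",
}

def map_ldap_attr(ldap_name):
    # Unmapped names pass through unchanged.
    return ldap_to_nds.get(ldap_name, ldap_name)

print(map_ldap_attr("sn"))     # Surname
print(map_ldap_attr("mail"))   # mail
```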
Novell's LDAP can browse an NDS container if the LDAP agent has browse rights to that container.
Configuring Novell's LDAP
Novell's LDAP is configured with the NWAdmin tool. Initially, the configuration will reside in a file on the local file system; in subsequent releases, it will reside in NDS. Novell's LDAP is dynamically configured, so its parameters can be reset without restarting it.
Novell's LDAP relies on the TCP/IP sockets interface. Novell's LDAP will also run on the WinSock interface to Windows NT.
To provide fast and efficient searches, Novell's LDAP uses a local data store as a catalog and a dredger to maintain that data store. Novell's LDAP supports approximate matching.
* Originally published in Novell AppNotes
The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.