High Availability Networking with NetWare 6: NSS 3.0 and Cluster Services 1.6
Senior Research Engineer
01 Oct 2001
One of the key benefits of Novell's NetWare 6 is its ability to provide high availability network services. The two main features here are Novell Storage Services 3.0 and Novell Cluster Services 1.6, both of which are included with NetWare 6. This AppNote provides technical detail on how these two services can work together to give your users non-stop access to network resources and data.
Topics: file system, high availability, NetWare features, Novell Cluster Services, Novell Storage Services
Products: NetWare 6, Novell Cluster Services
Audience: network installers and administrators
Prerequisite Skills: familiarity with NetWare
Novell Storage Services (NSS) is the next-generation storage and file access system developed by Novell. It is the underlying technology on which Novell is basing both enhanced versions of its existing file systems and completely new storage interfaces and products.
Novell Cluster Services v1.6 is a server-clustering system you can use to ensure high availability and manageability of critical network resources, including data (volumes), applications, server licenses, and services. It is a multi-node, eDirectory-enabled clustering product that supports failover, failback, and migration (load balancing) of individually managed cluster resources.
This AppNote provides technical detail on how these two NetWare 6 services can work together to give your users non-stop access to network resources and data.
Novell Storage Services 3.0
This section looks at NSS 3.0. As an integral part of NetWare 6, NSS greatly extends what NetWare can do by providing quick access to data, with volume mount times reduced to under a second and volume repair times to under a minute; improved memory utilization and more efficient disk space management; the ability to store huge files; and a superior return on investment.
Evolution of the NetWare File System
As good as it was, the traditional NetWare file system had some limitations. Chief among these were long volume mount times, limited volume sizes, and limited cross-platform support. As storage hardware and technology advanced and size limitations were gradually lifted, users wanted to mount more volumes on a single server. They also wanted faster volume mount times and much quicker error recovery, without giving up the reliability and capabilities of the traditional NetWare file system.
NSS was Novell's answer to these needs. First introduced with NetWare 5.x, NSS is revolutionary in that tasks like mounting a volume have become virtually instantaneous, and the amount of storage supported is virtually unlimited. NSS gives you the ability to store large objects and large numbers of objects without degrading system performance. It provides extremely fast access to your data. NSS allows volumes to be mounted and repaired in seconds rather than the hours it would take with NetWare's traditional file system. And you get all of these benefits while maintaining full backwards compatibility with classic NetWare.
Benefits of NSS 3.0
This section describes the benefits of NSS in more detail.
Quick Access to Data. Let's assume that an intense electrical storm hits your site. Unfortunately, you neglected to purchase that Uninterruptible Power Supply (UPS) you'd been planning on buying for months. The power goes off for a couple of minutes. Afterwards, when you reboot your server, one of the huge server volumes needs to be repaired. With the traditional NetWare file system, running VRepair could take hours to complete, because VRepair examines the whole volume and its running time grows with the volume's size. With NSS, repairing a volume takes only minutes, regardless of size. Thanks to its journaling, NSS repairs a volume by replaying or backing out the incomplete transactions recorded in its journal rather than scanning all the files on the volume, as VRepair did.
Improved Resource Use. Consider a small company with a tight hardware budget and an enormous new Web site to bring online, running on a server with a limited amount of RAM. It's entirely possible that the volume containing the Web site files won't mount because the server doesn't have enough memory to cache the entire directory entry table (DET).
NSS solves this and similar memory management problems by running on virtually any amount of memory you have available. NSS mounts any size volume with as little as 4 to 10 megabytes (MB) of memory. NSS lets systems with limited resources perform better, while larger systems provide even higher performance.
NSS provides more than just improved memory management. Sophisticated data management techniques let NSS make more efficient use of available disk space as well. NSS lets multiple name spaces share storage space rather than using additional storage for each version of an object's name. NSS also stores objects in balanced trees (B-trees) for faster storage access.
Handles Large Objects and Large Numbers of Objects. NSS can scale to store up to 8 terabytes (TB) of data. Because NSS uses a 64-bit interface combined with advanced algorithms to manage the storage system, it can provide a virtually unlimited number of directory entries and files without degrading system performance. Gone are the days of needing to add volumes to a server only to discover that you have already reached the maximum number of volumes the server supports.
Return On Investment. There are no hidden costs associated with upgrading to an NSS storage system. No new hardware is required and you needn't purchase additional memory. Rapid volume mount and repair times mean NSS will provide savings in increased administrator and user productivity. Best of all, your needs can't outgrow the system. The modular structure of NSS lets you add new functionality as technology advances and your needs change.
Structure of NSS
This section describes the internal NSS structure and details how the benefits provided by NSS are achieved.
Figure 1 shows the four basic sections of the NSS system. They are the Media Access Layer (MAL), the Object Engine, the Common Layer Interface, and the Semantic Agents.
Figure 1: Structure of NSS.
Let's discuss each of these layers in a little more detail.
Media Access Layer (MAL). The MAL provides connection to a wide range of storage devices such as standard hard drives, CD-ROMs, Digital Versatile Disk (DVD) media, virtual disks implemented as networked clusters, and even non-persistent media such as RAM disks. The MAL lets you view the storage capabilities of your server as simply a quantity of storage blocks, freeing administrators from the details of enabling various storage devices. The MAL's modular design allows new devices and technologies to be easily added. The MAL also provides the interfaces used by the Object Engine to interact with the available storage devices.
The Object Engine. The Object Engine layer is the NSS object storage engine. Compared with traditional object engines, it uses more sophisticated and efficient mechanisms to manage the objects it stores, achieving high levels of performance, scalability, robustness, and modularity.
Performance. To improve system performance, the Object Engine stores objects on disk in balanced trees (sometimes called B-trees). Using the compact B-tree structures guarantees the system can retrieve an object from the disk in no more than four I/O cycles. B-trees also improve memory management by letting the system locate an object anywhere in storage without loading the entire directory entry table into memory.
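The four-I/O bound follows from how shallow a B-tree stays as it grows. The short calculation below assumes that each node read costs one I/O and that every node is full; the fanout of 256 entries per node is a hypothetical figure chosen for illustration, not NSS's documented node size.

```python
def max_objects(fanout: int, levels: int) -> int:
    """Upper bound on entries reachable in `levels` node reads (every node full)."""
    return fanout ** levels


def levels_needed(fanout: int, n_objects: int) -> int:
    """Smallest tree height, i.e. node reads per lookup, that can index n_objects."""
    levels, capacity = 1, fanout
    while capacity < n_objects:
        capacity *= fanout
        levels += 1
    return levels


# Hypothetical fanout of 256 entries per node:
print(max_objects(256, 4))            # 4294967296
print(levels_needed(256, 1_000_000))  # 3
```

Because capacity grows exponentially with height, billions of objects still fit within four levels under these assumptions, and only the nodes on the lookup path need to be in memory, not the whole directory entry table.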
The ability to share name spaces also improves disk space usage. Instead of storing a name for each name space in a single stored object (such as one name for DOS and another for UNIX/NFS), the name spaces in an object share a common name, if no naming conflicts exist.
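A minimal sketch of name sharing, assuming an illustrative in-memory model: the object stores one common name plus a separate entry only for name spaces whose name differs (for example, a DOS 8.3 short name). The class and function names are hypothetical, not NSS data structures.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class StoredObject:
    shared_name: str                                          # stored once
    overrides: Dict[str, str] = field(default_factory=dict)  # only conflicting names

    def name_in(self, name_space: str) -> str:
        return self.overrides.get(name_space, self.shared_name)


def make_object(names: Dict[str, str]) -> StoredObject:
    """Collapse per-name-space names into one shared name where they agree."""
    shared, _count = Counter(names.values()).most_common(1)[0]
    overrides = {ns: n for ns, n in names.items() if n != shared}
    return StoredObject(shared, overrides)


same = make_object({"DOS": "README.TXT", "UNIX": "README.TXT", "LONG": "README.TXT"})
print(same.overrides)          # {}
mixed = make_object({"DOS": "LONGFI~1.TXT", "UNIX": "longfilename.txt",
                     "LONG": "longfilename.txt"})
print(mixed.name_in("DOS"))    # LONGFI~1.TXT
```

When all name spaces agree, the object carries a single name; storage is spent only on names that genuinely conflict.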
Scalability. The Object Engine uses 64-bit interfaces to let you create far more objects and individual objects far larger than was possible in the traditional NetWare file system.
Robustness. To enable rapid volume remounts after a crash, the Object Engine maintains a journal that records all transactions written to disk and all transactions waiting to be written. The traditional NetWare recovery procedure involved using the VRepair utility to laboriously check and repair inconsistencies in the directory entry tables. The NSS Object Engine can locate an error on a disk by referencing the transaction journal, noting the incomplete transaction, and correcting the error either by reprocessing the incomplete transaction or by backing it out, all without having to search the volume.
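The recovery step can be sketched as a generic redo/undo journal. This is a simplified stand-in, not NSS's actual journal format: the record layout and the `recover` function are hypothetical. What it illustrates is that repair consults only the journal, reapplying committed transactions and backing out incomplete ones, and never scans the whole volume.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical record formats:
#   ("write", txid, key, old_value, new_value)   old_value None = key did not exist
#   ("commit", txid)
Record = Tuple[Any, ...]


def recover(volume: Dict[str, Any], journal: List[Record]) -> Dict[str, Any]:
    """Repair `volume` using only the journal, never scanning the volume itself."""
    committed = {rec[1] for rec in journal if rec[0] == "commit"}

    # Redo: reapply writes of committed transactions (safe to repeat).
    for rec in journal:
        if rec[0] == "write" and rec[1] in committed:
            _, _, key, _old, new = rec
            volume[key] = new

    # Undo: back out writes of incomplete transactions, newest first.
    for rec in reversed(journal):
        if rec[0] == "write" and rec[1] not in committed:
            _, _, key, old, _new = rec
            if old is None:
                volume.pop(key, None)
            else:
                volume[key] = old
    return volume


# Crash state: tx 1 committed but "a" never reached disk; tx 2 never committed.
volume = {"a": 0, "b": 99, "c": "tmp"}
journal = [("write", 1, "a", 0, 1), ("commit", 1),
           ("write", 2, "b", 2, 99), ("write", 2, "c", None, "tmp")]
print(recover(volume, journal))   # {'a': 1, 'b': 2}
```

The work done is proportional to the length of the journal, not the size of the volume, which is why this style of recovery stays fast on very large volumes.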
Modularity. The Object Engine's modularity lets you define new objects and plug them into the storage system as needed. New storage technologies, such as DVD, can be transparently plugged in to the engine at any time without affecting the system. This modularity lets you make use of hard links, symbolic links, and authorization systems not previously available through the traditional NetWare file system.
Common Layer Interface. This layer defines the interfaces the Semantic Agents use to access the underlying Object Engine. These services fall into three basic categories: naming services, object services, and management services.
Naming Services. These services include basic object naming and lookup operations as well as name space management services.
Object Services. These services provide the standard and direct input and output to and from objects, as well as other operations on objects themselves, such as create, delete, and truncate operations.
Management Services. These services cover a variety of tasks, including locking, managing volume operations, and the addition and registration of new objects.
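To make the three service categories concrete, here is a sketch of how such an interface might be grouped, with a toy in-memory implementation. All method names are hypothetical and chosen for illustration; they are not the actual NSS Common Layer Interface.

```python
from abc import ABC, abstractmethod
from typing import Dict, Set


class CommonLayer(ABC):
    """Hypothetical grouping of the three service categories."""

    # Naming services
    @abstractmethod
    def lookup(self, name: str) -> int: ...

    # Object services
    @abstractmethod
    def create(self, name: str) -> int: ...
    @abstractmethod
    def write(self, obj_id: int, data: bytes) -> None: ...
    @abstractmethod
    def read(self, obj_id: int) -> bytes: ...
    @abstractmethod
    def truncate(self, obj_id: int, length: int) -> None: ...
    @abstractmethod
    def delete(self, name: str) -> None: ...

    # Management services
    @abstractmethod
    def lock(self, obj_id: int, owner: str) -> bool: ...
    @abstractmethod
    def register_type(self, type_name: str) -> None: ...


class InMemoryLayer(CommonLayer):
    """Toy implementation backed by dictionaries instead of an Object Engine."""

    def __init__(self) -> None:
        self._names: Dict[str, int] = {}
        self._data: Dict[int, bytearray] = {}
        self._locks: Dict[int, str] = {}
        self._types: Set[str] = set()
        self._next_id = 1

    def lookup(self, name: str) -> int:
        return self._names[name]                 # KeyError if the name is unknown

    def create(self, name: str) -> int:
        obj_id, self._next_id = self._next_id, self._next_id + 1
        self._names[name] = obj_id
        self._data[obj_id] = bytearray()
        return obj_id

    def write(self, obj_id: int, data: bytes) -> None:
        self._data[obj_id].extend(data)          # append-only writes keep the sketch short

    def read(self, obj_id: int) -> bytes:
        return bytes(self._data[obj_id])

    def truncate(self, obj_id: int, length: int) -> None:
        del self._data[obj_id][length:]

    def delete(self, name: str) -> None:
        obj_id = self._names.pop(name)
        del self._data[obj_id]
        self._locks.pop(obj_id, None)

    def lock(self, obj_id: int, owner: str) -> bool:
        # Grant the lock if it is free; report success only to the current holder.
        return self._locks.setdefault(obj_id, owner) == owner

    def register_type(self, type_name: str) -> None:
        self._types.add(type_name)


layer = InMemoryLayer()
oid = layer.create("notes.txt")
layer.write(oid, b"hello world")
layer.truncate(oid, 5)
print(layer.read(oid))   # b'hello'
```

Because callers depend only on the abstract `CommonLayer`, a different backing engine could be substituted without changing them, which mirrors the layering the text describes.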
Semantic Agents. The Semantic Agent layer contains loadable software modules that define the client-specific interfaces available to store objects. For example, the NetWare file system Semantic Agent interprets requests received from NetWare 6 clients and passes the requests to the Common Layer Interface and onward to the Object Engine for execution. Another Semantic Agent implements an HTTP interface, allowing Web browsers to access data also stored by the Object Engine. Additional Semantic Agents support other popular systems such as NFS, Web Proxy Cache, and the Macintosh file system.
This modular approach means you no longer need separate storage solutions for different storage systems. New Semantic Agents can be created and loaded to the system at any time, without impacting any currently-loaded Semantic Agent.
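A sketch of the Semantic Agent idea, assuming two hypothetical agents that translate differently shaped client requests (a NetWare-style path and an HTTP GET) into the same call on a shared store, which stands in for the Common Layer Interface and Object Engine. Loading a new agent simply adds an entry to a registry and leaves the other agents untouched.

```python
from typing import Callable, Dict

# Stand-in for the Common Layer Interface / Object Engine: one shared store.
STORE: Dict[str, bytes] = {"docs/report.txt": b"Q3 numbers"}


def common_read(name: str) -> bytes:
    return STORE[name]


def netware_agent(request: str) -> bytes:
    # e.g. VOL1:\DOCS\REPORT.TXT -> docs/report.txt (names treated case-insensitively)
    _volume, path = request.split(":", 1)
    return common_read(path.strip("\\").replace("\\", "/").lower())


def http_agent(request: str) -> bytes:
    # e.g. "GET /docs/report.txt" -> docs/report.txt
    method, path = request.split(" ", 1)
    if method != "GET":
        raise ValueError("only GET is sketched here")
    return common_read(path.lstrip("/"))


AGENTS: Dict[str, Callable[[str], bytes]] = {}


def load_agent(protocol: str, handler: Callable[[str], bytes]) -> None:
    AGENTS[protocol] = handler          # other registered agents are unaffected


def handle(protocol: str, request: str) -> bytes:
    return AGENTS[protocol](request)


load_agent("netware", netware_agent)
load_agent("http", http_agent)
print(handle("netware", r"VOL1:\DOCS\REPORT.TXT"))  # b'Q3 numbers'
print(handle("http", "GET /docs/report.txt"))       # b'Q3 numbers'
```

Both clients reach the same stored object; only the request translation differs, so one copy of the data serves every protocol.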
Novell Cluster Services 1.6
Do you remember NetWare SFT III, the Novell technology that provided a complete redundant server running in synchronization with the main server? If for any reason the main server failed, the backup server would take over for the failed server without missing a beat.
Novell's server clustering has taken this technology a giant step forward. First introduced for NetWare 5.x, Novell Cluster Services has been updated and enhanced for NetWare 6. This section discusses Novell Cluster Services 1.6, whose mission is to ensure high availability of critical network resources, including connection licenses, data volumes, network services, and applications.
A cluster is a group of file servers, in which each server is referred to as a node. You create a cluster by loading the clustering software onto the NetWare servers that you want to be part of the cluster. The clustering software connects the servers into the cluster. Using this software, you can have as many as 32 servers in a cluster. (NetWare 6 includes a two-node clustering license.) Typically, after creating the cluster, you would connect it to a shared storage system by way of a Storage Area Network (SAN).
Novell Cluster Services 1.6 uses the concept of failovers to ensure the high availability of network resources. A failover occurs when one node in a cluster fails and one or more surviving nodes take over and continue to provide the failed node's resources. When a failover occurs, users typically regain access to their resources in seconds, usually without having to log in again.
Cluster Services provides a great deal of versatility in the way you distribute resources. The product lets you determine what you want to happen if a node fails. For example, you may specify that you want all of Node X's resources to migrate to Node Y in the event that Node X fails. Or you may also specify that some of Node X's resources migrate to Node Z to better balance the workload.
Typically, failovers occur automatically when a node unexpectedly fails. However, you can also invoke a failover manually when you want to load-balance the cluster, or when you need to bring down a server for maintenance or a hardware upgrade.
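The failover behavior described above can be sketched as a per-resource ordered list of preferred nodes: when a node fails, each of its resources moves to the first surviving node on its own list, so resources from one failed node can spread across several survivors. The policy format and function names are assumptions for illustration, not Cluster Services' actual configuration schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class Resource:
    name: str
    preferred_nodes: List[str]        # failover order (hypothetical policy format)
    running_on: Optional[str] = None


def fail_over(resources: List[Resource], failed_node: str,
              live_nodes: Set[str]) -> Dict[str, Optional[str]]:
    """Automatic failover: move each resource off failed_node to its next live node."""
    moved: Dict[str, Optional[str]] = {}
    for res in resources:
        if res.running_on != failed_node:
            continue
        target = next((n for n in res.preferred_nodes
                       if n != failed_node and n in live_nodes), None)
        res.running_on = target       # None: no eligible node, resource stays offline
        moved[res.name] = target
    return moved


def migrate(res: Resource, target: str, live_nodes: Set[str]) -> None:
    """Manual failover, e.g. to load-balance or to free a node for maintenance."""
    if target not in live_nodes:
        raise ValueError(f"{target} is not an active cluster node")
    res.running_on = target


resources = [Resource("VOL1", ["X", "Y", "Z"], "X"),
             Resource("WEB", ["X", "Z", "Y"], "X"),
             Resource("PRINT", ["Y", "X"], "Y")]
print(fail_over(resources, "X", {"Y", "Z"}))   # {'VOL1': 'Y', 'WEB': 'Z'}
```

In this example Node X's volume goes to Node Y while its Web resource goes to Node Z, matching the split the text describes, and resources already on surviving nodes are left alone.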
Novell Cluster Services Architecture
The architecture of Novell Cluster Services 1.6 is different from that of Cluster Services for NetWare 5.x. However, both versions are designed to ensure high availability and to simplify storage management.
Figure 2 shows the modules that make up Novell Cluster Services 1.6.
Figure 2: Architecture of Novell Cluster Services 1.6.
The following sections briefly explain what each module does.
Cluster Configuration Library (CLSTRLIB). This is the configuration library module for Cluster Services. It reads the cluster's configuration data from NDS into local memory, where other modules such as the CRM use it.
Group Interprocess Communication (GIPC). The GIPC module is responsible for the group membership protocol, including the heartbeat protocol. The cluster nodes transmit and listen for heartbeat packets on the network at regular intervals, so when one or more nodes stop transmitting heartbeat packets, the other nodes can detect a possible failure. The nodes also use the heartbeat protocol to write to a special partition on the SAN.
Virtual Interface Provider Library Extensions (VIPX). VIPX is Novell's extension of the provider library for the Virtual Interface (VI) Architecture specification. The VI Architecture specification defines an industry-standard architecture for communication within clusters of servers and workstations.
Cluster System Services (CSS). The CSS module provides an API that any distributed cluster-aware application can use to enable distributed-shared memory and distributed locking. Distributed-shared memory allows cluster-aware applications running across multiple servers to share access to the same data as though the data were on the same physically-shared RAM chips. Distributed locking protects cluster resources by ensuring that if one thread on one node gets a lock then another thread on another node can't get the same lock.
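The distributed-locking guarantee can be illustrated with a single-process stand-in. A real cluster coordinates locks across nodes over the network; this sketch only models the rule the text states, that a lock held by one thread on one node cannot be granted to a thread on another node. The `LockTable` class and its methods are hypothetical, not the CSS API.

```python
import threading
from typing import Dict, List, Tuple

Holder = Tuple[str, int]   # (node name, thread id)


class LockTable:
    def __init__(self) -> None:
        self._guard = threading.Lock()            # serializes access to the table
        self._holders: Dict[str, Holder] = {}

    def try_acquire(self, lock_name: str, node: str, thread_id: int) -> bool:
        with self._guard:
            holder = self._holders.get(lock_name)
            if holder is None:
                self._holders[lock_name] = (node, thread_id)
                return True
            return holder == (node, thread_id)    # only the current holder is told yes

    def release(self, lock_name: str, node: str, thread_id: int) -> bool:
        with self._guard:
            if self._holders.get(lock_name) == (node, thread_id):
                del self._holders[lock_name]
                return True
            return False


# Eight "nodes" race for the same lock; exactly one should win.
table = LockTable()
winners: List[int] = []


def attempt(i: int) -> None:
    if table.try_acquire("VOL1-metadata", f"node{i}", i):
        winners.append(i)


threads = [threading.Thread(target=attempt, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(winners))   # 1
```

Only the holder can release the lock, so a thread on another node cannot free it by mistake and then take it over.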
Split Brain Detector (SBD). This module protects against unnecessary failovers when a node simply loses its network connection. If a node becomes unable to communicate with the network, it can no longer send or receive heartbeat packets. As a result, the other nodes in the cluster think that this node has failed and attempt to take over the presumably dead node's resources. Meanwhile, since the node is not dead but has merely lost its network connection, it thinks it is the only node alive in the cluster and thus tries to restart the cluster's resources by itself.
The SBD module detects this kind of problem and notifies the cluster, which immediately tries to deactivate one side of the "split brain." The cluster will deactivate either the smaller half of the split brain or the half that is not running the master node.
Because the Cluster Services license included with NetWare 6 covers only two nodes, each half of the cluster "brain" in such a cluster consists of a single node. Neither half is smaller, and each node would think it is the master if the network connection between the nodes were lost. Cluster Services 1.6 addresses this potential problem with technology that detects network failures and deactivates the node that has the network failure.
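The decision rules in the last two paragraphs can be sketched as a function that picks which half to deactivate. The order in which the rules are applied is an assumption, and the `network_ok` flag is a hypothetical stand-in for however Cluster Services 1.6 actually detects which node has lost its network connection; the article does not describe that mechanism.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass
class Partition:
    nodes: FrozenSet[str]
    has_master: bool
    network_ok: bool = True    # hypothetical result of a local link check


def side_to_deactivate(a: Partition, b: Partition) -> Partition:
    """Choose which half of a split brain to take down (simplified sketch)."""
    if len(a.nodes) != len(b.nodes):
        return a if len(a.nodes) < len(b.nodes) else b    # smaller half loses
    if a.has_master != b.has_master:
        return b if a.has_master else a                   # half without the master loses
    # Equal halves that both claim the master, as in a two-node cluster:
    # take down the side whose own network connection failed.
    if a.network_ok != b.network_ok:
        return b if a.network_ok else a
    raise RuntimeError("no rule breaks the tie")


n1 = Partition(frozenset({"NODE1"}), has_master=True)
n2 = Partition(frozenset({"NODE2"}), has_master=True, network_ok=False)
print(side_to_deactivate(n1, n2).nodes)   # frozenset({'NODE2'})
```

The first two rules cannot separate a two-node cluster in which both nodes claim the master role, which is exactly why the third, network-failure rule is needed.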
Portal Cluster Agent (PCLUSTER). This module lets you manage Cluster Services from NetWare Remote Manager, so Cluster Services can now be managed from any computer with a Web browser and an Internet connection. The functionality in Remote Manager is practically identical to that in ConsoleOne, giving you two different ways to manage Cluster Services.
Virtual Interface Architecture Link Layer (VLL). This is an interface layer for several other Cluster Services modules; the GIPC, SBD, and CRM modules interface through the VLL. If the GIPC module stops receiving information from one of the cluster nodes, it notifies the VLL module. The VLL module then contacts the SBD module, which determines whether the node is really dead, and then informs the CRM of the decision.
Cluster Resource Manager (CRM). This module keeps track of all the cluster's resources and where they are running. It is also responsible for restarting resources in the event of a failure. The CRM executes the failover policies specified in the NDS configuration data that the CLSTRLIB module reads into local memory when you install Cluster Services.
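The GIPC-to-VLL-to-SBD-to-CRM hand-off can be sketched as a relay object with two callbacks. Everything here is hypothetical, including the class name and the SBD criterion: the sketch assumes SBD treats a node as alive if it is still updating the special SAN heartbeat partition mentioned under GIPC.

```python
from typing import Callable, List, Set


class VirtualInterfaceLinkLayer:
    """Hypothetical relay: GIPC reports a silent node, SBD rules on it, CRM acts."""

    def __init__(self, sbd_is_dead: Callable[[str], bool],
                 crm_on_verdict: Callable[[str, bool], None]) -> None:
        self.sbd_is_dead = sbd_is_dead
        self.crm_on_verdict = crm_on_verdict

    def gipc_node_silent(self, node: str) -> bool:
        dead = self.sbd_is_dead(node)        # really dead, or only cut off?
        self.crm_on_verdict(node, dead)      # CRM decides what to do with the verdict
        return dead


# Assumption: SBD treats a node as alive if it is still updating the SAN partition.
still_writing_to_san: Set[str] = {"NODE2"}
crm_log: List[str] = []

vll = VirtualInterfaceLinkLayer(
    sbd_is_dead=lambda node: node not in still_writing_to_san,
    crm_on_verdict=lambda node, dead: crm_log.append(
        f"{node}: {'fail over resources' if dead else 'no failover'}"),
)
vll.gipc_node_silent("NODE2")
vll.gipc_node_silent("NODE3")
print(crm_log)   # ['NODE2: no failover', 'NODE3: fail over resources']
```

Keeping the decision in SBD and the action in CRM means the CRM never acts on a missed heartbeat alone, which is how unnecessary failovers are avoided.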
Cluster Management Agent (CMA). This module interacts with the Clustering snap-ins to allow ConsoleOne to manage Novell Cluster Services.
Cluster Volume Broker (CVB). This module keeps track of the NSS configuration for the cluster. If a change is made to NSS for one server, the CVB ensures that the change is replicated across all the nodes in the cluster.
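As a rough illustration of what the CVB guarantees, the sketch below applies a configuration change made on one node to every node's copy. The class and key names are hypothetical; the real broker's replication mechanism is not described in this article, and a production system would also have to handle nodes that are down when the change is made.

```python
from typing import Any, Dict, Iterable, List


class VolumeBroker:
    """Hypothetical sketch: every node holds an identical copy of the NSS configuration."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self.configs: Dict[str, Dict[str, Any]] = {n: {} for n in nodes}

    def apply_change(self, origin_node: str, key: str, value: Any) -> None:
        if origin_node not in self.configs:
            raise KeyError(f"{origin_node} is not a cluster node")
        for config in self.configs.values():   # replicate to every node, origin included
            config[key] = value

    def consistent(self) -> bool:
        copies: List[Dict[str, Any]] = list(self.configs.values())
        return all(c == copies[0] for c in copies)


broker = VolumeBroker(["NODE1", "NODE2", "NODE3"])
broker.apply_change("NODE2", "POOL1.shared", True)
print(broker.consistent())   # True
```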
Templates simplify the process of creating resource objects. Novell Cluster Services 1.6 includes new templates for the following resources:
Novell Internet Messaging System (NIMS)
File Transfer Protocol (FTP)
Network File System (NFS)
After you have created the resources, Cluster Services 1.6 lets you assign them priorities that determine the order in which they load. In this way, you can ensure that a network service such as DHCP loads before an application such as GroupWise.
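Priority-based ordering amounts to sorting resources before bringing them online. This sketch assumes a numeric priority where a higher number loads first; the actual scale and direction used by Cluster Services 1.6 are not given here, and the priority values shown are made up.

```python
from typing import Dict, List


def load_order(priorities: Dict[str, int]) -> List[str]:
    """Resource names in load order; a higher number loads first (assumed convention)."""
    return sorted(priorities, key=lambda name: priorities[name], reverse=True)


order = load_order({"GroupWise": 10, "DHCP": 90, "FTP": 50})
print(order)   # ['DHCP', 'FTP', 'GroupWise']
```

Python's sort is stable even with `reverse=True`, so resources given equal priority keep the order in which they were listed.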
In addition, the process of creating cluster resources has been simplified. Using Cluster Services 1.6, you simply check the Online Resource After Create option when you are creating a new resource. The cluster automatically brings the new resource online when it creates that resource.
Novell Cluster Services 1.6 includes several new utilities to help you maintain a stable system. One such tool is a persistent cluster event log, which logs cluster events to a file. The log file is viewable from either ConsoleOne or NetWare Remote Manager.
A new heartbeat tool allows you to view and tune heartbeat settings on both the network and the SAN. The nodes in a cluster send out heartbeat packets every second; if a node does not send a packet within a default threshold of eight seconds, the other nodes suspect there is a problem and take appropriate action. You can modify this eight-second threshold to suit the requirements of your network.
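The detection rule can be sketched as a comparison between each node's last heartbeat and the current time. With packets sent every second, the default eight-second threshold corresponds to roughly eight consecutive missed packets. Whether the real check uses "more than" or "at least" the threshold, and how timestamps are kept, are assumptions here.

```python
from typing import Dict, List

DEFAULT_THRESHOLD = 8.0   # seconds of silence before a node is suspect (default)


def suspect_nodes(last_seen: Dict[str, float], now: float,
                  threshold: float = DEFAULT_THRESHOLD) -> List[str]:
    """Nodes whose most recent heartbeat is older than the threshold."""
    return sorted(node for node, t in last_seen.items() if now - t > threshold)


last_seen = {"NODE1": 100.0, "NODE2": 95.0, "NODE3": 91.5}   # timestamps in seconds
print(suspect_nodes(last_seen, now=100.0))                   # ['NODE3']
print(suspect_nodes(last_seen, now=100.0, threshold=4.0))    # ['NODE2', 'NODE3']
```

Lowering the threshold detects failures sooner but makes a briefly congested network more likely to trigger a false suspicion, which is the trade-off the tuning tool exposes.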
SNMP and SMTP Support
Novell Cluster Services 1.6 supports Simple Network Management Protocol (SNMP) through Management Information Base (MIB) extensions developed by Compaq.
Additionally, Cluster Services 1.6 supports Simple Mail Transfer Protocol (SMTP). This enables you to set up Cluster Services to send messages to up to eight e-mail addresses when monitored cluster events occur. These events might include a node failure, a node being taken down, or a new node joining the cluster. You can also be notified whenever the status of a cluster resource changes. This e-mail feature allows 24x7 notification of your network's status.
The e-mails sent by Cluster Services can be either plain text or XML-formatted messages, whichever you select. The XML format looks ahead to future capabilities, since NSS provides an XML management interface. In principle, Cluster Services could send an XML-encoded e-mail describing a problem, which NSS could interpret to automatically adjust the file system and compensate for the problem in the cluster.
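A sketch of building such a notification with Python's standard `email` and `xml.etree` modules. The XML element names, the sender address, and the function name are hypothetical; the article does not specify the message format Cluster Services uses. Only the eight-address limit and the plain-text/XML choice come from the text.

```python
import xml.etree.ElementTree as ET
from email.message import EmailMessage
from typing import List

MAX_RECIPIENTS = 8   # limit stated in the text


def build_alert(event: str, node: str, recipients: List[str],
                as_xml: bool = False) -> EmailMessage:
    """Build a plain-text or XML cluster-event notification (hypothetical format)."""
    if not 1 <= len(recipients) <= MAX_RECIPIENTS:
        raise ValueError(f"need 1 to {MAX_RECIPIENTS} recipients")
    msg = EmailMessage()
    msg["Subject"] = f"Cluster event: {event} ({node})"
    msg["From"] = "cluster-alerts@example.com"       # hypothetical sender address
    msg["To"] = ", ".join(recipients)
    if as_xml:
        root = ET.Element("clusterEvent")             # hypothetical element names
        ET.SubElement(root, "type").text = event
        ET.SubElement(root, "node").text = node
        msg.set_content(ET.tostring(root, encoding="unicode"), subtype="xml")
    else:
        msg.set_content(f"Event: {event}\nNode: {node}\n")
    return msg


alert = build_alert("node failure", "NODE2", ["admin@example.com"], as_xml=True)
print(alert["Subject"])            # Cluster event: node failure (NODE2)
print(alert.get_content_type())    # text/xml
```

The XML variant is machine-parseable, which is what would let another component, such as an NSS management interface, act on the message automatically.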
NSS and Clustering
In NetWare 6, Novell has integrated NSS 3.0 and Cluster Services 1.6 more tightly than the corresponding products in NetWare 5.x, so that NSS and Cluster Services now work together to take advantage of shared storage devices. (Shared storage devices are those that every cluster node can access, such as SANs, as opposed to the traditional NetWare approach in which each server accesses only its own storage.)
NSS 3.0 provides a way to flag storage as sharable for clustering. With the initial release of NetWare 6, you need to set this flag via the ConsoleOne Clustering snap-ins. In the future, Novell plans to have the software automatically detect and flag shared storage devices.
Note: When NSS 3.0 detects a "sharable for clustering" flag, it will not activate the attached storage unless Cluster Services 1.6 is also running. Typically, NSS only activates storage that is local to the server on which it is running.
You might be wondering whether more than one node in a cluster can write to the same shared storage pool simultaneously. The answer is no; Novell Cluster Services 1.6 only allows one node to use a shared storage pool at a time. Data corruption would most likely occur if two or more nodes had access to the same shared storage pool simultaneously.
Cluster Services uses the term pool to refer to an area of storage space that you create using the free space available on your storage devices. Storage pools are containers for logical volumes. The main advantage of logical volumes is that they can increase or decrease in size, using only as much space as the files and directories stored on them take up at the present time.
Cluster Services 1.6 also protects your data by failing over pools, rather than individual volumes as Cluster Services for NetWare 5.x did. If a shared storage pool is active on a node when the node fails, the cluster automatically migrates the pool to another node. The clustering software reactivates the pool and remounts the cluster-enabled logical volumes within that pool.
Whether you set up pools containing only one volume or many volumes, you will need to cluster-enable the volumes held by the storage pools. Cluster-enabling a volume ensures that users' access to it is uninterrupted during a failover. When you cluster-enable a volume, the clustering software creates a virtual server on which to mount the volume. The virtual server guarantees that the volume remains usable to clients regardless of the cluster node on which the volume is mounted. Novell Cluster Services 1.6 lets you choose the names of the virtual servers, whereas Cluster Services for NetWare 5.x chooses the names for you.
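The pool rules from the last few paragraphs (a shared pool is activated only when Cluster Services is running, only one node may have it active at a time, and on failure the whole pool moves and its cluster-enabled volumes are remounted) can be sketched as a small state model. The class and method names are hypothetical.

```python
from typing import List, Optional


class SharedPool:
    """Hypothetical model of a cluster-shared NSS pool."""

    def __init__(self, name: str, volumes: List[str]) -> None:
        self.name = name
        self.volumes = list(volumes)
        self.owner: Optional[str] = None
        self.mounted: List[str] = []

    def activate(self, node: str, cluster_services_running: bool = True) -> None:
        if not cluster_services_running:
            raise RuntimeError("shared pools stay inactive without Cluster Services")
        if self.owner not in (None, node):
            raise RuntimeError(f"{self.name} is already active on {self.owner}")
        self.owner = node
        self.mounted = list(self.volumes)    # mount the pool's cluster-enabled volumes

    def fail_over(self, failed_node: str, new_node: str) -> None:
        if self.owner != failed_node:
            return                           # pool was not on the failed node
        self.owner, self.mounted = None, []  # release it from the dead node
        self.activate(new_node)              # reactivate and remount on the survivor


pool = SharedPool("POOL1", ["VOL1", "VOL2"])
pool.activate("NODE1")
pool.fail_over("NODE1", "NODE2")
print(pool.owner, pool.mounted)   # NODE2 ['VOL1', 'VOL2']
```

Refusing a second activation while the pool has an owner is the single-writer rule that prevents the data corruption the text warns about.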
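The virtual-server indirection can be sketched as a name that clients always use, bound to whichever node currently has the volume mounted. The name SALES_SERVER and the data structures are hypothetical, and the article does not detail how the binding moves between nodes; this sketch reduces it to a dictionary lookup.

```python
from typing import Dict


class VirtualServer:
    def __init__(self, name: str, volume: str, node: str) -> None:
        self.name, self.volume, self.node = name, volume, node

    def rebind(self, new_node: str) -> None:
        self.node = new_node    # clients keep using self.name; only the binding moves


DIRECTORY: Dict[str, VirtualServer] = {}


def resolve(server_name: str, volume: str) -> str:
    """Return the physical node currently serving `volume` for `server_name`."""
    server = DIRECTORY[server_name]
    if server.volume != volume:
        raise KeyError(f"{volume} is not served by {server_name}")
    return server.node


DIRECTORY["SALES_SERVER"] = VirtualServer("SALES_SERVER", "SALES", "NODE1")
print(resolve("SALES_SERVER", "SALES"))    # NODE1
DIRECTORY["SALES_SERVER"].rebind("NODE2")  # failover moves the volume
print(resolve("SALES_SERVER", "SALES"))    # NODE2
```

Clients never need to know which physical node holds the volume; only the binding changes during a failover.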
The Complete High-Availability Storage Picture
From what we have discussed so far, you can see that the power of Novell Cluster Services 1.6 can be enhanced even further by using it in conjunction with NSS and SANs. Other features of NetWare 6 add even more value.
For example, Novell's Native File Access lets Windows, Macintosh, and UNIX workstations access NSS 3.0 directly using their native file access protocols (Common Internet File System, Apple Filing Protocol, and NFS, respectively). The benefit is that no NetWare client software is needed on these workstations; users access NSS 3.0 data using the native network access software they are accustomed to. Windows users, for instance, can access a NetWare 6 server over TCP/IP without installing the Novell Client. You can also have Windows, Macintosh, and UNIX workstations running Novell Clients access the network the traditional way (see Figure 3).
Figure 3: Typical network setup using clustering and SAN technology.
Because NetWare users are represented as eDirectory objects, it is very easy to manage access to stored data. Management is further simplified by letting your SAN be accessed solely by Novell Cluster Services. In this configuration, you do not have to guess how much storage to allot to each operating system: NetWare uses all of it, and the data is available to all of the desktops on the network. When you need more storage, you simply add it to the SAN, and NetWare automatically makes use of it.
Novell Cluster Services 1.6 attached to a dedicated SAN and utilizing NSS 3.0 presents an attractive solution to organizations that need constant access to their network and data resources. Setting up volumes with NSS 3.0 is fast and simple, as is the setup of Cluster Services 1.6. Your data remains intact and always available in the shared storage pools you create, and backing up the data is greatly simplified. You can maintain and manage the cluster via NetWare Remote Manager; all you need is an Internet connection and a Web browser. What's more, you can be notified by e-mail of any serious problems with the cluster.
Together, NetWare 6, NSS 3.0, and Novell Cluster Services 1.6 provide a proven high-availability storage solution you can count on.
* Originally published in Novell AppNotes