Backup Backup: Clustering with Novell Open Enterprise Server

Michael Wilkinson

01 Feb 2005


Have you ever heard the title of CIO referred to as "Chronic Information Outage"? If so, you have probably been caught more than once with no fail-safes in place. If you're in IT, you've spent hours running from fire to fire, dealing with downed servers, failed applications and network-created user issues. Your budgets are drained by overtime compensation and productivity levels never seem to rise. Novell Open Enterprise Server, which includes Novell Cluster Services, provides the needed fail-safes so you can move from a reactive to a proactive state.

Simply stated, clustering servers facilitate high availability--availability of data, applications and services. Clustering, once only available to NASA and mega corporations, no longer requires a degree in rocket science to implement.

The benefits of clustering include:

  • Increased availability of data, applications and services with five 9's or more of reliability

  • Increased productivity with less downtime

  • Improved performance with shared load

  • Lower cost of operations with centralized management and less expensive hardware

  • Minimized service outages

  • Reduced system costs through server and storage consolidation

You get all these benefits when using Novell Cluster Services. Novell Cluster Services is a server clustering system that ensures high availability and manageability of critical network resources including data (volumes), applications and services. It is a multinode clustering product available on both Linux and NetWare platforms that is enabled for Novell eDirectory and supports failover, failback and migration (load balancing) of individually managed cluster resources. Novell understands the value that clustering technologies provide to companies both large and small. That's why Novell Cluster Services are included as a core component of Novell Open Enterprise Server.
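
To put the "five 9's" figure from the benefits list in perspective, here is a small sketch that converts an availability percentage into a yearly downtime budget. The function name is ours, and the calculation assumes a 365-day year; it is simple arithmetic, not a Novell tool.

```python
# Convert an availability percentage into allowed downtime per year.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability target."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return MINUTES_PER_YEAR * unavailable_fraction


for target in (99.0, 99.9, 99.99, 99.999):
    minutes = downtime_minutes_per_year(target)
    print(f"{target}% available -> {minutes:.2f} minutes of downtime per year")
```

At five 9's (99.999 percent), the budget works out to a little over five minutes of downtime per year.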

Basic Clustering Unclustered

To implement clustering requires three decisions:

  • Services or applications to be clustered: What are your single points of failure?

  • Level of redundancy: How many additional failover levels or levels of redundancy should you provide?

  • Failover priority: What is the failover order sequence or policy priorities in the event of multiple failovers?

A single point of failure, or SPOF, is the weak link in a system. It is that part of your system which, upon failure, causes an interruption or failure in the entire system. A SPOF in your network affects the availability of data, applications or services. You can eliminate a SPOF in your network by implementing redundant system components (e.g., hard drives, NICs, controllers) or fully redundant clustered systems.

The following scenario helps illustrate these basic clustering concepts. Suppose your company has an inside-sales staff that enters phone orders into a Web-based, order-entry database. When service is interrupted because a server or application fails, productivity goes down and your company loses money. Your initial configuration may look like the diagram in Figure 1.

Figure 1

The configuration is one single Novell Open Enterprise Server running Apache Web server, the custom order-entry application and the MySQL database application. There is one Novell volume called DATA where the physical MySQL data tables are stored.

To reduce interruption and improve availability, you can implement a two-server cluster by installing a second Open Enterprise Server. Novell recommends you implement a shared-disk system (RAID) or a storage area network (SAN) to provide high availability of data--in this case the database tables. This new configuration would look like the diagram in Figure 2.

Figure 2

You install a second Novell Open Enterprise Server with all three applications (Web server, order-entry and database). Create a shared-disk system using the Novell Storage Services (NSS--also available with OES). Using the NSS management utility, connect the two servers to the shared-disk system by adding Fibre Channel Host Bus Adapters to each and connecting them via a Fibre Channel switch. In this example, the shared-disk system is configured to take over the Novell volume called DATA, and should be implemented into a RAID configuration for fault-tolerance and reliability. We'll discuss other less-expensive options for server connectivity to shared-disk systems later.

Once you configure the two servers to share the same disk system, you can install Novell Cluster Services on both systems by running the NetWare Deployment Manager and selecting the Create New Cluster option. A unique cluster name is required and corresponds to a cluster object that you'll create in eDirectory. Add the individual server names to the cluster and if there are more than two servers, repeat the steps to add all the servers.

Once you identify the servers in the cluster, assign a unique IP address for the cluster. The cluster IP address must differ from every server IP address and from any other IP address in use on the network. The cluster IP address is used for cluster management, and will always be bound to the server that is designated as the primary or master server in the cluster. Create a corresponding cluster resource object for this cluster IP address in eDirectory.
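
Before you assign the cluster IP address, it can help to check it against the addresses you already know are in use. The following is a minimal sketch of that check; the function name and the sample addresses are hypothetical, and it only compares against a list you supply rather than scanning the network.

```python
import ipaddress


def cluster_ip_is_unique(cluster_ip, addresses_in_use):
    """Return True if cluster_ip is a valid address not already in the supplied list."""
    candidate = ipaddress.ip_address(cluster_ip)  # raises ValueError if malformed
    used = {ipaddress.ip_address(addr) for addr in addresses_in_use}
    return candidate not in used


# Hypothetical addresses for the two servers in the example cluster.
server_addresses = ["10.0.0.11", "10.0.0.12"]
print(cluster_ip_is_unique("10.0.0.20", server_addresses))
print(cluster_ip_is_unique("10.0.0.11", server_addresses))
```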

In the next step, you'll determine whether the cluster has a shared-disk system and if so, where it's located. Create a small cluster partition on the shared-disk system which requires 10 MB of available space on one of the shared disks. This space cannot be a part of any NSS partition, so set that space aside when creating the shared-disk system.

Note: OES includes licenses for two servers in a cluster. If you need more than two servers, you'll need additional licenses, and the license configuration screen appears next. If the licenses aren't available yet, skip this step for now and install them manually later. Use Novell iManager to configure the licenses and make the cluster functional.

During cluster configuration, create all of the cluster objects in eDirectory and install the Novell Cluster Services software on all servers in the cluster. Once you install Novell Cluster Services, create resource objects for each of the three applications using iManager, ConsoleOne or NetWare Remote Manager. Regardless of the management interface, the creation process for each resource object is similar.

In the first step, create a new resource object that represents the first application and then associate it with the cluster object. Use a resource template, if one exists, to create the cluster resource. Resource templates simplify the creation of cluster resources and Novell provides three out-of-the-box templates which we'll discuss later. The components required for a cluster resource, whether they come from a template or are manually assigned, include the load script, the unload script and the settings for the failover and failback modes. The load script provides the specific startup details for the particular application. The load script will be invoked by the first server when you start the application for the first time and by the second server (and subsequent servers in larger clusters) upon failover. You can use standard NCF-style commands (Novell command file) in the load and unload scripts.

The unload script includes the shutdown details for the application and is generally used when manually failing-over an application. This allows the application to shut down cleanly before exiting. During a hardware or application failure, the unload script is never called because the application has already exited and the server may be completely down. Failover and failback settings include rules that determine whether applications will manually or automatically start up, fail over and fail back between systems. In our order-entry example, we would create three cluster resources that represent Apache, MySQL and the Web application.
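
The difference between a manual move and an unplanned failure is easier to see in a small model. The sketch below is illustrative only: the class, its field names and the script steps are placeholders we made up, not real NCF commands or a Novell API. It shows that a manual migration runs the unload script on the source before the load script on the target, while a failure skips the unload script entirely.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterResource:
    """Illustrative model of a cluster resource; all names here are hypothetical."""
    name: str
    load_script: list              # startup steps, run on whichever node hosts the resource
    unload_script: list            # clean shutdown steps, run only on a manual move
    failover_mode: str = "auto"    # "auto" or "manual"
    failback_mode: str = "manual"  # "auto" or "manual"
    log: list = field(default_factory=list)

    def migrate(self, source: str, target: str) -> None:
        """Manual move: shut down cleanly on the source, then start on the target."""
        for step in self.unload_script:
            self.log.append(f"{source}: {step}")
        for step in self.load_script:
            self.log.append(f"{target}: {step}")

    def fail_over(self, failed: str, target: str) -> None:
        """Unplanned failure: the source is gone, so only the load script runs."""
        self.log.append(f"{failed}: node down, unload script skipped")
        for step in self.load_script:
            self.log.append(f"{target}: {step}")


# Placeholder steps standing in for the real script contents.
mysql = ClusterResource(
    name="MySQL",
    load_script=["mount DATA", "start MySQL"],
    unload_script=["stop MySQL", "dismount DATA"],
)
mysql.migrate("server1", "server2")
print("\n".join(mysql.log))
```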

Once you create any cluster resource, all of the nodes in the cluster are automatically assigned as failover nodes for that resource. The order of failover is the order that the nodes appear in the resource list for the clustered resource. You can change the resource list order and you can remove or add servers to the list as needed. In this example, we'd assign the three clustered applications to run on the primary server and to start up on the secondary server upon failure. The resulting configuration would look like the diagram in Figure 3.
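
The failover order described above amounts to walking the resource's node list from the top. Here is a minimal sketch of that selection rule; the function and node names are hypothetical.

```python
def next_failover_node(assigned_nodes, online_nodes, failed_node):
    """Return the first node in list order that is online and is not the failed node."""
    for node in assigned_nodes:
        if node != failed_node and node in online_nodes:
            return node
    return None  # no eligible node: the resource stays offline


# Two-server example from the article: the primary fails, the secondary takes over.
assigned = ["primary", "secondary"]
print(next_failover_node(assigned, online_nodes={"secondary"}, failed_node="primary"))
```

Reordering the list changes which server is tried first, and removing a server from the list takes it out of consideration for that resource.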

Figure 3

When the two-server cluster is up and running, a health check or "heartbeat" runs between the systems which monitors the health status of the different cluster resources. If the heartbeat detects a failure, Novell Cluster Services engages the failover policies.

For example, let's assume that your primary server has a hardware failure. The heartbeat in Novell Cluster Services would determine that the server failed and invoke the load scripts on the secondary server for each of the applications. Novell Cluster Services would also assign the cluster IP address to the secondary server for management accessibility and the shared Novell volume, DATA, would be mounted by the secondary server as shown in Figure 4.
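
The sequence in this example can be sketched as a simple simulation. Everything below is an assumption made for illustration: the timeout value, the dictionary layout and the function names are ours, and the real heartbeat protocol in Novell Cluster Services is more involved.

```python
def find_failed_nodes(last_heartbeat, now, timeout_seconds):
    """Return nodes whose most recent heartbeat is older than the timeout."""
    return sorted(
        node for node, seen in last_heartbeat.items()
        if now - seen > timeout_seconds
    )


def handle_failure(cluster, failed_node, survivor):
    """Move everything the failed node held to the surviving node."""
    actions = []
    for volume, host in list(cluster["volumes"].items()):
        if host == failed_node:
            cluster["volumes"][volume] = survivor
            actions.append(f"mount volume {volume} on {survivor}")
    for resource, host in list(cluster["resources"].items()):
        if host == failed_node:
            cluster["resources"][resource] = survivor
            actions.append(f"run load script for {resource} on {survivor}")
    if cluster["master"] == failed_node:
        cluster["master"] = survivor
        actions.append(f"bind cluster IP address to {survivor}")
    return actions


cluster = {
    "master": "server1",
    "volumes": {"DATA": "server1"},
    "resources": {"Apache": "server1", "MySQL": "server1", "order-entry": "server1"},
}
# Hypothetical timestamps in seconds: server1 has been silent for 10 seconds.
heartbeats = {"server1": 100.0, "server2": 109.0}
for node in find_failed_nodes(heartbeats, now=110.0, timeout_seconds=8.0):
    for action in handle_failure(cluster, node, survivor="server2"):
        print(action)
```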

Figure 4

Once you resolve the hardware problem, the applications could fail back automatically to the first server if you set the failback policies to do so. Otherwise, the applications would continue to run on the second server until you manually failed them back to the first server.

Mixed Clusters

With the release of Novell Open Enterprise Server, customers now have the option of utilizing clusters comprised of NetWare systems, Linux systems or a combination of both. Using a mixed-platform environment provides benefits to customers who want a choice of platform, or who are looking for a gradual migration path of services from one platform to another. In the initial release of OES, the services that are clustered and set to fail over from one platform to another must have the same configuration, e.g., Apache, MySQL and a Web application. When migrating services from one platform to another, be sure that the load scripts will correctly load the clustered services on the new platform. An example of a mixed-platform cluster might look like the diagram in Figure 5.

Figure 5

Installation and Management

Installing Novell Cluster Services requires each machine in the cluster to have at least 512 MB of RAM and be networked correctly in a TCP/IP-based network. Each clustered application or shared data volume also needs a unique IP address. For high data availability, you need a shared-disk system and Novell recommends that you configure the disks in the shared-disk system to use mirroring or RAID for greater fault tolerance. Once the cluster is installed, you can manage it with several tools. These include:

  • Novell iManager: Novell iManager is a Web-based administration console that provides customized access to network administration utilities and content from any location in the world, on any device--and it does so securely, whether inside or outside the firewall. (See Figure 6.)

    Figure 6

  • NetWare Remote Manager: NetWare Remote Manager is a browser-based utility used to manage one or more NetWare servers from a remote location, and includes the same utilities available at the server console.

  • ConsoleOne: ConsoleOne is a Java-based administration tool used to flexibly manage Novell and third-party products on a variety of platforms. Running on either a Windows workstation or a NetWare server, ConsoleOne provides a single point of network administration for resources including Novell eDirectory objects, schema, partitions, replicas and NetWare servers.

Scalable Cluster Services

With Novell Cluster Services, you can easily adjust your storage infrastructure to accommodate business growth. In fact, Novell Cluster Services supports up to 32 servers per cluster with as many as 32 processors per server. You can add servers to a cluster dynamically without interrupting users, and simply switch resources to the new servers to provide optimal load balancing across the cluster. Combining these scalable clusters with SAN solutions ensures that users can always access critical resources and your organization can meet growing business needs.

As noted earlier, two cluster licenses are automatically installed during the Novell Cluster Services installation with Open Enterprise Server. If you have 100 Open Enterprise Servers, and two-node clustering provides sufficient reliability for your applications and data, you could build 50 different two-node clusters with no added cost for licensing. If you need larger clusters for higher levels of availability, you'd need incremental licenses.

Enterprise Cluster Features

It takes more than just a heartbeat-based failover product to provide enterprise-class cluster services. Novell Cluster Services includes additional enterprise-ready features such as cluster resource templates, e-mail notification of failover, priority of service failover, service monitors, rolling upgrade capabilities, manual failover, failback policy, and support for Fibre, SCSI and iSCSI-connected disk systems.

Cluster Resource Templates

Cluster resource templates simplify the process of creating similar or identical cluster resources. These templates supply the load and unload scripts and basic cluster configuration information for new cluster resources when they are created. You can create templates for any server application or resource, and can always edit and customize them. For example, if you want to create the same MySQL resource on five servers, you could create or use an existing MySQL template to quickly do this. Novell Cluster Services includes three generic templates: DHCP, MySQL and a generic IP resource service. Use the generic IP resource service to create a NetStorage cluster resource.

E-mail Notification of Failover

Like most administrators, you probably like to be proactive and know what is going on in your networks at all times. When something goes wrong, you want to know about failures before outages affect network users. You can configure Novell Cluster Services to automatically send out e-mail messages when certain cluster events occur. These cluster events include resource state changes (failover) and changes in cluster membership, such as when a node enters or leaves a cluster. You can configure up to eight different e-mail addresses for e-mail notification when these events occur.
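
To make the notification rules concrete, here is a rough sketch that builds (but does not send) an e-mail message for a cluster event and enforces the eight-address limit mentioned above. The function name, sender address and message wording are hypothetical; Novell Cluster Services handles this through its own configuration, not through code like this.

```python
from email.message import EmailMessage

MAX_RECIPIENTS = 8  # limit stated in the article


def build_notification(event, recipients, sender="cluster-admin@example.com"):
    """Build a notification message for a cluster event without sending it."""
    if not recipients:
        raise ValueError("at least one recipient is required")
    if len(recipients) > MAX_RECIPIENTS:
        raise ValueError(f"at most {MAX_RECIPIENTS} e-mail addresses can be configured")
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"Cluster event: {event}"
    message.set_content(f"The cluster reported the following event: {event}.")
    return message


note = build_notification("node server1 left the cluster", ["admin@example.com"])
print(note["Subject"])
```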

Priority of Service Failover

Often, certain applications and services are more important than others. For example, an e-commerce Web application might be more important to you than e-mail services. If the e-commerce site goes down, the result is lost revenue and fewer customers. Downtime for some applications can cost companies millions of dollars. Novell Cluster Services gives you the option of setting the priority of service failover when there are multiple clustered applications running on a single server. This priority specifies the order in which the applications will be started on the new server. In the earlier example with two clustered servers running Apache, MySQL and a custom Web order-entry application, we could set which application should start first, second and third on the failover server. This ensures that the most important application starts first. This is especially important if one application has a dependency on another before it can load. For example, the Web order-entry application may require Apache to be running before it can load correctly. If so, you should set the priority for Apache higher than that of the Web application.
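
In effect, the failover priority is a sort key for the startup order on the new server. The sketch below assumes a simple numeric priority where a higher number starts first; the numbers and the function name are made up for illustration, and the MySQL priority is our assumption since the article only states that Apache should outrank the Web application.

```python
def start_order(priorities):
    """Return resource names ordered by descending priority, then by name."""
    ranked = sorted(priorities.items(), key=lambda item: (-item[1], item[0]))
    return [name for name, _ in ranked]


# Apache must be running before the order-entry application loads.
priorities = {"order-entry": 1, "Apache": 3, "MySQL": 2}
print(start_order(priorities))
```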

Service Monitors

Novell Cluster Services includes APIs for developing service specific monitors. For example, a service may still be operating but not responding to users. The standard heartbeat would not detect that the service has failed since the processes are still functioning on the server. An external service monitor can periodically test to see if the service is responding and, if not, initiate a failover of the service to another node. A service monitor for the eDirectory LDAP server opens an LDAP port with the LDAP server and reads an attribute. If the LDAP server does not respond, the eDirectory LDAP service monitor detects the case and initiates the failover. This test occurs on a configurable repeating basis, once every five minutes, for example.
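
The monitoring idea can be sketched with a plain TCP probe. This is a deliberately simplified stand-in: it does not use the Novell Cluster Services APIs, and a bare TCP connection is a weaker test than the LDAP attribute read described above. The host name, port and function names are assumptions.

```python
import socket


def tcp_probe(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_service(probe, on_failure):
    """Run one monitoring cycle: call the probe and trigger failover if it fails."""
    healthy = probe()
    if not healthy:
        on_failure()
    return healthy


# One cycle with a stubbed probe that reports the service as unresponsive.
events = []
check_service(probe=lambda: False, on_failure=lambda: events.append("failover requested"))
print(events)
```

In practice you would pass something like `lambda: tcp_probe("ldap.example.com", 389)` as the probe and repeat the cycle on a schedule, for example once every five minutes.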

Open Enterprise Server implements Novell Cluster Services using the Linux Heartbeat service monitors, which are used specifically for monitoring IP addresses; however, it can leverage any of the other existing Heartbeat service monitors. Heartbeat is an open source two-node cluster failover package included in SUSE LINUX Enterprise Server 9. Future versions of Open Enterprise Server will integrate Novell Cluster Services with MON, the open source service-monitoring daemon used for all types of monitoring and alerts. This will enable an even broader level of support for monitoring existing and future services.

Rolling Upgrade Capabilities

Novell Cluster Services includes the capability to upgrade the operating systems on all of the servers in the cluster, whether NetWare or Linux, with only a minimal amount of disruption. During a rolling upgrade, one server is upgraded to the new version of the operating system, while the rest continue to run the old version. After you complete the first upgrade, upgrade another server in the cluster until all of the servers in the cluster have been upgraded.

When you upgrade the last server in the cluster, all of the resources on shared storage are temporarily put into a new resource state called upgrade. When this happens, take all of the resources in this new state offline while you upgrade the disk media on one server. Then refresh the disk media on the rest of the servers and bring the cluster resources back online. This process happens very quickly, and only affects the clustered services that have shared datasets on clustered volumes. All other services in the cluster will continue to run uninterrupted.

Manual Failover

You might want to force a failover of a particular clustered application, or of all the applications on a particular server, for a number of reasons: system maintenance, hardware upgrades and service migration to a new system (because of updated software or hardware requirements). Novell Cluster Services makes it easy to manually fail over a clustered service using any of the three management consoles.

Failback Policy

Novell Cluster Services gives administrators the option of allowing a failed cluster service or application to fail back, either automatically or manually, to the primary server or "preferred node" in the cluster when that server is fixed or becomes available again. The preferred node is the first server in the list of the assigned nodes for the resource. Automatic failback is useful when performance is important, because resources are distributed back onto the original servers as initially designed. Manual failback suits administrators who want greater control over when cluster services move and how reliably they run.
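
The failback decision itself is small enough to sketch in a few lines. The function below is a hypothetical illustration of the policy, not Novell code: when the preferred node comes back, an automatic policy returns the resource to it, while a manual policy leaves the resource where it is until an administrator moves it.

```python
def host_after_recovery(current_host, assigned_nodes, failback_mode):
    """Decide where a resource runs once its preferred node is available again."""
    preferred = assigned_nodes[0]  # first server in the resource's node list
    if failback_mode == "auto":
        return preferred
    return current_host  # manual: stay put until an administrator intervenes


nodes = ["server1", "server2"]
print(host_after_recovery("server2", nodes, failback_mode="auto"))
print(host_after_recovery("server2", nodes, failback_mode="manual"))
```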

Fibre, SCSI or iSCSI?

To provide high availability data to users or applications in a clustered environment, you need a shared data store. You can base connectivity to the data store on Fibre technology, SCSI connectivity or an iSCSI alternative. Each option has pros and cons, but all three will work with Novell Cluster Services.

Fibre Channel technology typically uses fiber-optic cabling and Fibre Channel host bus adapters. It currently provides the highest performance because of its throughput speeds. But special cabling and switch equipment are necessary for this type of connectivity and carry a higher price tag.

SCSI (Small Computer System Interface) represents a set of data access protocols for different device types, for example, disks and tapes. You can use SCSI protocols to create a storage area network between several clustered servers and a high-end storage server. SCSI protocols have distance limitations, requiring the storage server to be in close proximity to the clustered systems. The cost for connecting this type of system is minimal in comparison to the other options because it doesn't require additional interface cards, high-end cabling or expensive data switches.

iSCSI is a transport protocol for SCSI that works on top of TCP/IP. iSCSI eliminates the distance limitation of SCSI, at a much lower cost than implementing a fiber-based solution. Performance is not as fast as SCSI or Fibre, but can be improved significantly by utilizing Gigabit Ethernet network cards. Gigabit cards are more expensive than standard Ethernet cards, but still much less expensive than Fibre Channel adapters. (See Figure 7.)

Figure 7


          Cost          Performance    Flexibility
Fibre     High          High           High
SCSI      Low           High           Low
iSCSI     Low/Medium    Medium/High    High

Implementing a Gigabit-based iSCSI solution is anywhere from 4 to 10 times less expensive than a fiber-based solution. Gigabit Ethernet is beginning to approach the performance of Fibre, and many see this as a viable alternative for the connectivity of shared-disk systems.

Determining which data storage option to support depends on your current investment, in-house skill set, budget, physical location of systems and availability requirements. Novell supports any of the methods you choose.

Novell Supports Clustering with Clustered Applications

Many Novell applications such as iPrint, Virtual Office and NetStorage are cluster-enabled. All of them can run in a mode called Active/Passive, while some can also run in a mode called Active/Active. For more information on these modes, see "So What's the Difference?"

Cluster Conclusion

In today's enterprise environments where data services are shared across functions, organizations and even continents, high availability systems are critical. As someone who is responsible for data, application and service availability, you want to make sure that your application and network services are reliable, and you want to do so without busting the budget. Open Enterprise Server gives you the tools to do that and for simple two-node clusters, the cost is nothing. Novell Cluster Services increases productivity and reduces downtime, costs and service outages by making applications and data highly available. Clustering with Novell Cluster Services also increases the control and stability of your network through centralized management of clustered resources.

So What's the Difference?

Active/Passive Mode: Active/Passive mode refers to a service configuration where a clustered service on one server is actively processing requests, while at least one server with a failover service sits idle, waiting for failover to occur. While this leads to a greater hardware expense, it provides high availability for enterprise applications because the full resources of the second server will always be available to handle the failover load, if failure occurs.

Active/Active Mode: Active/Active mode refers to a service configuration where a clustered service on one server is actively processing requests, while the failover server has its own clustered services running. While this leads to better utilization of hardware, it is important that the servers be sized correctly to handle the full load of new services, in addition to the ones they are already running, should one server fail.
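
The sizing concern for Active/Active mode comes down to a simple check: after a failure, the surviving server carries both workloads. The sketch below assumes load and capacity are expressed in the same arbitrary units (say, percent of one server's resources); the function name and the numbers are hypothetical.

```python
def pair_is_sized_for_failover(load_a, load_b, capacity_a, capacity_b):
    """Return True if either server in a two-node pair can carry both workloads."""
    combined = load_a + load_b
    return combined <= capacity_a and combined <= capacity_b


# Two servers at 40% and 35% load can absorb each other's services.
print(pair_is_sized_for_failover(40, 35, 100, 100))
# At 70% and 50%, a failure would overload the survivor.
print(pair_is_sized_for_failover(70, 50, 100, 100))
```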

Clustering GroupWise

Can you live without e-mail? Can your company? Sometimes we'd like to shut it off but obviously, we can't. E-mail has evolved from a helpful business communication tool to a mission-critical business application. As GroupWise administrators, you know this. You also know that GroupWise not only provides e-mail, but many other valuable collaboration services which are now mission critical. When applications and services are mission critical, administrator jobs hang in the balance. Using Novell Cluster Services, you can improve the availability of your GroupWise system. Whether your critical link is the constant accessibility between users and their e-mail, the availability of document sharing and collaboration services, or the accessibility of services via Internet e-mail gateways, Novell Cluster Services can help.

GroupWise was developed with scalability in mind. Instead of one massive service that runs on one system, GroupWise is designed as functional services that can be run independently, spread out across multiple systems, minimizing the single point of failure risk. When one service fails, like the Internet Agent, it won't bring down the whole system. Only the services associated with the Internet Agent will be unavailable while the rest of the system will continue to function. Clustering the various services in your GroupWise system can greatly increase the reliability and availability of your collaboration services.

Which Services Should be Clustered?

There is no right or wrong way to implement GroupWise in a clustered server environment. Your implementation may depend on the requirements that your company has regarding resource availability, budget, data accessibility and more. How you implement clustering may also depend on the particular helpdesk issues you are trying to solve. For an in-depth look at GroupWise clustering, see the Novell GroupWise Interoperability Guide.

Let's talk about the value of implementing GroupWise in a clustered environment, and highlight a few things to think about when clustering GroupWise services. Figure 8 illustrates a common GroupWise configuration.

Figure 8

Possible cluster configuration options include:

  • An entire GroupWise system in a single cluster

  • Separate GroupWise services spread across multiple clusters

  • Key GroupWise services clustered while other services are not clustered

If you do not have the system resources to run all of your GroupWise system in a clustering environment, decide which services are most critical and only cluster those. Here are some suggestions:

  • E-mail and e-mail access problems can generate many internal IT help desk calls. Minimizing help desk incidents saves your company money and makes users more productive. To minimize calls, users must be able to access their mailboxes, and post offices and the associated Post Office Agent, or POA, must be available. Therefore, Post Offices and POAs are excellent candidates for clustering.

  • If your company has many remote or mobile users, reliable remote e-mail services are important. GroupWise WebAccess provides user access to GroupWise mailboxes across the Internet through a Web browser. The WebAccess service is another good candidate for clustering.

  • The GroupWise Domain functions as the main administrative unit in a GroupWise system. In addition, Mail Transfer Agents (MTAs) facilitate mail movement between different domains (inside and outside the GroupWise system). Domains and MTAs are less noticeable to users when they are unavailable, but are still critical to the overall system. Critical domains in your system are your primary domain and any additional domains like secondary or routing domains. Consider clustering primary and routing domains, even if other domains are not clustered.

  • The Internet Agent, which lets GroupWise users work with POP3 or IMAP4 clients and provides immediate messaging across the Internet, is also a candidate for clustering if this service is important in your environment.

Consider a Fan-Out Failover Configuration

Fan-out failover is a configuration where cluster resources from a failed node are split up and fail over to several different nodes to distribute the overall load. For example, if a node runs a cluster resource consisting of a domain and the associated MTA, another cluster resource consisting of a post office and the associated POA, and a third cluster resource for WebAccess, each cluster resource could be configured to fail over separately to different secondary nodes. This allows you to investigate each service separately and move them back to the original server when they are ready, without interrupting that service or causing significant load on other systems in the cluster. Figure 9 illustrates a simple view into a fan-out failover configuration.
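
Fan-out failover falls out naturally when each resource has its own node list with a different second choice. The following sketch illustrates this under stated assumptions: the node names and lists are hypothetical, and a resource is treated as hosted on the failed node when that node is first in its list.

```python
def fan_out(failed_node, resources, online_nodes):
    """Map each resource preferring the failed node to its next online node."""
    placement = {}
    for name, assigned in resources.items():
        if assigned[0] != failed_node:
            continue  # this resource was not running on the failed node
        placement[name] = next(
            (node for node in assigned[1:] if node in online_nodes), None
        )
    return placement


resources = {
    "domain and MTA": ["node1", "node2", "node3", "node4"],
    "post office and POA": ["node1", "node3", "node4", "node2"],
    "WebAccess": ["node1", "node4", "node2", "node3"],
}
print(fan_out("node1", resources, online_nodes={"node2", "node3", "node4"}))
```

Because each list names a different second node, the three resources land on three different servers rather than piling onto one.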

Cluster Your Shared Volumes

Cluster-enabling the shared volumes where domains and post offices reside greatly simplifies GroupWise administration by making the management applications, software distribution directory and management snap-ins always available. If critical volumes are not cluster-enabled and one of the servers goes down, the administrator loses access to all of the management utilities and snap-ins that are on that server. Cluster-enabling the volumes makes the utilities, directories and snap-ins independent of any given server.

The advantages of cluster-enabling GroupWise volumes include:

  • Drive mappings always occur through the virtual server associated with the cluster-enabled volume, rather than through a physical server. This guarantees that you can always map a drive to the domain or post office database regardless of where the node is located.

  • The GroupWise snap-ins to ConsoleOne will always work no matter which node is running ConsoleOne and the snap-ins will be able to locate the configuration files necessary for management of the GroupWise system.

  • If you need to rebuild a domain database or a post office database, you won't need to determine on which node the database is currently located.

  • Help desk personnel do not need to know where GroupWise is running before they connect to a domain to create a new GroupWise user.

Conclusion

E-mail is the face of your company. When it is down you just don't look good. By clustering GroupWise, or at least key GroupWise services, you can ensure that e-mail and collaboration are reliable and that your internal and external communication is always operational. Novell Open Enterprise Server includes clustering services that support GroupWise (on NetWare or Linux) to help you provide high availability collaboration services, increase productivity and reduce costs and helpdesk incidents. You can cluster GroupWise in a variety of configurations to meet the needs of your organization.

Novell Application Cluster Modes

The following list includes some applications from Novell, and the cluster modes they support.


Application       Active/Passive Mode Support    Active/Active Mode Support
CIFS              X                              X
NFS               X                              X
FTP               X                              X
AFP               X                              X
LDAP              X                              X
GroupWise         X                              X*
NetStorage        X
MySQL             X
Apache            X
iFolder           X
iPrint            X
Virtual Office    X
DHCP              X
DNS               X

*GroupWise Active/Active mode requires components to be run in protected memory.

* Originally published in Novell Connection Magazine

