Creating an Integrated Desktop Environment with NDS at CERN: A Case Study
DreamLAN Network Consulting, Ltd.
IT Division, Technical Manager
European Laboratory for Particle Physics
AS Division, Novell & Microsoft Technologies
European Laboratory for Particle Physics
IT Division, Novell & Microsoft Technologies
European Laboratory for Particle Physics
01 May 1999
This AppNote tells how CERN, one of the top particle physics research facilities in the world, has effectively leveraged Novell Directory Services to provide an integrated computing environment for over 3000 users.
Novell Directory Services (NDS) provides comprehensive directory services for the smallest to the very largest of networks. The benefits of NDS go far beyond single logins for your users and centralized administration. For example, you can use NDS to coordinate the distribution of and access to all your network resources, such as applications and printers. However, in order to harness these powers, you must have a correctly designed NDS tree—one that can easily grow with your future needs.
This AppNote describes how the Desktop Infrastructure Services (DIS) team at the European Laboratory for Particle Physics (CERN) leveraged the power of NDS in implementing its NICE (Network Integrated Computing Environment) project. Using NDS and a set of internally developed tools, the team was able to create a desktop environment that enables users to access all the available information resources at CERN from a single entry point. NICE implements NDS in a "zero effort" manner for over 3000 PC and Macintosh users.
After providing some brief background information on CERN, this AppNote covers:
The desktop support environment at CERN
The evolution of CERN's NDS tree design
Issues encountered with the initial tree design and their resolutions
Modifications to be made for future growth and enhanced performance
This AppNote retains the British spelling used by the authors.
For more information about the NICE project at CERN, visit:
Background Information on CERN
The principal goal at the European Laboratory for Particle Physics, or CERN as it is more commonly known, is to study the structure of matter. In total, some 6500 scientists (half of the world's particle physicists) come to CERN to conduct their research. They represent 500 universities and over 80 nationalities. Notable resources at CERN include the world's largest magnet (weighing more than the Eiffel tower), particle detectors the size of four-storey houses, and several particle accelerators, the biggest being 27 kilometres in circumference, in which particles travelling near the speed of light do over 11,000 laps each second. More recently, CERN has become known as the place where the World Wide Web was invented.
Since 1992, CERN has maintained a support infrastructure for PC-based desktop systems in order to create a single environment for use by all types of CERN staff—from secretarial personnel to engineers and physicists. This infrastructure has evolved with the generations of the Microsoft Windows operating system, from Windows 3.1 through Windows 95 and Windows NT 4.0. With each evolution, a consistent architecture has been maintained to provide integrated network services, maximising the transfer and sharing of information and thereby creating an ideal environment for knowledge sharing.
The architecture of this support infrastructure is aligned with the organisational need for providing high levels of support to the increasing numbers of CERN users, while keeping costs under control. Managing change is a fundamental concept in the architecture, and asset management techniques allow the effects of introducing and changing software packages to be predicted.
The Desktop Support Environment at CERN
Today, some 30 NetWare 4.11 servers provide the principal network server support for this infrastructure. The internal name for the project is NICE, which now stands for Network Integrated Computing Environment. However, during the pre-NetWare 4 developmental stages of the project, the acronym stood for "Novell Integration Co-ordination and Evolution", something that was clearly required when running a number of NetWare 3.x servers for enterprise file services.
Currently, CERN has over 3000 personal computers which are supported by the NetWare server environment. This figure is expected to grow over the next few years as the number of on-site people increases and as users move from other desktop systems to Windows.
To reduce the administrative burden of supporting such an environment, the software configuration of all PCs is standardised and centrally maintained. The hardware configuration of every PC is known and is kept in an asset management inventory that is dynamically updated. All users and all PCs are equally managed. This implies that the same application software is available to all users and that all software is available to all PCs, always loaded from the NetWare servers. The service is managed to satisfy user software availability and stability requirements, as well as to reduce the requirement for end-users to install software locally onto their hard disk.
Upon logging in, a user is connected to a number of NetWare servers. The NetWare infrastructure is divided into specific functionalities to avoid the problems associated with running multiple types of service on the same machine. In particular, the major functions are:
NetWare Application Servers. These servers contain the applications and the Windows environments. There are a number of identical machines. Updates are propagated between these servers on a daily basis or more frequently if necessary. They are "read only" from the users' point of view.
NetWare Home Directory Servers. These servers contain the users' files in read/write directories.
NetWare Divisional Servers. These servers contain group files which are shared by members of the same department.
NetWare Mail Servers. These servers contain the mail files used by the legacy Microsoft Mail System. As CERN is moving to a centrally maintained IMAP-4 mail server, it is anticipated that these will disappear over the coming year.
NetWare Print Servers. CERN has more than 1000 printers of many different types. All users connect to one or more print servers, depending on which printers they normally use. CERN is moving to a pool of UNIX print servers with all desktops using an LPR (Line PRinter) client to print.
Standard Desktop Computer Installation
The standard installation of every PC is achieved by booting the PC with the "NICE95 & NT Installation" diskette inserted. The installation proceeds automatically and takes about 30 minutes for Windows 95 and 60 minutes for Windows NT. The installer must supply only the Ethernet card type, the IP address of the computer, and the operating systems to be installed (Windows 95, Windows NT Workstation, Windows NT Server).
The NICE boot diskette loads the IPX network drivers and performs an automated login to one of the NICE application servers. From there, the rest of the installation is completed across the network. One advantage of IPX in this context is that it works homogeneously across a large routed IP network.
Another advantage of this automatic, standard installation is that only one floppy diskette is needed to install Windows from the network. It also allows support technicians to solve all hardware- and software-related problems on the local computer, since they can bring the PC back to a known working state by reinstalling Windows.
NetWare Services for the Users
The NICE environment serves more than 6000 user accounts, with an average of over 3000 simultaneous users. End-users simply use the computer on their desk and do not need to even understand what a service is. The fact that users can perform a wide variety of different tasks using their computer means that the NetWare services are tightly integrated on the desktop, despite the fact that these services may be provided by different groups or divisions in the laboratory.
Home Directories and Divisional Volumes. Every user has a Home Directory that he or she can use to store personal documents. Users can change the access rights to their home directories as they wish. All security features are available and rights can be granted to every single file (who, from where, when, what) using the standard NetWare mechanisms. However, a simple application has been provided to make setting rights more user-friendly.
Every division, workgroup, or project can have a network volume where specific data not linked to a particular person but related to an activity (such as databases, archives, drawings, logbooks, and so on) can be saved.
All data saved in home directories and in divisional volumes is easily shareable with other users on multiple platforms (DOS, Windows 3.1, Windows 95, Windows NT, Macintosh, the World Wide Web, and UNIX via NFS) using multiple NetWare name spaces. These facilities provide compelling reasons to use the network disks (which are backed up daily) instead of the local hard disk (which is generally not backed up). In addition, the local disk is generally not available from the network because peer-to-peer networking is currently discouraged, given the large number of nodes that would be advertised as servers without providing the necessary backup.
Program Disks. The variety of application software available to the Windows desktop on NICE is very large in order to be able to satisfy the superset of all users' requirements. However, for each user requirement only a few solutions (often only one) are provided to encourage all users with similar requirements to use the same software tools. This is to facilitate the exchange of compatible information in the laboratory and to encourage user-to-user support, thereby gaining some economies of scale with user support.
Solutions are available to all the users in the following fields:
Word Processing and Desktop Publishing
Drawing and Drafting (with clipart image libraries)
Computer Aided Design (Mechanical, Electrical, and Electronic)
Computer Aided Engineering
Symbolic and Numerical Analysis
Controls and Tests
Programming in C, C++, Basic, Java, Nodal, and FORTRAN
Database access (remote and local)
Communication and Internet access (Telnet, FTP, Mail, X terminal, WWW clients, and so on)
The application software is always pre-installed on the server, and it is therefore the same on all PCs. Application software is generally loaded on demand from one of the central NetWare servers. Well-defined processes are in place to make software available site-wide by installing it on the central servers.
Users access all the standard applications software using the Windows Start menu, which is the same on all computers. Users can hide parts of the menu they do not need, but they cannot modify the menu content that is centrally maintained on all PCs. If desired, users can install software on their local disks under their own responsibility, but without any central support, although they can ask for it to be regularly updated.
The standard software is always installed on a reference Program disk. To ensure correct scalability of the system (able to support more than 3000 PCs simultaneously connected), the Program disk is replicated several times, with a consistent replication process that ensures that all copies are identical. An in-house application is used to perform incremental updates from the reference server to the other servers (see Figure 1).
Figure 1: The reference program disk is replicated on other servers.
Although normally the replication is done automatically every evening, a manual replication can be initiated when the administrator needs to perform an important or periodic update.
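The in-house replication tool itself is not described in the article. As a rough sketch of the idea, an incremental update can propagate only those files whose size or modification time differs from the reference copy (the function and paths below are illustrative, not CERN's actual tool):

```python
import os
import shutil

def incremental_update(reference: str, replica: str) -> list[str]:
    """Copy to `replica` only the files that differ from `reference`.

    A file is considered changed when it is missing on the replica, or
    when its size or modification time differs from the reference copy.
    Returns the list of relative paths that were copied.
    """
    copied = []
    for root, _dirs, files in os.walk(reference):
        rel = os.path.relpath(root, reference)
        target_dir = os.path.join(replica, rel)
        os.makedirs(target_dir, exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            dst = os.path.join(target_dir, name)
            info = os.stat(src)
            if (not os.path.exists(dst)
                    or os.path.getsize(dst) != info.st_size
                    or os.path.getmtime(dst) < info.st_mtime):
                shutil.copy2(src, dst)  # copy2 preserves the modification time
                copied.append(os.path.normpath(os.path.join(rel, name)))
    return copied
```

Because `copy2` preserves timestamps, a second pass over an already-synchronised replica copies nothing, which is what makes nightly runs cheap.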
Every PC at CERN attaches to a copy of the Program disk which contains all the software the PC needs. The redundant availability of these copies ensures that every PC is always able to locate at least one Program disk on the network, even when one server is down or powered off. A simple algorithm is used to load-balance the connections to the central servers.
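The article does not spell out the load-balancing algorithm. One simple approach, shown here purely as an illustrative sketch, is to hash each workstation's identity onto the list of application servers and walk forward past any server that is down (the server names follow the SRVx_NICE convention used at CERN; the hashing scheme is an assumption):

```python
import zlib

def pick_program_server(workstation_id: str, servers: list[str],
                        is_up=lambda server: True) -> str:
    """Spread workstations across the replicated Program disk servers.

    Hashes the workstation identity to a starting index, then walks the
    server list until a reachable server is found, so every PC can still
    attach to a Program disk when one server is down or powered off.
    """
    if not servers:
        raise ValueError("no program servers configured")
    start = zlib.crc32(workstation_id.encode()) % len(servers)
    for offset in range(len(servers)):
        candidate = servers[(start + offset) % len(servers)]
        if is_up(candidate):
            return candidate
    raise RuntimeError("no program server is reachable")
```

Hashing keeps the choice deterministic per workstation (so connections stay spread evenly) while the fallback loop provides the redundancy described above.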
Print Services. Every computer has access to the more than 1000 printers at the site. The access is database driven and allows the user to search for a particular printer by location, building, or capability (PostScript or PCL, colour or black and white, printer or plotter, A0/A1/A2/A3/A4, and so on).
Communication Services. All PCs are directly connected on the Internet. They have access to the World Wide Web and can telnet or ftp to any host worldwide. Every user can send and receive e-mail and can fax documents from the e-mail client.
All home directory and divisional volume files are accessible remotely from the Internet via the Web or FTP; additional security for these services is provided by a firewall. The host nicewww.cern.ch is a gateway that provides transparent access to all data files without the physical server name, volume name, or directory name having to be hard-coded in the URL. This leaves server administrators complete freedom to move, copy, rename, or split NetWare volumes without breaking URLs that are hard-coded in HTML documents. Using this facility, each user can easily have his or her own home page on the Web. For a description of the WWW gateway service, see http://nicewww.cern.ch/doc/usere/uswwwe/uswwwe.htm.
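The internals of the gateway are not given here. Conceptually, it amounts to one level of indirection through a lookup table maintained by the administrators; the sketch below uses hypothetical logical names and physical locations to show why moving a volume never breaks a published URL:

```python
# Hypothetical mapping table: logical name -> (physical server, volume, directory).
# Administrators can move or rename volumes by editing this table alone; a
# published URL such as http://nicewww.cern.ch/jsmith/report.htm never changes.
VOLUME_MAP = {
    "jsmith": ("SRV3_HOME", "HOME3", "users/jsmith"),
    "doc":    ("SRV1_NICE", "APPS",  "doc"),
}

def resolve(url_path: str) -> str:
    """Translate a logical gateway path into a physical NetWare location."""
    logical, _, rest = url_path.strip("/").partition("/")
    try:
        server, volume, directory = VOLUME_MAP[logical]
    except KeyError:
        raise KeyError(f"unknown logical name: {logical!r}")
    physical = f"{server}/{volume}:{directory}"
    return f"{physical}/{rest}" if rest else physical
```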
Evolution of the NDS Tree at CERN
The initial design for the structure of CERN's NDS tree was devised in 1994-95. For its design, CERN had to consider the following guidelines:
NDS should be independent of the organisational structure at CERN. This was to reduce NDS administration as much as possible. Mobility within CERN is quite high, and the tree designers wanted to avoid having to move objects in NDS each time a user moves to a different group or division, or whenever the organisational structure changes.
Users should be able to share their files across the laboratories.
Users should not be exposed to the complexity of the NDS tree. They should not have to remember in which context their user object resides, or even be required to know what a "context" is at all.
Decisions Made for the Initial Tree Design
The following sections describe various design decisions that were made for the initial NDS tree at CERN.
Home Directory Server Objects. For the server names, it was decided to use generic names such as SRVx_HOME, and place each server into a separate container named .HOME_x.SYSTEM.CERN. For easier administration of Volume objects, aliases for all volumes were created and placed in the Disks_and_Volumes.SERVICES.CERN container.
Group Objects. As people are involved in different projects and groups that share files across the whole laboratory, it was decided to put all groups together in the .GROUPS.CERN container.
User Objects. As some growth was anticipated in the number of User objects, they were put in eight containers under the USR.SYSTEM.CERN container, with the idea that all eight containers could later be partitioned off.
Each container holds the User objects whose names start with one of the letters in the container's name. For example, a User object "jsmith" would be placed in the container BJRZ. This distributes User objects more or less evenly across the containers, while still allowing User objects to be created automatically by a program that can decide where to put a user object just by looking at the user's name.
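The placement rule can be sketched as follows. Only BJRZ is named in this article, so any other container name used with this sketch is hypothetical:

```python
def container_for(username: str, containers: list[str]) -> str:
    """Return the distinguished name for a user under USR.SYSTEM.CERN.

    Containers are named after the set of initial letters they hold:
    "jsmith" starts with J, and J appears in "BJRZ", so jsmith lands there.
    """
    first = username[0].upper()
    for name in containers:
        if first in name:
            return f"{username}.{name}.USR.SYSTEM.CERN"
    raise ValueError(f"no container covers initial {first!r}")
```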
Because all SRVx_HOME servers needed bindery access for the Macintosh clients, each of the home servers has a USR replica. (The MacIPX client was not used mainly because there was not sufficient manpower to deal with its installation and reconfiguration. Refer to "Macintosh Support Issues" later in this case study for more details.) It was here that CERN ran up against the Novell-recommended limit on the maximum number of replicas per partition. This partition of 5700 objects was eventually replicated nine times because the number of servers needing Macintosh access has been growing.
At the time of this design, contextless login was not available from Novell. To simplify the login process, user aliases were created for all users and placed in one container: USERS.CERN. The context on all PCs is set to USERS.CERN. That way, a user can simply specify his or her login name without having to worry about where the User object actually is. At this time it was hoped that sometime in the future, Novell would develop a method of simplifying login for users. This can now be accomplished using NetWare 5's Catalog Services and the new Novell NT client software.
Application Server Objects. The application server objects were named SRVx_NICE and were placed in the NICE.SYSTEM.CERN container.
Print Servers and Print Queues Objects. Print servers (named SRVx_PRINT) and their associated Print Queue and Printer objects were placed in PSRVx.SYSTEM.CERN containers. To have better administrative access to these print queues without having to know on which server a particular print queue is residing, print queue aliases were created in containers named .n.PRINTERS.SERVICES.CERN. For example, if a print queue name starts with 1, its alias is in .1.PRINTERS.SERVICES.CERN, and so on.
Issues with the Initial Tree Design
When CERN implemented their initial NDS tree a few years ago, they were well within the Novell-recommended design guidelines for NetWare 4.01 (the version they were using at the time). Since then, the NICE user base has increased at the rate of about 1000 users per year, as shown in Figure 2. In two years (by 1997), the CERN tree had outgrown the original design parameters and had run up against some scalability and performance issues (as a result of the tree design, not because of any limitations in NDS). These issues are discussed in the following sections.
Figure 2: Recommended number of objects per partition versus NICE NDS.
Scalability Issues. CERN's original NDS tree design served its intended purpose for a number of years when there were only 3000-4000 NICE users. But as the number of users increased to more than 5000, and as the number of home servers grew due to the ever-increasing need for more disk space, it became clear that the existing tree implementation did not scale as well as it could.
Because of the need for Bindery Services to support the bindery-based Macintosh clients, a replica of USR was needed on all servers that the Mac users would access. With the way user home directories and departmental volumes were distributed, a USR replica was required on all home servers. It was therefore a major undertaking when a new home server had to be added or a home server had to be upgraded. For example, to place a copy of the USR partition on a new server for the purpose of across-the-wire hardware upgrade, it could take up to 48 hours for the new replica to come online.
Furthermore, because of the size of the USR replica, "TTS disabled due to transaction growing" errors were encountered whenever a large burst of transactions occurred. For example, when upgrading DS.NLM or adding a replica of USR to a server, the server's Transaction Tracking System (TTS) could not handle the amount of transactions and would shut down. This could have been due to lack of enough memory on a server, or to the massive amount of NDS information being tracked at a given time.
Several solutions exist for this issue. One is to add more memory to the server; another is to limit the number of transactions tracked at a given time. The default number of transactions is 10,000, which is also the maximum. You can change this number using SERVMAN.NLM, MONITOR.NLM (on NetWare 4.11), or from the command line at the server using the following SET command (with 5000 or any lower value), which tells the system to track fewer transactions and thus use less memory:
SET MAXIMUM TRANSACTIONS = 5000
Lastly, it was observed that when a ninth server was added to the USR replica ring, all the servers in the USR replica ring exhibited a somewhat high CPU utilization (averaging about 35%). This was due mainly to DS synchronization. These servers started to report "Cache memory allocator exceeded the minimum cache buffer limit" and "Short term memory allocator is out of memory" errors, even when cache buffers were at 70%.
One solution for this cache memory allocator issue is to add more RAM. However, this is a short-term solution. The long-term fix is to reduce the size of the partition or reduce the size of the replica ring.
Performance Issues. It is interesting to note that even with the large partition size (more than 5000 objects), NDS performance was quite acceptable, even with some servers having low-end Pentium processors. For example, using a Windows 95 workstation (486, 32 MB of RAM, running Novell Client 32 v2.20) on a 10 Mbps Ethernet network, going through a Cisco router and then through a DEC Gigaswitch to the NetWare 4.11 server at 100 Mbps, it took less than 30 seconds to bring up the list of over 5000 objects in NWADMN95. A typical NICE/Windows 95 user can log in to the network, using the User Alias object, with all the drive mappings and printer capturing done, in about 20 seconds.
One area where slow performance was noticed was when a user was added to a group. The association could take up to five minutes to register in NDS because of the need to create backlinks and external references. Recall that when an object (such as a User or Group) is referenced on a server (for example, when a file system trustee assignment is made) and the server does not hold a replica that contains the object, an external reference to that object is created on the server. At the same time, a pointer called a backlink is created to associate the real object with its external reference. This backlink is an NDS attribute and is stored on the server(s) holding a replica containing the object.
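The bookkeeping just described can be illustrated with a toy model (this is not actual NDS code): a trustee assignment for a non-local object creates an external reference on the assigning server, paired with a backlink on the server holding the real object.

```python
class Server:
    """Toy model of external references and backlinks in NetWare 4 NDS."""
    def __init__(self, name, local_objects=()):
        self.name = name
        self.local_objects = set(local_objects)  # objects held in local replicas
        self.external_refs = {}  # object -> name of the server holding it
        self.backlinks = {}      # object -> set of servers holding ExRefs to it

def make_trustee_assignment(obj: str, on: Server, home: Server) -> None:
    """Grant `obj` a file-system trustee right on server `on`.

    If `on` holds no replica containing `obj`, an external reference is
    created there, and the home server records a backlink to it. Every
    such pair is extra state that DSREPAIR must later verify, which is
    the overhead CERN observed.
    """
    if obj in on.local_objects:
        return  # the real object is local; no extra state needed
    on.external_refs[obj] = home.name
    home.backlinks.setdefault(obj, set()).add(on.name)
```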
Due to the replication strategy employed by CERN (refer to Figure 3), NDS needed to create and maintain many backlinks and external references to facilitate tree-walking, especially for the objects in the GROUPS partition. The large number of backlinks and external references can create overhead in the NDS and prolong a DSREPAIR operation when all the external references and backlinks have to be verified for validity.
The SRV1_HOME server showed a much higher CPU utilisation than the other home servers did. This high utilisation was due to the use of Bindery Services to support the 300 Macintosh workstations running bindery-based client software. The cause of this utilisation spike is generally overlooked because it is not commonly known that the NetWare 4 OS creates only one service process thread to handle all of the bindery connections.
Another server performance issue CERN is still experiencing is one faced by many other NetWare 4 sites worldwide—the long time it takes to mount large volumes, especially ones with multiple name spaces. Fortunately, with the recent release of Novell Storage Services (NSS), this problem will become a non-issue. For more information about NSS, see "Novell Storage Services (NSS): Pushing IntraNetWare to New Heights" in the September 1997 issue of Novell AppNotes.
A New Look: The Revised NICE Tree
Figure 3 shows the current NDS tree implementation at CERN, which still bears a strong resemblance to its original design circa 1994-95. CERN's NDS tree now consists of roughly 6,000 User objects, 6,000 User Alias objects, and several hundred Group, Printer, Print Server, Print Queue, and other NDS objects, for a grand total of about 15,000 NDS objects.
Figure 3: The current NDS tree implementation at CERN.
The current tree is divided into 19 partitions. The partitioning is done in such a way that the maximum number of children for any parent partition is 14. Each of the two largest partitions (USR and USERS) has about 6,000 objects (containing User and User Alias objects, respectively). The [Root] partition holds approximately 600 objects (mostly Printer Alias and Volume Alias objects).
The table in Figure 4 shows a Partition and Replica Matrix for the CERN tree. The key is as follows:
M = Master
RW = Read-Write
SR = Subordinate Reference
Figure 4: Partition and replica matrix for the CERN tree.
Using Updated Design Recommendations
Most of the NDS issues encountered by CERN discussed above stem from a tree design that followed outdated Novell design parameters, not from limitations in NDS itself. Shortly after the introduction of NetWare 4.11, and after several years of thorough hands-on implementation, troubleshooting, and analysis of various NDS tree implementations, Novell revised its design recommendations for NDS, as summarized in the following table.
NDS Quick Design Rules (suitable for 90% of installations, regardless of size)
- Partition size: 1,000-1,500 objects
- Number of child partitions per parent: 10-15 partitions
- Number of replicas per partition: 2-5 replicas (typically 3)
- Number of replicas per server: 7-10 replicas
- Number of replicas per replica server: 30 replicas (see note below)
- Minimum server hardware: Pentium 100+ MHz, 64 MB of RAM
NDS Advanced Design Rules (for special cases)
- Partition size: 3,500 objects
- Number of child partitions per parent: 35-40 partitions
- Number of replicas per partition: 10 replicas (typically 3)
- Number of replicas per server: 20 replicas
- Number of replicas per replica server: 70-80 replicas (see note)
- Minimum server hardware: Pentium Pro 200+ MHz, 128 MB of RAM
Note: A replica server is a dedicated NetWare server that stores only NDS replicas. This type of server is sometimes referred to as a DSMASTER server. This configuration is useful if you have a lot of single server remote offices. The replica server provides a place for you to store additional replicas for the partition of a remote office location. Some sites also use replica servers to facilitate central partitioning control and management.
The number of replicas on any server depends on how long the replica synchronisation process on that server takes to complete. A good rule of thumb is that it should complete in 30 minutes or less. The factors that affect the synchronisation time are:
- CPU speed of the replica server
- Number of replicas
- Number of objects in each replica
- Size of each replica ring
- Location of replicas in the replica ring (local or remote)
- Speed of the WAN links connecting remote replicas, which includes bandwidth and round-trip time (long satellite latency, for example)
- RAM available on the replica server
- Frequency of inbound replica synchronisation
Turning Over a New Leaf: Future Design Modifications
After a few days of on-site consultation at CERN to examine the immediate and future needs of the NICE users, DreamLAN Network Consulting Ltd. developed a new tree design and partition strategy for the CERN NDS tree. This new design is shown in Figure 5.
Figure 5: A high-level view of the new CERN NDS tree and its partition strategy.
As part of the new design, there will be three "authentication servers" which will be dedicated for user authentication. At the same time, these servers will host the Master replicas of the various partitions, therefore doubling as DSMASTER servers.
Users are grouped by their home server instead of by username, as was done previously. This allows each home server to hold a replica of its own partition, thereby eliminating the creation and maintenance of backlinks and external references. At the same time, this makes tree expansion much easier: should a new home server be needed, the administrators can simply create a new OU for its users, partition that OU, and place a replica on the new server.
Although a couple of the home containers will still have about 1,500 User objects, this does not pose a performance problem. The CERN network backbone, servicing over 10,000 nodes over an area of approximately 300 sq km, consists of over 30 FDDI rings connected through a DEC Gigaswitch. Therefore, bandwidth is not an issue: there are no "slow" WAN links involved.
The number of Group objects will be much reduced, as there are many duplicate and obsolete groups. As a result, a GROUP replica will not be placed on every server, as the number of backlinks and external references will not be significantly high.
Ideally, we would like to do away with the over 5,000 User Aliases in the USER partition. However, that requires one of two things. CERN would either have to instill the concept of "context" into its NICE user base so a user can log in to the network using any workstation at CERN, or they would have to implement a contextless login solution.
There are currently a number of contextless login solutions available. However, CERN has made the decision to use the Novell implementation, which requires Catalogue Services. In addition to contextless login, CERN will make use of Catalogue Services' search capability for other purposes, such as generating reports on various NDS objects.
Macintosh Support Issues
The solution to the high CPU utilisation caused by Macintosh clients is not very straightforward. The obvious solution is to upgrade the bindery-based client software to an NDS-aware version. However, it is not practical to upgrade all 300 Macintosh workstations to the new client. First, the NDS-aware client requires Mac OS System 7 or higher, and not all the Macintoshes at CERN can support that. Second, because of the diverse Macintosh hardware, it is impossible to auto-upgrade the client software. That means over 300 upgrades would have to be done by hand, a considerable investment of labour and time.
As an immediate solution, the Macintosh users will be moved to a number of home servers dedicated to them. The PC users can still have access to the departmental volumes on these "Mac home servers", but Macintosh users will not be able to access files on other home servers unless a replica containing the Mac User objects is placed on these other home servers.
It is anticipated that, over the next couple of years, there will be a natural decline in the use of Macintosh workstations, and therefore supporting them will become less of an issue.
This AppNote has described the NICE project undertaken at CERN to provide a desktop environment that allows users to easily access the network resources they need from a single entry point. By using NDS and various internally developed tools, CERN was able to implement this environment in a "zero effort" manner for over 3000 PC and Macintosh users.
* Originally published in Novell AppNotes
The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.