A Look into the Future: Distributed Services and Novell's Advanced File System


DREW MAJOR
Chief Scientist
Novell, Inc.

01 Dec 1995

This AppNote contains excerpts from keynote addresses which Drew Major delivered at Novell's tenth annual BrainShare conference in March of 1995. He discusses the future of distributed services and Novell's advanced file system, covering several issues of interest to network designers, integrators, and administrators.

Introduction
The Future of Distributed Services
Benefits of Distributed Services
Novell's Advanced File System
Conclusion

Introduction

During the tenth annual BrainShare conference, held in Salt Lake City, Utah, in March 1995, Drew Major, Novell's Chief Scientist, delivered two keynote addresses. Many things have changed since then, both in Novell's strategic direction and in the networking industry at large, but much of what Major talked about is still valid. This Application Note focuses on two main topics that are of potential interest to network designers, integrators, and administrators: distributed services and the advanced file system. These key services are central to the smart, global network infrastructure that Novell has envisioned for the future. You can read more about Novell's plans in the "Global Network Services" White Paper reprinted in this issue.

- Editor

The Future of Distributed Services

I'd like to discuss where we at Novell believe the industry is going, as well as what Novell is doing with regard to distributed services. I coined the term "distributed services" to describe a particular concept, because the terms I had heard before were inadequate - particularly "client-server." To me, client-server means connecting a server to a bunch of clients and splitting up the application workload between them. While that's very important, it's only one dimension.

There's another equally important dimension that we're barely starting on. That is how the servers themselves can communicate with each other, server to server, working together to present a common service. Over the long term, this is actually going to be more critical than just dividing applications between the client and server.

Client-server was tough to program. Server-to-server is going to be tough as well. It's a whole new dimension, bringing with it a whole new set of problems. Yet it's a critical element to continue moving our industry forward and improving the capabilities that we can deliver. We need to make servers work together more as a single service. They've got to appear as a common team.

In conjunction with this, there needs to be a change from a physical, server-centric view of the network to a logical view of the services. Users need to think of logging into the network and connecting to the service, not to the server.

New Capabilities

Let me explain some of the capabilities that Novell believes will be necessary in this new distributed services environment. First, the services themselves have to be written differently. They must be aware that there are sibling services: other servers that they have to work closely and synchronize with.

Again, this introduces a new dimension to developing applications. When you write a program that applies to a single box, you generally have direct control. Adding an external client piece is a little more complex. But dealing with another server that is equivalent and that you're sharing work between and synchronizing with, that's hard. There aren't too many applications written that way yet. But that's the direction in which the industry needs to go.

Another capability needs to be an enhanced form of fault tolerance. If servers are working together as a team and one of them goes down, the other servers need to be able to pick up the pieces and keep the service going. They need to clean up after each other, perhaps with the assistance of a client.

Other capabilities include being able to add servers and dynamically move resources and workload between servers. If you add servers to the service, the external view stays the same but the capacity increases. These are very important capabilities that this new world of distributed services can provide. But they don't come for free - they have to be programmed in. Application developers need to think about it and design the service accordingly.

Distributed Object Models

A number of groups are already thinking about distributed services. One area that we've been paying close attention to is the distributed object work among the proponents of CORBA (Common Object Request Broker Architecture). I should differentiate what I'm talking about from what's being discussed there.

Distributed objects provide a high-level way of communicating between two entities. However, the object itself still has to be aware of everything it needs to do and be able to do it. Microsoft is talking about distributed objects now. They've been working with Digital Equipment and have done some things with DCE (Distributed Computing Environment), and they've only recently become involved with CORBA. But their focus is more along the lines of client-server.

What I'm talking about is more at the lower level, where the services themselves are synchronized. No one else is really focusing on the infrastructure level or on the demands of the server-to-server model. Novell is unique in that focus. But that's the way it will be implemented, and that's where the value will be.

What We Need to Make It All Work

To make this happen, we must have a very efficient server-to-server communications channel. We need to do something similar to what we did with the NetWare Core Protocols in making sure that network file services were faster than the local hard disk. We have to have that same type of high efficiency, high performance channel. Otherwise, when servers are synchronizing with each other and talking to each other, the overhead will be a killer. That's one of the problems with RPC (remote procedure call) and the technology and applications of the past.

Novell already has a very fast server-to-server communication mechanism well into development. We'll use different ways of marshalling [the process of packaging function calls and parameters so they can be passed across process boundaries] and enabling that communication, so in some sense it will look like either an IDL (interface definition language) or RPC type of communication. But underneath, the "plumbing" is going to be really fast. This is what Novell is talking about with DPP, or Distributed Parallel Processing - the capability of having multiple servers synchronized in a very efficient way.

We also need to have distributed services for a distributed naming system and connection management. A distributed naming system would be based on NetWare Directory Services (NDS). Connection management is needed to manage the whole connection and provide housekeeping and infrastructure support for the distributed service itself.

You also need transactional and synchronization support, so that services can use either a distributed lock manager or distributed transactions. Oracle Parallel Server is one example of a distributed service that works today, and it uses a distributed lock manager for synchronization. The Tuxedo functionality for distributed transactions is also extremely important.

We're essentially building a set of services that are going to have these characteristics. At the same time, we're going to build the underlying infrastructure to support it. This infrastructure will be made available to developers who want to make their services distributed so they can take advantage of that.

Benefits of Distributed Services

Let me quickly summarize some of the benefits of this type of distributed services. First is fault tolerance. If a server goes down, potentially you wouldn't lose all the state information. Even if you do lose some state information, the servers are working together as a team and should be able to fill in for each other. So you also get more robust services. With this next generation of software, servers can move beyond being "islands" and start working together as they should.

You might ask, what happens to SFT III in this distributed world? SFT III's mirrored-server fault tolerance does synchronization at a much lower level, which is still valuable in some cases. Imagine an application that had to have pieces physically synchronizing with each other in different servers, at the application level. Depending on the particular implementation, that synchronization could involve such high overhead that it could make the thing run a lot slower. In some cases that logic would be better placed in an SFT III environment where synchronization happens at a lower level. Then it would actually run faster than if it were distributed.

In essence, you have two approaches to the same problem. Sometimes one is appropriate and sometimes the other is. SFT III is another way of providing fault tolerance by synchronizing full states of the applications themselves, at a lower level. Developers will be able to choose either the SFT III or distributed services approach, according to the needs of their applications.

Distributed services will also be more scalable. You'll be able to plug more servers in to increase capacity. Since you can use cheaper hardware to achieve scalability, it's more cost effective. And you'll be working with one service rather than with several isolated servers, so it will be more manageable as a single service. It's more flexible, as you can reconfigure services more easily and transparently. Security can be enhanced because there are more firewalls and "breaks" between systems. If one machine goes down, it doesn't necessarily take the other machines down with it.

Novell's Advanced File System

Novell is developing an advanced file system [NetWare Advanced File Services or NAFS] that is a good illustration of the power of distributed services. We're separating the current file system into two separate entities: the naming service and the storage service. The new naming system will be able to handle all types of objects, not just files and directories. It will be based on our NDS technology, but enhanced to meet the requirements of the file system directory. The storage system will be a separate service that controls the actual placement of data on the storage media.

We think this separation is important for several reasons. First, the distributed naming system will provide a single, logical, network-wide view of all names, files, and directories. We'll be able to enhance the ways of querying and searching for data. For example, you'll be able to search for data by content and attributes. Second, it will enable us to build an intelligent data storage system that can do a lot of powerful things in terms of how data is handled and where it gets moved. It will have some document management capabilities built in.

Today, the way you find files or objects on the network is to go to a server, and then to a volume and directory on that server. Upon installation, the new naming service would look at the entire network and take a "snapshot" of what the network physically looks like at that time. It would then build its own logical view of the network: a symbolic view that is equivalent to what is physically there. From then on, all data accesses would go through the naming service before reaching the disks.

So now we've introduced a layer of abstraction between the naming service and the actual storage processes. This will give us a lot more flexibility in how we can look at and work with the data. Let's go through some scenarios to help illustrate some of the capabilities of this advanced file system.

Multiple Views of Data

The extra layer of abstraction between the naming service and the storage service will allow us to create new views of the data. We can take the data and reconstruct it in any way that is useful - create an alternative single directory tree, for example. At the same time, the old view is preserved. The physical data remains where it was; only our logical view of the data changes.

As far as applications are concerned, they still think they're accessing data in the traditional way: going to the physical server, opening the file, and so on. But it's actually going through the naming service, and that allows us to move the data around to where it really needs to be without having to reinstall the applications or change the login scripts.

Again, the logical or symbolic view is preserved even while things change underneath. Over time, perhaps the physical view will go away and people will start using the logical views exclusively.
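To make the idea of a naming layer in front of storage concrete, here is a minimal Python sketch. It assumes the naming service is essentially a table from logical names to (server, volume, path) locations; the class, its methods, and the example server and file names are hypothetical and are not NAFS interfaces. It shows the two properties described above: an extra view is just another name for the same data, and moving data changes only the mapping, so old-style server/volume names keep resolving.

```python
from dataclasses import dataclass, field


@dataclass
class NamingService:
    """Hypothetical naming service that sits in front of the storage service."""
    # logical name -> (server, volume, physical path)
    entries: dict = field(default_factory=dict)
    # alternative logical name -> canonical logical name (an extra "view")
    aliases: dict = field(default_factory=dict)

    def register(self, logical, server, volume, path):
        self.entries[logical] = (server, volume, path)

    def add_view(self, alias, logical):
        # A new view of existing data; nothing moves physically.
        self.aliases[alias] = logical

    def resolve(self, logical):
        # Every access is resolved here before it reaches a disk.
        return self.entries[self.aliases.get(logical, logical)]

    def move(self, logical, server, volume, path):
        # Relocating data updates only the mapping; names and views are unchanged.
        self.entries[logical] = (server, volume, path)


ns = NamingService()
ns.register("UTAH1/SYS:APPS/WP/WP.EXE", "UTAH1", "SYS", "APPS/WP/WP.EXE")
ns.add_view("/corp/apps/wp.exe", "UTAH1/SYS:APPS/WP/WP.EXE")
ns.move("UTAH1/SYS:APPS/WP/WP.EXE", "UTAH2", "VOL1", "WP/WP.EXE")
print(ns.resolve("UTAH1/SYS:APPS/WP/WP.EXE"))  # ('UTAH2', 'VOL1', 'WP/WP.EXE')
print(ns.resolve("/corp/apps/wp.exe"))         # ('UTAH2', 'VOL1', 'WP/WP.EXE')
```

An application that still uses the old physical-style name is unaffected by the move, which is the point of routing every access through the naming layer.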

Data Migration and Replication

One key aspect of intelligent data management is the ability to migrate and replicate data according to its use. If the file system has a single copy of a file at a particular place on the network and users are continually accessing it from another place, the system can detect that the data is going over the backbone and that it's being accessed a lot. The system can then figure out how best to migrate or replicate that file so that the consumers can access a closer copy.

As an example, suppose you have some word processing documents stored on a server in Utah. You travel to California and, while there, you need to work on those documents. When you connect to the network and request to start up the application, the network recognizes that it has equivalent application files in California and doesn't have to go all the way back to Utah for them. When you ask for your documents, the data files will be replicated on the server in California while you're working there. When you return to Utah, they will follow you back. You see the same view of the data wherever you happen to be.

This type of intelligent data migration and replication is very powerful. It's an excellent example of the power of the logical naming being built into Novell's advanced file system. And it's actually not that hard an algorithm to work out. We just look at access patterns and use them to figure out better ways of replicating and migrating the data.
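The access-pattern approach can be illustrated with a simple counting policy. This is a hypothetical sketch, not Novell's algorithm: it assumes each read is tagged with the site it came from, and it uses an arbitrary `threshold` of remote reads before placing a replica at that site.

```python
from collections import Counter, defaultdict


class ReplicationPolicy:
    """Hypothetical: replicate a file to any site that keeps reading it remotely."""

    def __init__(self, threshold=10):
        self.threshold = threshold                # assumed tuning knob
        self.remote_reads = defaultdict(Counter)  # file -> site -> remote reads
        self.replicas = defaultdict(set)          # file -> sites holding a copy

    def add_copy(self, file, site):
        self.replicas[file].add(site)

    def record_read(self, file, site):
        """Note one read; return True if this read triggered a new replica."""
        if site in self.replicas[file]:
            return False                  # served locally, no backbone traffic
        self.remote_reads[file][site] += 1
        if self.remote_reads[file][site] < self.threshold:
            return False
        # Read often enough across the backbone: place a copy near the consumer.
        self.replicas[file].add(site)
        del self.remote_reads[file][site]
        return True


policy = ReplicationPolicy(threshold=3)
policy.add_copy("memo.doc", "Utah")
for _ in range(3):
    policy.record_read("memo.doc", "California")
print(sorted(policy.replicas["memo.doc"]))  # ['California', 'Utah']
```

A fuller policy would also age out old counts and drop replicas that stop being used, which is roughly what letting documents "follow you back" would require.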

Fault Tolerance and Backup

Fault tolerance is a lot easier with the advanced file system. When you create files, for example, you can have the file system initially put them in two places. If one server goes down, it's not a problem. You've got another server with your files on it.

Over time, as the servers are backed up, the file system could recognize that the duplicated files have been backed up and automatically remove the extra copy from primary storage. The advanced file system actually consolidates backup and migration to the point where backup is simply an extension of migrating the data.
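The lifecycle of creating two copies and later trimming one might look like the following sketch. The record layout and function names are hypothetical; the sketch models only which servers hold a primary-storage copy, and it assumes that a completed backup counts as the second copy.

```python
from dataclasses import dataclass, field


@dataclass
class FileRecord:
    name: str
    copies: set = field(default_factory=set)  # servers holding a primary-storage copy
    backed_up: bool = False


def create_file(name, primary, secondary):
    # Write every new file to two servers so losing either one loses no data.
    return FileRecord(name, {primary, secondary})


def after_backup(record, keep):
    # The backup now provides the second copy, so the duplicate can go.
    if keep not in record.copies:
        raise ValueError(f"{keep} holds no copy of {record.name}")
    record.backed_up = True
    record.copies = {keep}
    return record


rec = create_file("budget.xls", "SERVER1", "SERVER2")
after_backup(rec, keep="SERVER1")
print(rec.copies, rec.backed_up)  # {'SERVER1'} True
```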

Easier Handling of Server Shutdown or Failure

There are several capabilities that can be best illustrated by talking about some shutdown or failure scenarios. Suppose you need to do some maintenance on a server. Today you have to have everyone close their files and log off the server. In this future environment, you can simply say to the server, "Depopulate yourself." The server will then start moving all the data that it has the only copy of onto other servers. When it's ready (it may take a while), the server will report, "Okay, you can shut me down now. My data is somewhere else and all the client connections have been transparently migrated." This is one scenario that will be very valuable.
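A "depopulate yourself" pass can be modeled as moving every file for which the server is the sole holder. In this hypothetical sketch, `placements` is a dictionary from file name to the set of servers holding a copy, and target servers are picked round-robin. Migrating client connections, which the talk also mentions, is not modeled.

```python
def depopulate(server, placements, peers):
    """Hypothetical "depopulate yourself" pass.

    placements: file -> set of servers holding a copy (assumed model)
    peers: other servers that can accept data; targets are chosen round-robin
    Returns True once no file depends solely on `server`.
    """
    if not peers or server in peers:
        raise ValueError("peers must be other servers")
    turn = 0
    for holders in placements.values():
        if holders == {server}:
            holders.add(peers[turn % len(peers)])  # copy to a peer first...
            holders.discard(server)                # ...then release the local copy
            turn += 1
    return all(holders != {server} for holders in placements.values())


placements = {"a.doc": {"S1"}, "b.doc": {"S1", "S2"}, "c.doc": {"S1"}}
ready = depopulate("S1", placements, ["S2", "S3"])
print(ready)  # True
```

Files that already have a copy elsewhere (like `b.doc`) need no work, which is why depopulating may take a while only in proportion to the data the server alone holds.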

If an active server fails, the other servers could detect that it is dead, check with all the clients to determine what files were open, and reconcile everything. Or it could be a client-initiated process where the client says, "Hey, can someone help me here? I used to be talking to this server, and now it's dead. Do you have a copy of this file?" If the client has kept all the state changes it has been making and the file changes have been happening on two servers, the client can resume working very quickly, with virtually no loss of data.
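The client-initiated recovery path could be sketched as follows. It assumes, hypothetically, that the client can ask each surviving server for a replica and that it has kept its own writes as (offset, bytes) pairs since the last point both servers agreed on; the in-memory dictionaries stand in for the network.

```python
def recover_file(file, dead_server, servers, pending_writes):
    """Hypothetical client-initiated recovery after `dead_server` fails.

    servers: server name -> {file name: bytes} (stand-in for the network)
    pending_writes: (offset, bytes) writes the client has kept since its last sync
    Returns the server now serving the file, or None if no replica survives.
    """
    for name, files in servers.items():
        if name == dead_server or file not in files:
            continue
        # Answer to "Do you have a copy of this file?": this server does.
        data = bytearray(files[file])
        for offset, chunk in pending_writes:
            if offset > len(data):
                data.extend(b"\0" * (offset - len(data)))  # fill any gap
            # Overwrites are idempotent, so replaying writes the replica
            # already received leaves the same bytes behind.
            data[offset:offset + len(chunk)] = chunk
        files[file] = bytes(data)
        return name
    return None


servers = {"S1": {"report.txt": b"hello"}, "S2": {"report.txt": b"hello"}}
print(recover_file("report.txt", "S1", servers, [(0, b"J")]))  # S2
```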

Another scenario is if a client hard disk fails. If you had your local hard disk replicated on the server, all you'd need to do is plug in another hard disk and the file system will naturally move the data back onto your client. It'll be no big deal.

Reduced Costs

The advanced file system will also reduce networking costs by allowing you to eliminate a lot of the duplicate data you have now. It's not unusual for today's applications to take up 30 or 40 megabytes of disk space when installed on a server. Maybe three or four megabytes of that is actually used, but you can't get rid of the rest because all the modules are tied together. What's worse, you typically have to install the application on each of your servers for performance reasons.

What if the file system could analyze the usage patterns and determine that some of the application files are rarely, if ever, accessed? It could then remove all but one physical copy of the unused files, migrating a copy here and there as needed. To users, the perception would be that the files are everywhere, but in reality there's only one copy somewhere. All the others are just migrated versions of that single copy.

We did an analysis of our servers within Novell and found that at least 25 percent of our data was duplicated in other packages and applications. So we can see some value - even cost savings - by getting rid of duplicate, redundant data.
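Removing redundant copies of rarely used files can be sketched as a pass over access statistics. The data model and the `min_accesses` cutoff are assumptions for illustration; the sketch only decides which copies could be freed and leaves on-demand migration of the surviving copy to the rest of the system.

```python
def consolidate(copies, access_counts, min_accesses=1):
    """Hypothetical pass that keeps one physical copy of rarely used files.

    copies: file -> list of servers holding a copy
    access_counts: (file, server) -> accesses seen during a sampling window
    A file whose every copy saw fewer than `min_accesses` accesses keeps only
    its first copy; the rest could be migrated back on demand later.
    Returns the number of copies freed.
    """
    freed = 0
    for name, servers in copies.items():
        rarely_used = all(
            access_counts.get((name, s), 0) < min_accesses for s in servers
        )
        if len(servers) > 1 and rarely_used:
            freed += len(servers) - 1
            copies[name] = servers[:1]
    return freed


copies = {"wp.hlp": ["S1", "S2", "S3"], "wp.exe": ["S1", "S2"]}
counts = {("wp.exe", "S1"): 50, ("wp.exe", "S2"): 40}
print(consolidate(copies, counts))  # 2
```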

Improved Performance

To enhance performance, we're adding the capability in the storage system to stripe data across servers. We've been able to stripe files across multiple hard disks for a long time now. But with the advanced file system, if a particular server doesn't have enough disk channel bandwidth to handle very high volume files, you can take parts of the files and stripe them across multiple servers. This would be especially useful for large files, such as video files, that are being accessed and changed a lot.
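Striping is commonly done with a fixed stripe size assigned to servers round-robin, and this sketch assumes that layout (the talk does not specify one). With it, the server holding any byte offset can be computed directly. Function names are hypothetical.

```python
def stripe_map(file_size, stripe_size, servers):
    """Round-robin layout: list of (server, start, end) byte ranges."""
    layout = []
    for i, start in enumerate(range(0, file_size, stripe_size)):
        end = min(start + stripe_size, file_size)
        layout.append((servers[i % len(servers)], start, end))
    return layout


def server_for(offset, stripe_size, servers):
    # The stripe index alone picks the server, so no per-block lookup is needed.
    return servers[(offset // stripe_size) % len(servers)]


print(stripe_map(10, 4, ["A", "B"]))  # [('A', 0, 4), ('B', 4, 8), ('A', 8, 10)]
```

Reads of consecutive stripes go to different servers, so their disk channels work in parallel; the trade-off is that losing one server makes the whole file unavailable unless the stripes are also replicated.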

Another performance enhancement, one that will probably be appreciated by everyone, is faster volume mounting: volumes will mount within half a minute. And there will be much less need to run VREPAIR.

Better Integration of Clients with the Network

Thirteen years ago, when we first started networking, we made the network (server) look like a local hard disk. Of course, we added symbolic drives to do that. But what we ended up with is a "mixed" view of the network: you deal with local resources one way, and you deal with the network in other ways. We need to evolve this into a more integrated view where the local hard disk is seen as just another piece of the network.

One benefit of integrating the client workstation with the network is the ability to replicate your local hard disk data onto a server for fault tolerance. Of course, there are those who keep sensitive data on their local hard disk just so no one else can see it. They don't want it on the server because then maybe the administrator or somebody else could access it. We'll preserve that level of security by providing the option of encrypting the data before it is moved to the server. So you'll still have the same security even if the data has been replicated onto the server from the local disk.

With the advanced file system, we'll also be able to use local hard disks as a cache holding a replica of the network data that pertains to the user. This would do a lot to improve performance. Since the file system migrates data as a function of use, the data itself will get moved closer to its consumers. The local hard disk becoming part of the network cache - that's a very powerful concept. Many times users won't even have to go out to the network to obtain their data.

This alludes to another important capability, which is the ability to work while you are disconnected from the network. If your local machine is a replica and has a cache of your network data, you can preserve the same view when you're disconnected. It's transparent. When you're disconnected from some of the areas, you'll see the same view of the data even though some of the areas won't be accessible at the time. When you reconnect, things will auto-resynchronize. The NetWare Mobile Client is the first phase of this. It's the first capability we'll merge in when the full NetWare-wide capability is put in.
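Disconnected operation with automatic resynchronization can be sketched as a local replica plus a list of files edited offline. The version-number check and the choice to report conflicting files rather than overwrite them are assumptions added for this hypothetical sketch; the talk says only that things will auto-resynchronize.

```python
class OfflineCache:
    """Hypothetical client-side replica that supports disconnected work."""

    def __init__(self, server):
        self.server = server        # name -> (version, data); stand-in for the network
        self.cache = dict(server)   # local replica taken while still connected
        self.dirty = []             # names changed while disconnected

    def read(self, name):
        return self.cache[name][1]  # same view whether connected or not

    def write(self, name, data):
        base, _ = self.cache.get(name, (0, None))
        self.cache[name] = (base, data)  # remember the version this edit started from
        self.dirty.append(name)

    def reconnect(self):
        """Push offline edits; return names that changed on both sides."""
        conflicts = []
        for name in dict.fromkeys(self.dirty):  # each file once, in edit order
            base, data = self.cache[name]
            current, _ = self.server.get(name, (0, None))
            if current != base:
                conflicts.append(name)  # assumed policy: leave for the user to resolve
                continue
            self.server[name] = self.cache[name] = (base + 1, data)
        self.dirty = conflicts[:]       # unresolved edits stay pending
        return conflicts


server = {"plan.doc": (1, "draft")}
client = OfflineCache(server)
client.write("plan.doc", "draft v2")  # edited while disconnected
print(client.reconnect())             # []
```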

Conclusion

As I mentioned, the changeover to this new file system is going to be evolutionary. Users won't see a big change in the way they interact with the network. We learned a hard lesson from our bindery-to-NDS migration experience, and we're not going to repeat it when it comes to compatibility. This is a single architecture that builds on what we already have.

These services are currently in the prototype stage, and they're looking really good. We're going to preserve the performance. Management will be simpler, and you'll have the extra fault tolerance. Plus, we're adding support for mobile and disconnected clients. There's a lot of power in getting away from the physical view and moving to the logical view, and then taking advantage of the flexibility that provides.

* Originally published in Novell AppNotes

Disclaimer

The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.