Should I Use a Directory, a Database, or Both?

Vikas Mahajan
LDAP Directory Specialist
PriceWaterhouseCoopers
vikas.mahajan@us.pwcglobal.com

01 Nov 2001

The question of whether to use a Directory or a Relational Database stirs up much controversy and can bring a directory project to a standstill while directory engineers and Database Administrators argue the benefits of their respective technologies. This AppNote tackles some of the major issues surrounding this controversy and demonstrates that the two technologies are actually complementary solutions.

This information is adapted from an article published on the LDAPzone Web site at http://www.ldapzone.com. This is a great source for learning how to profit from developing solutions and services with LDAP, as well as XML (eXtensible Markup Language), DSML (Directory Services Markup Language), and other network industry standards.

Introduction
Characteristics of an LDAP-Compliant Directory
Characteristics of an RDBMS
How Directories Are Used
Determining Directory or Database
Conclusion

Topics	databases, directories, network application development
Products	Novell eDirectory
Audience	network consultants, integrators, programmers
Level	beginner
Prerequisite Skills	familiarity with directories and relational databases
Operating System	n/a
Tools	none
Sample Code	no

Introduction

Few questions stir up as much controversy and confusion as "Should I use a Directory or a Relational Database?" This single question can bring a directory project to its knees as directory engineers and DBAs argue the benefits of their own technology while showing scepticism and doubt about their counterparts' gaudy visions for the other system.

But the battle doesn't end there . . . . If either system is poorly implemented, the performance of the system suffers, resulting in blame and abandonment of the technology which, perhaps, was simply tasked with doing something it wasn't designed to handle.

In this article, I hope to tackle some of the major issues surrounding this controversy. Furthermore, I hope to demonstrate that Directories and Relational Databases are complementary solutions. By gaining an understanding of each system's suitability to task, you can analyze your data requirements and more easily determine the types of data each system should hold. The heated discussions with your DBA are still likely to take place, but now you can enter those meetings with the knowledge you need to stand your ground regarding the business and data needs your directory will satisfy. And, with a little work on your part, your DBA will soon find that you are a colleague working to solve the same business needs rather than a radical technologist looking to replace his or her database with the latest technology "buzzword."

Characteristics of an LDAP-Compliant Directory

The Directory is perhaps the most misunderstood piece of network infrastructure in your enterprise. From application protocol to file system, many people have misconceptions regarding the function of the directory. So let's clear the air and establish some rather clear and generally accepted statements regarding directories:

Extremely fast Read operations. Directories are tuned for higher read performance because the nature of the data in the directory is more commonly read than written or updated.
Relatively static data. The data most commonly stored in the directory is not frequently subjected to change or modification.
Distributed. The directory, and hence the data it stores, is distributed in nature.
Hierarchical. The directory is capable of storing objects in a hierarchical fashion for organization and relationship.
Object-oriented. The directory represents elements and objects. Objects are created from object classes, which represent a collection of attributes.
Standard schema. Directories utilize a standard schema that is available to all applications making use of the directory.
Multi-valued attributes. Directory attributes can be single or multi-valued.
Multi-master replication. Most leading directories offer multi-master replication, allowing writes and updates to occur on multiple servers. Therefore, even if servers are unable to communicate for periods of time, operations can still occur locally and then be sent to other replicas once communication is restored.
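
To make the hierarchical, object-oriented, and multi-valued characteristics concrete, here is a minimal Python sketch of a directory entry. The Entry class, the sample DN, and the attribute values are hypothetical illustrations rather than any vendor's data model, and the DN parsing is deliberately naive (it assumes no escaped commas).

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Entry:
    """A directory entry: a distinguished name (DN) plus multi-valued attributes."""
    dn: str
    attributes: Dict[str, List[str]] = field(default_factory=dict)

    def parent_dn(self) -> str:
        # Dropping the leftmost component of the DN yields the parent container,
        # which is how the hierarchy is expressed in the name itself.
        # Naive split: assumes the DN contains no escaped commas.
        return self.dn.split(",", 1)[1] if "," in self.dn else ""


# Hypothetical sample data, not taken from any real tree.
user = Entry(
    dn="cn=jdoe,ou=People,o=example",
    attributes={
        "objectClass": ["inetOrgPerson"],  # the object class the entry is created from
        "cn": ["jdoe"],
        "mail": ["jdoe@example.com", "john.doe@example.com"],  # multi-valued
    },
)

print(user.parent_dn())
print(user.attributes["mail"])
```

Note that the mail attribute simply holds two values. In a relational table the same fact would normally call for a second table and a join.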

Additionally, if the Directory is LDAP-compliant, you can be assured that the directory will interpret and respond to LDAP queries and requests from any LDAP-enabled application. LDAP is a protocol supported and maintained by the Internet Engineering Task Force. The LDAP standard also encompasses schema definitions, the LDIF file exchange format, and definitions for some object classes. Work is also being done to develop standard replication mechanisms and other directory-related operations.

Characteristics of an RDBMS

The Relational Database also possesses a set of characteristics relatively common across all Relational Database Management Systems (RDBMS), including:

Write-intensive operations. The RDBMS is frequently written to and is often used in transaction-oriented applications.
Data in flux or historical data. The RDBMS is designed to handle frequently changing data. Alternatively, an RDBMS can also store vast amounts of historical data which can later be analyzed or "mined."
Application-specific schema. The RDBMS is configured on a per-application basis and a unique schema exists to support each application.
Complex data models. The relational nature of the RDBMS makes it suitable for handling sophisticated, complex data models that require many tables, foreign key values, complex join operations, and so on.
Data integrity. The RDBMS features many components designed to ensure data integrity. This includes rollback operations, referential integrity, and transaction-oriented operations.
ACID (Atomic, Consistent, Isolation, Durable) transactions. The transaction either commits (such that all actions are completed) or it aborts (all actions are reversed or not performed).
- Atomic. Atomic transactions consist of a grouping of changes to tables or rows such that all or none of the changes take place. A rollback operation can reverse all the actions of the atomic transaction.
- Consistent. Transactions operate on a consistent view of the data. When the transaction is completed, the data is left in a consistent state.
- Isolation. Transactions run isolated from other transactions. So if transactions are running concurrently, the effects of transaction A are invisible to transaction B, and vice-versa, until the transaction is completed.
- Durable. Upon commitment of the transaction, its changes are guaranteed. Until the transaction commits, none of its actions are durable or persistent. If the system crashes prior to a commit, the effects of the transaction will be rolled back.
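
The all-or-nothing behavior described above can be sketched with Python's built-in sqlite3 module. The account table, the transfer function, and the sample balances are assumptions made up for this illustration. SQLite stands in here for any RDBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint rejects any update that would leave a negative balance.
conn.execute(
    "CREATE TABLE account (id TEXT PRIMARY KEY, "
    "balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 100), ("B", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates commit or neither does."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a dict keyed by account id."""
    return dict(conn.execute("SELECT id, balance FROM account ORDER BY id"))


print(transfer(conn, "A", "B", 30), balances(conn))
print(transfer(conn, "A", "B", 500), balances(conn))
```

In the second call the credit to B succeeds first, then the debit from A violates the constraint. The rollback undoes the credit as well, so the balances are left exactly as they were before the failed transfer.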

Note that LDAP directories can be deployed as applications on top of relational databases. The LDAP specification does not define the underlying data store for the directory, so each vendor is left to choose their underlying database. Oracle and IBM offer Directory Servers running on top of their RDBMS systems. Is this the best of both worlds? Well, not quite. There are many challenges, possible limitations, and performance implications that may be associated with such a product. Mapping the LDAP hierarchy, supporting extensible schema, translating queries to SQL, and supporting multi-valued attributes are just some of the challenges faced by these solutions.
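
To give a feel for the mapping challenges just listed, the following Python sketch stores a small directory tree in two relational tables using the built-in sqlite3 module. This is a generic illustration, not the design used by Oracle or IBM. The table names, column names, and sample entries are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE entry (              -- one row per directory entry
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES entry(id),  -- encodes the hierarchy
    rdn       TEXT NOT NULL       -- relative name, e.g. 'cn=jdoe'
);
CREATE TABLE attr (               -- one row per attribute VALUE
    entry_id  INTEGER NOT NULL REFERENCES entry(id),
    name      TEXT NOT NULL,
    value     TEXT NOT NULL
);
""")
conn.executemany("INSERT INTO entry VALUES (?, ?, ?)", [
    (1, None, "o=example"),
    (2, 1, "ou=People"),
    (3, 2, "cn=jdoe"),
])
conn.executemany("INSERT INTO attr VALUES (?, ?, ?)", [
    (3, "mail", "jdoe@example.com"),
    (3, "mail", "john.doe@example.com"),   # second value of the same attribute
])


def dn_of(conn, entry_id):
    """Rebuild a full DN by walking parent links up to the root."""
    parts = []
    while entry_id is not None:
        rdn, entry_id = conn.execute(
            "SELECT rdn, parent_id FROM entry WHERE id = ?", (entry_id,)
        ).fetchone()
        parts.append(rdn)
    return ",".join(parts)


def search_equal(conn, name, value):
    """Rough SQL equivalent of an LDAP equality filter such as (mail=...)."""
    rows = conn.execute(
        "SELECT DISTINCT entry_id FROM attr WHERE name = ? AND value = ?",
        (name, value),
    ).fetchall()
    return [dn_of(conn, row[0]) for row in rows]


print(search_equal(conn, "mail", "john.doe@example.com"))
```

Even in this toy version, reconstructing a single DN takes one query per level of the tree, and every attribute value becomes its own row. That hints at why performance and query translation are real concerns for such products.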

For a detailed overview of how IBM designed their solution, I suggest visiting http://www.research.ibm.com/journal/sj/392/shi.html. However, the fact that these RDBMS vendors have embraced LDAP and directory technology demonstrates that the directory does indeed have a unique position in the enterprise, working alongside the RDBMS to solve business problems.

How Directories Are Used

So far, this overview has told you what each system is good at, but it still doesn't help you determine for what situations or applications each is best suited. So let's take a look at how LDAP-compliant directories are deployed in the enterprise, working alongside the RDBMS to solve business problems. In the Internet space, we find directories serving three major roles: Authentication, Authorization, and Personalization.

The Directory as an Authentication Source

The Directory as an authentication source makes great sense since most directories were designed with security in mind. Extremely granular, robust security mechanisms are available, allowing security down to the attribute level. Many Directory-enabled applications are available to extend directory security mechanisms to include a wide variety of authentication mechanisms, including PKI, biometrics, tokens, and other advanced forms of authentication.

Many portal deployments include a security framework that leverages a directory. Products such as Netegrity's SiteMinder or Oblix's Netpoint take advantage of the directory's security framework to allow multiple grades of authentication as well as distributed management and application authorization.

Databases are not, by default, as flexible with security. Most offer column-level security, but the advanced security solutions available with directories are far more flexible and granular.

The Directory as an Authorization Source

Permissions and restrictions to resources are also well suited to directories. Many security and identity management applications rely on directories to authorize users to resources or Web-based applications. Products such as SiteMinder, Netpoint, and iChain incorporate policy-based authorization as a major component of their security architectures. Once security policies are established and configured, they do not change frequently. However, the frequency at which they are read remains high because the policies are constantly being checked to ensure security is properly enforced.

The Directory for Personalization

From e-mail systems to policy-based security, more and more applications are finding the directory to be an ideal repository for storing data related to the user. Since the directory maintains a record for each user, it makes sense to store user-specific data with the user. For example, names, addresses, telephone numbers, and e-mail addresses are the most common type of personalized data found in a directory. This data is specific to the user, but often needed by many applications.

This personalization can go much deeper to include many types of preferences (often referred to as profiling). For example, Netscape's "My Netscape" portal is powered by the iPlanet Directory Server and hosts a plethora of personalization characteristics for users, allowing them to individually customize the look, feel, and content of the portal to suit their tastes.

Among the benefits of using a directory are the hierarchical design, multi-value attribute support, extensible schema, and standard LDAP support and APIs. The need for speed and scalability also plays well to the strengths of directories. A great example is CNN Interactive, powered by Novell's eDirectory. Originally designed on top of an RDBMS, CNN found they couldn't achieve the performance they needed using a relational database. With a directory, however, CNN was able to achieve response times under 250 milliseconds, with the directory responding to more than 2000 requests per second and upwards of 25 million directory lookups per day. The result was a Web site with the ability to instantly personalize content, even under the most stressful situations.

Determining Directory or Database

OK, so now you have a fairly good idea of where a directory really makes sense in your enterprise and how your applications can make use of it. But you may still be left wondering how you determine if your application should store data in a Directory or RDBMS. Well, first of all, keep in mind that it is not an all-or-nothing situation. Actually, the two solutions are quite complementary.

Portal products from vendors such as Epicentric, Plumtree, and Tibco are great examples of directories and databases working together. These products can be configured to rely on directories for authentication and maintenance of users and groups. However, these applications still utilize the RDBMS to store application-specific settings and entitlements, as well as data not suited for directories such as web pages, Java applets, BLOB data, and other information the portal needs to function.

Now, take a closer look at your data and your applications. Ask yourself several questions:

Is the data dynamic or relatively static?
Does the data need to be distributed?
Can the data be used by more than one application?
Is the data multi-valued?
Can your data or application take advantage of a hierarchical relationship?
Do you need flexible security options?
Do you need single sign-on?
Do you need distributed or delegated administration capabilities?

If you can answer yes to some or all of these questions, then directories and directory-based applications would likely be useful to your application or project.

As you continue your analysis, also ask yourself if the data and/or the applications require the benefits of ACID transactions. While most directories offer some form of data integrity, such as referential integrity, the directory was never designed for transactional data. If the data changes frequently, then it probably doesn't belong in the directory. For example, you may want to keep IDs or account numbers in a directory because the data is relatively static. But you wouldn't store account balances in a directory because the number would likely change often.
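
The questions and the ACID test above can be combined into a rough first-pass triage, sketched here in Python. The DataProfile fields and the suggest_store function are hypothetical names invented for this illustration, and the rules are only a simplified reading of the guidance in this section, not a substitute for a real requirements analysis.

```python
from dataclasses import dataclass


@dataclass
class DataProfile:
    """Answers to a few of the questions above for one kind of data."""
    name: str
    changes_frequently: bool
    needs_acid: bool
    shared_across_apps: bool = False
    multi_valued: bool = False
    hierarchical: bool = False
    needs_distribution: bool = False


def suggest_store(profile: DataProfile) -> str:
    """Return 'rdbms', 'directory', or 'either' as a first-pass suggestion."""
    # Transactional or frequently changing data does not belong in a directory.
    if profile.needs_acid or profile.changes_frequently:
        return "rdbms"
    # A "yes" to any directory-friendly question makes a directory worth a look.
    if any([
        profile.shared_across_apps,
        profile.multi_valued,
        profile.hierarchical,
        profile.needs_distribution,
    ]):
        return "directory"
    return "either"


account_number = DataProfile(
    "account number", changes_frequently=False, needs_acid=False,
    shared_across_apps=True,
)
account_balance = DataProfile(
    "account balance", changes_frequently=True, needs_acid=True,
)

print(suggest_store(account_number))
print(suggest_store(account_balance))
```

Using the example from the text, the relatively static account number lands in the directory while the frequently changing balance lands in the RDBMS.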

Furthermore, due to the loosely consistent nature of directories and the nature of multi-master replication, there may be times when a change made to a replica on Directory Server A will take some time to get replicated to Directory Server B. As such, an application reading from replica B could get a different result than an application reading from replica A if both are read immediately after the change is made to replica A. It is even possible that a different change can occur to the same data on replica B before the changes of replica A are made to replica B. The inconsistencies will, in time, correct themselves (conflict resolution can be handled via timestamps, changelog records, and so on, and varies from vendor to vendor), but for accounting or finance applications, this is not suitable. The data must be committed and consistent. As such, finance and accounting data is best suited for an RDBMS.
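
The scenario above can be simulated in a few lines of Python. This is a toy model assuming simple timestamp-based "last writer wins" conflict resolution between exactly two replicas. As noted, real directories differ by vendor, and the Replica class and its methods are hypothetical.

```python
class Replica:
    """A toy directory replica that accepts local writes and syncs later."""

    def __init__(self, name):
        self.name = name
        self.data = {}    # attribute -> (timestamp, value)
        self.outbox = []  # local changes not yet sent to the peer

    def write(self, attr, value, timestamp):
        """Accept a write locally and queue it for replication."""
        self._apply(attr, value, timestamp)
        self.outbox.append((attr, value, timestamp))

    def _apply(self, attr, value, timestamp):
        # Last writer wins: keep a change only if it is newer than what we hold.
        current = self.data.get(attr)
        if current is None or timestamp > current[0]:
            self.data[attr] = (timestamp, value)

    def read(self, attr):
        stored = self.data.get(attr)
        return stored[1] if stored else None

    def sync_to(self, peer):
        """Send queued changes to the single peer (two-replica assumption)."""
        for attr, value, timestamp in self.outbox:
            peer._apply(attr, value, timestamp)
        self.outbox.clear()


a, b = Replica("A"), Replica("B")
a.write("phone", "555-0100", timestamp=1)
a.sync_to(b)

a.write("phone", "555-0111", timestamp=2)  # change made on replica A only
stale = b.read("phone")                    # B has not heard about it yet
b.write("phone", "555-0122", timestamp=3)  # conflicting change on replica B

a.sync_to(b)
b.sync_to(a)
print(stale, a.read("phone"), b.read("phone"))
```

Before the sync, a reader on replica B still sees the old value. After both replicas exchange changes they converge on the write with the latest timestamp, and the update made on replica A is silently discarded. That outcome is tolerable for a phone number but not for an account balance.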

Remember that directory data should be relatively static, such that its ratio of reads to writes is relatively high. Because the data is unlikely to change, the probability of a replication conflict like the one described above actually occurring is minimized. Furthermore, such data does not depend on transaction-level data integrity. Don't make the directory do something it was never designed to handle.

Conclusion

Many of these issues require a thorough understanding of the application and the data involved. Be sure to consider the business requirements driving the project or application development. You will find that the business requirements will dictate the technology, not vice-versa. Remember that directories and databases are complementary, not competitive, solutions. The more you understand the benefits and cost-savings you can gain from a directory, the better you can justify the implementation and migration to a directory-enabled network infrastructure.

* Originally published in Novell AppNotes
