Personalizing and Customizing Web Content Using NDS eDirectory at CNN
Systems Engineering Manager
Novell Southeast District
01 Nov 2000
This AppNote describes a project undertaken by CNN Interactive to provide customized news, sports information, and financial data to their users using Novell's NDS eDirectory.
The CNN Interactive Personalization Project
CNN Interactive provides worldwide news, sports information, and financial data via three primary domain names: cnn.com, cnnsi.com, and cnnfn.com. CNN Interactive serves an average of twenty-five million HTML page views per day (more than nine billion page impressions during 1999). CNN Interactive has nearly 100 Web, application, and special-purpose servers.
Personalization Project Business Needs
Like many Internet-based companies, CNN wanted to customize and personalize its advertising and content. CNN uses a cookie to identify who is accessing its Web servers, so that visitors do not have to log in on each visit to the site. The cookie holds only a GUID (globally unique identifier), which is looked up in a database; the Web servers then personalize content based on the preferences returned. Many Web sites handle personalization by storing personal information in a cookie on each user's computer. With the CNN solution, only the GUID is stored in the cookie, not personal data.
CNN began researching a suitable database and caching mechanism in mid-1999. CNN explored writing an application-specific cache to "front end" a database used to store preferences. At the same time, Novell met with CNN regarding the possible use of NDS to satisfy the requirements with "out of the box" software.
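The GUID-only cookie approach can be illustrated with a minimal Python sketch. It is not CNN's implementation: the cookie name, the in-memory dictionary standing in for the directory, and the sample preferences are all assumptions made for illustration.

```python
import uuid
from http.cookies import SimpleCookie

# Hypothetical stand-in for the directory: maps GUID -> stored preferences.
PROFILE_STORE = {}

COOKIE_NAME = "visitor_id"  # hypothetical cookie name


def issue_cookie():
    """Create a new visitor GUID and return (guid, cookie header value)."""
    guid = str(uuid.uuid4())
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = guid
    return guid, cookie[COOKIE_NAME].OutputString()


def preferences_for(cookie_header):
    """Return the preferences for the GUID in a Cookie header, or {} if unknown."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    if morsel is None:
        return {}
    return PROFILE_STORE.get(morsel.value, {})


# The cookie carries only the GUID; the preferences stay on the server side.
guid, header = issue_cookie()
PROFILE_STORE[guid] = {"weather_city": "Atlanta"}  # made-up sample data
print(preferences_for(header))
```

Because the browser only ever holds the identifier, changing what is stored about a visitor requires no change to the cookie itself.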
The business needs CNN stipulated are as follows:
The ability to scale the architecture and handle load with a 10:1 or greater web server to directory server ratio
The flexibility to extend and build upon the architecture without "rip and replace" or re-engineering efforts on the database
The performance to provide uninterrupted quality of service during peak utilization (includes low latency requirements and time-sensitive determinism)
The reliability of a "hands off" deployment ("it just works")
Fault tolerance for value-added services that supports 24x7 uptime, with "route around" capabilities in the event of system downtime (graceful degradation)
An architecture that supports planned growth with automated intelligence to control resource utilization
Cross-platform support to include NetWare or Solaris as a directory services host
The desire to minimize risk by utilizing stress tested, Internet proven technology
Partnering with a company that would work closely with CNN to drive their current and future demands into the core product (entrepreneurial spirit)
The desire to use "off the shelf" shrink-wrapped software to minimize the ongoing customization and maintenance by CNN developers
As an Internet company, CNN required the use of open standards for accessing and manipulating the user profile data
Personalization Project Technical Requirements
CNN delivers page views to consumers with a quality of service characteristic of less than 1 second delivery time (regulated by consumer bandwidth). Considering the domain name resolution, network packet delivery latency, and ad retrieval for injection, the time CNN allocated for a directory lookup to personalize content is less than 250 milliseconds. With nearly 100 Web servers, each with the capability of making directory requests, CNN calculated the need to support over 2000 requests per second. CNN expected a hit rate greater than 99% for profiles served via NDS.
Why CNN Chose NDS eDirectory
CNN evaluated multiple options to solve their personalization problem. Novell's NDS eDirectory was designed precisely to meet the requirements CNN had imposed. Many of the unique features listed below allow NDS to excel at serving user profiles for Web content personalization.
Multi-Master Replication. NDS replication enables "no single point of failure," unlike master/slave directory implementations. It also enables traffic reduction by having a writable copy of the data at any boundary.
Hierarchical Datastore. NDS offers a logical view of data organization and facilitates management delegation. It also enables data to be acted on in a consistent manner (matches XML structures).
Partitionable Database. NDS offers ultimate scalability by enabling the database to be divided into manageable pieces. This prevents a fault in one branch of the NDS tree from affecting other branches.
Foolproof Reference and Resolve. NDS features non-ambiguous naming and referential integrity in resolves.
Integrity of References. Time-stamped object writes ensure data integrity through both forward and reverse reference links across partitions.
Transparent, Event-based, Attribute Level Replication. NDS features an inherent replicated store with simple rule-based filtering. Triggered attribute-only changes are replicated.
Transitive Synchronization. This feature enables "route around" capability with regard to network and server outages. It also enables server-to-server communication (read/write) via proxy.
Self Healing/Repairing and Maintaining Database. NDS provides the capability to correct minor inconsistencies automatically. Roll forward/back transaction logging guarantees consistency, while the Database Repair process includes record level locking without service interruption.
Suitability to Task. NDS was designed for a high read rate, with lower demand for writes.
Cross Platform. Novell's support for multiple platforms (NetWare, Windows NT/2000, Solaris, Linux, Tru64) enabled CNN to choose the best platform based on application/API availability, suitability to task, price/performance, and internal expertise.
Performance. NDS provides the fastest LDAP searches on databases with more than 500 objects, and provides linear scaling up to a tested 1.5 billion objects.
Programming Interfaces. NDS supports common, open interface options for access, customization, integration and reporting, including LDAPv3, C++, JNDI, XML, ODBC/JDBC, and ADSI.
Dynamically Extensible Replicated Schema. NDS schema definitions are replicated to all servers, and extensions can be performed with no downtime or protracted performance impact.
SNMP Instrumented. Partitioning and replication operational alarms in NDS can be "trapped" to any SNMP console.
Third-party DSS. CNN looked for the availability of a deployment/maintenance decision support system.
Supportability. Novell maintains the industry's only Certified Directory Engineer program with nearly 50,000 U.S. residents having achieved this status.
Testing NDS Capabilities in the SuperLab
In Novell SuperLab testing, Novell Consulting achieved more than 500 requests per second with less than 60 milliseconds of average latency on a single machine! This test was performed using 50 NT workstations making 10 requests per second each. NDS 8 on NetWare 5.0 (with Support Pack 3a applied) ran on a single, uni-processor Compaq ProLiant 1850R with 1GB RAM.
In benchmarking, it is not enough to extrapolate the numbers up to scale and expect satisfactory performance. As Novell did not have the exact CNN deployment available, some calculations were necessary to predict the scaling in production. Using simple math, we anticipated an absolute minimum of four production servers to handle the load (notice: nothing was included for "overhead"). If NDS is able to achieve near-linear scaling, the simplistic calculation should hold up in the real world.
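The "simple math" behind the four-server estimate can be reproduced as a short Python sketch. The 2000 requests per second target and the 500 requests per second SuperLab result come from the text; the `overhead_fraction` parameter and the 25% value in the second call are hypothetical additions showing how a safety margin would change the answer.

```python
import math


def min_directory_servers(peak_rps, per_server_rps, overhead_fraction=0.0):
    """Smallest server count that covers peak_rps, assuming linear scaling.

    overhead_fraction is the share of each server's measured capacity held
    back as headroom (0.0 reproduces the no-overhead estimate in the text).
    """
    if per_server_rps <= 0:
        raise ValueError("per_server_rps must be positive")
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    usable_rps = per_server_rps * (1.0 - overhead_fraction)
    return math.ceil(peak_rps / usable_rps)


# Figures from the text: 2000 requests/s needed, 500 requests/s per server.
print(min_directory_servers(2000, 500))  # 4

# Hypothetical: reserve 25% of each server's capacity as headroom.
print(min_directory_servers(2000, 500, overhead_fraction=0.25))  # 6
```

The estimate holds only if throughput scales close to linearly as servers are added, which is the caveat the paragraph above raises.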
Implementing NDS eDirectory at CNN
After Novell explained the NDS architecture and the performance characteristics proven in the SuperLab, CNN agreed to embark on a pilot in its production network.
After several architectural planning meetings, Novell Systems Engineering built a single NetWare 5 server hosting NDS v8 (first release). CNN used an open-standard LDAP SDK and built an NSAPI (Netscape Server Application Programming Interface) plug-in to the Web server that makes LDAP calls to retrieve data from NDS. The "data" object is actually a custom object that inherits from both user and OU classes. This gives CNN the ability to have attributes appear to be hierarchical (OU containing other objects which can also have attributes).
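Since the plug-in turns a cookie value into an LDAP search, one detail worth sketching is how such a search filter might be built safely. The Python below is an illustration only (the real plug-in was an NSAPI module, and its schema is not described here); the attribute name `cn` is an assumption. The escaping follows the LDAP string filter rules in RFC 4515.

```python
# Characters that must be escaped inside an LDAP filter value (RFC 4515).
_FILTER_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value):
    """Escape a string so it is treated as a literal in an LDAP filter."""
    return "".join(_FILTER_ESCAPES.get(ch, ch) for ch in value)


def profile_search_filter(guid, attribute="cn"):
    """Build an equality filter for a profile object.

    The attribute name is hypothetical; the source does not say which
    attribute holds the GUID.
    """
    return "({}={})".format(attribute, escape_filter_value(guid))


print(profile_search_filter("3f2b8c1e-0000-4000-8000-000000000000"))
print(profile_search_filter("*)(objectClass=*"))  # wildcard and parens are neutralized
```

A well-formed GUID needs no escaping, but the cookie value comes from the client, so treating it as untrusted input keeps a tampered cookie from widening the search.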
The Web servers were loaded with the NSAPI LDAP client (with a simple load balancing algorithm) to distribute the requests across the NDS servers. CNN is running Compaq 6400s with 2GB RAM, 1.5GB dedicated to NDS cache and 500 MB for NetWare 5.
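The text does not say which load balancing algorithm the client used. As one plausible reading, the Python sketch below shows round-robin selection with failover to the next server, which also gives the "route around" behavior listed in the business needs. The server names and the `fake_lookup` function are hypothetical.

```python
import itertools


class RoundRobinBalancer:
    """Rotate requests across servers, skipping any that fail to respond."""

    def __init__(self, servers):
        if not servers:
            raise ValueError("at least one server is required")
        self._servers = list(servers)
        self._cycle = itertools.cycle(self._servers)

    def call(self, request_fn):
        """Try each server at most once, starting with the next in rotation."""
        last_error = None
        for _ in range(len(self._servers)):
            server = next(self._cycle)
            try:
                return request_fn(server)
            except ConnectionError as exc:
                last_error = exc  # route around the failed server
        raise ConnectionError("all directory servers failed") from last_error


def fake_lookup(server):
    """Stand-in for a directory request; pretends that nds2 is down."""
    if server == "nds2":
        raise ConnectionError(server + " is down")
    return server


balancer = RoundRobinBalancer(["nds1", "nds2", "nds3"])
print([balancer.call(fake_lookup) for _ in range(4)])
# ['nds1', 'nds3', 'nds1', 'nds3']
```

In this sketch a failed server is retried on every rotation; a production client would more likely mark it down for a period before trying it again.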
For handling the packet load, each server has three Intel EtherExpress Pro/100 Intelligent Server Adapters. As of this writing, network speed has not been an issue. Only one of the network cards has a protocol stack (TCP/IP) bound. In anticipation of future load, two of the network cards could be load balanced (one IP address) and bound to LDAP. The third network card would then handle only the NDS replication traffic using WAN Traffic Manager to set a cost forcing NCP traffic onto its own channel.
Staged Integration Plan
Using NDS, CNN has the ability to personalize content such as ads, stock quotes, weather forecasts, and scores of favorite sports teams. As with any large-scale deployment, CNN is ramping up the use over time with an eye toward scalability and reliability. CNN has implemented specialized ad targeting, and also displays some content based on a user's preferences stored in NDS.
Currently, NDS is handling all 25,000,000 HTML page views per day. There is a one-to-one correlation of page views to NDS searches. With this load, the NetWare servers run at an average CPU utilization of less than 15%. The average response time is less than 5 milliseconds. The headroom available on the in-place architecture will allow for significant growth without additional processing or storage needs.
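To put the daily volume in the same units as the design target, the sketch below converts the twenty-five million page views per day cited at the start of this AppNote into an average search rate and compares it with the 2000 requests per second requirement. It assumes load spread evenly over 24 hours, which real traffic is not, so the result is an average rather than a peak.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(events_per_day):
    """Average events per second, assuming an even spread over the day."""
    return events_per_day / SECONDS_PER_DAY


# One NDS search per page view, per the text.
avg_searches_per_second = average_rate(25_000_000)
print(round(avg_searches_per_second, 1))  # 289.4

# Ratio of the 2000 requests/s design target to the average rate.
print(round(2000 / avg_searches_per_second, 2))  # 6.91
```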
* Originally published in Novell AppNotes