PERFORMANCE REPORT
Senior Consultant
Systems Research Department
01 Jan 1998
Improving Novell BorderManager Scalability with Intelligent Server Adapters
Summary
Using intelligent LAN adapters in your Novell BorderManager 1.0 server lets you realize the full potential of your server investment by maximizing your server's communication channel performance and scalability. The Intel EtherExpress PRO/100 Server Adapter, co-developed by Novell and Intel as a server LAN channel coprocessor, provides BorderManager with 30% greater scalability than similar systems using non-intelligent adapters.
Figure 1: Intelligent adapters increase BorderManager's scalability by 30%.
Figure 1 shows the increased scalability of BorderManager due to server adapter intelligence, demonstrating the penalty paid by servers using non-intelligent LAN adapters. The intelligent adapter configuration achieved 4097 WebBench hits per second (hps) by using 100% of the CPU to respond to client requests. The configuration using non-intelligent adapters scaled to only 2871 hps before saturating the CPU; the lower performance is due to the LAN adapters consuming 40% of the CPU to process interrupt service routines. These tests were conducted in Novell's 1700-computer SuperLab facility in Provo, Utah. (For more information about the SuperLab, visit http://yes.novell.com/devres/slab/.)
Analysis
The goal of these tests was to determine the scalability of Novell's BorderManager server, especially with regard to the server's LAN channel. Scalability is one of the most important characteristics of an Internet or intranet server because the server's measure of scalability can have a significant impact on its day-to-day performance as well as its eventual total capacity. With demand for Internet and intranet services doubling every six months, a server that can scale by making use of additional, inexpensive components to overcome capacity or performance obstacles is a good investment.
In this test case, BorderManager makes use of four Intel EtherExpress PRO/100 Server Adapters to nearly quadruple the LAN channel bandwidth of the server. This intelligent adapter includes an onboard i960 32-bit RISC processor, 2MB of zero-wait state burst access DRAM, and a software driver that allows BorderManager to poll the adapter. (Polling is a much more efficient process than using interrupts.) These components handle the complete I/O process--packet receive and transmit, processing, and data transfer to and from memory--allowing the adapter to service up to 100 Mbps of connection traffic with the least amount of CPU utilization. Non-intelligent LAN adapters, typically used in client computers, do not have these added capabilities and burden the system CPU with all of the work related to data movement in the LAN channel.
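The efficiency gap between interrupt-driven and polled I/O can be illustrated with a toy cost model. All of the per-event costs below are invented for illustration; they are not measured BorderManager or NetWare figures:

```python
# Toy model: CPU cost of servicing N packets under interrupt-driven
# versus polled I/O. Per-event costs are illustrative assumptions only.

PACKETS = 10_000
ISR_COST = 20        # cycles per packet: context switch + ISR entry/exit
HANDLING_COST = 5    # cycles per packet: actual protocol work
POLL_COST = 8        # cycles per poll of the adapter's receive queue
BATCH = 32           # packets drained per poll under sustained load

def interrupt_cycles(packets):
    # One interrupt (with its context-switch overhead) per packet.
    return packets * (ISR_COST + HANDLING_COST)

def polled_cycles(packets, batch=BATCH):
    # One poll amortized across a batch of queued packets.
    polls = -(-packets // batch)  # ceiling division
    return polls * POLL_COST + packets * HANDLING_COST

print(interrupt_cycles(PACKETS))  # 250000
print(polled_cycles(PACKETS))     # 52504
```

Under load, polling amortizes one fixed cost across a whole batch of packets, while interrupts pay the context-switch cost on every packet; that is the effect the onboard i960 and polled driver exploit.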
The benefits of BorderManager's scalability are two-fold. First, a BorderManager server with intelligent LAN adapters operates more efficiently. The LAN channel uses significantly less CPU time and avoids the overhead of context-switches that are inherent in interrupt service routine (ISR) processing and preemption (the forced rescheduling of the current CPU process). These efficiencies allow the CPU to operate independently of the LAN channel and focus on the server's application logic. In production systems, this efficiency can mean the server's CPU utilization might hover at 70% while servicing peak loads rather than pegging the CPU at 100%. As a second benefit, this same efficiency also allows you to add additional services on the server that might otherwise have required an additional server. Over time, as the server's workload increases, these efficiencies produce an additional 30% throughput.
Figure 2: WebBench performance measurements with intelligent Intel EtherExpress PRO/100 Server Adapters measured in HTTP hits per second (HPS).
Figure 2 displays our WebBench performance measurements with intelligent Intel EtherExpress PRO/100 Server Adapters. With four of these adapters, BorderManager achieved 4097 HTTP hits per second.
            HPS     MBps    %CPU    %ISR
1 LAN       1277    10.6    36%     0
2 LANs      2540    21.2    66%     0
3 LANs      3838    31.8    97%     0
4 LANs      4097    34.4    100%    0
Figure 3: The peak results for Figure 2 include WebBench hits per second (HPS), WebBench megabytes per second (MBps), system CPU utilization (%CPU), and the amount of system CPU utilization for LAN adapter Interrupt Service Routines (%ISR).
Figure 3 displays the peak WebBench hits per second (hps) and megabytes per second (MBps) for each configuration. Notice that the performance results for each of the first three LAN adapters were additive--gaining approximately 1280 hps per adapter--while the fourth adapter added only 259 hps to the system total. We know that the LAN infrastructure was not saturated because we've pushed the same four-adapter Fast Ethernet configuration beyond 41 MBps with the same test workload in other tests. The bottleneck in this case is therefore the CPU. Once you've reached this maximum scalability in the LAN channel, you would need a faster CPU to further increase performance.
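The scaling pattern in Figure 3 can be checked with a few lines of arithmetic on the published totals:

```python
# Incremental throughput per adapter, from the Figure 3 peak totals.
totals = [1277, 2540, 3838, 4097]  # peak hps with 1-4 intelligent adapters

# Gain contributed by each additional adapter.
gains = [hps - prev for prev, hps in zip([0] + totals, totals)]

for n, (hps, gain) in enumerate(zip(totals, gains), start=1):
    print(f"{n} adapter(s): {hps} hps (+{gain})")
```

The first three gains (1277, 1263, 1298) are nearly equal, while the fourth collapses to 259 -- exactly what you expect when the CPU, not the LAN channel, saturates.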
Figure 4: WebBench performance measurements with non-intelligent Intel EtherExpress PRO/100 LAN Adapters measured in HTTP hits per second (HPS).
For comparison, we also ran our WebBench performance tests with non-intelligent Intel EtherExpress PRO/100 LAN Adapters. These represent high-performance, non-intelligent LAN adapters typically used in client computers. The results of these tests are shown in Figure 4.
            HPS     MBps    %CPU    %ISR
1 LAN       1272    10.6    39%     15%
2 LANs      2544    21.2    79%     30%
3 LANs      2871    24.0    100%    40%
Figure 5: The peak results for Figure 4 include WebBench hits per second (HPS), WebBench megabytes per second (MBps), system CPU utilization (%CPU), and the amount of system CPU utilization for LAN adapter Interrupt Service Routines (%ISR).
Figure 5 displays the peak WebBench hits per second and megabytes per second for each configuration. Notice that the performance results for the first two non-intelligent LAN adapters were additive--gaining approximately 1270 hps per adapter. These results are nearly identical to those produced by the intelligent adapter configuration above. However, each non-intelligent adapter required 15% of the CPU's cycles to handle the adapter's Interrupt Service Routines.
When this overhead is combined with the system's LAN channel interrupts, preemption, and context switching, the server is less able to handle additional services or workload due to higher CPU utilization. In the end, this system prematurely exhausted its CPU resources, allowing the third adapter only 327 additional hits per second. This system is less scalable and therefore is more inclined to require an expensive upgrade or replacement to handle the future growth of your intranet or Internet infrastructure.
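A rough way to see the ISR tax at work is to tabulate the Figure 5 numbers and subtract the ISR share from total CPU utilization, leaving the share available for protocol and application work:

```python
# Peak results from Figure 5: adapters -> (hps, %CPU, %ISR).
results = {1: (1272, 39, 15), 2: (2544, 79, 30), 3: (2871, 100, 40)}

for n, (hps, cpu, isr) in sorted(results.items()):
    useful = cpu - isr  # CPU share left for protocol/application work
    print(f"{n} adapter(s): {hps} hps, {cpu}% CPU, {isr}% in ISRs, {useful}% useful")
```

By the third adapter, ISRs alone consume 40% of the CPU, so the configuration gains only 327 hps (2871 minus 2544) before exhausting the processor -- versus 1298 hps for the third intelligent adapter in Figure 3.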
If you're aiming to build the fastest, most scalable BorderManager server, Novell recommends the Intel EtherExpress PRO/100 Server Adapter in a multiple LAN channel configuration. Even if you're only going to install a single network adapter, the EtherExpress PRO/100 provides the most efficient use of the server CPU and allows you greater flexibility in the design of the server's application set.
Test Workload
We chose Ziff-Davis' WebBench because, like Novell's PERFORM3, it isolates the BorderManager communication channel between server cache and clients--the LAN infrastructure, server LAN adapters, server PCI expansion bus, and server cache. WebBench results provide a clear measure of BorderManager's peak communication channel performance and scalability. The WebBench workload we used was ZD_STATIC_V11.TST, which uses a working data set of 100 files (2.5MB).
Test Configuration
The system under test, a Novell BorderManager server, was an Intel MB440LX DP Server Platform (266MHz Pentium II with 512KB L1 cache, 1MB L2 cache, 512MB SDRAM, 12GB Ultrawide SCSI storage) running IntranetWare 4.11 and BorderManager 1.0. Commercial servers based on this Intel server platform are available from many server manufacturers.
This system was configured as a web server accelerator and contained five Intel EtherExpress PRO/100 Server Adapters (see Figure 6). Four of the adapters hosted IP subnets, each with nine Pentium Pro clients running Windows NT Workstation 4.0; the fifth adapter hosted a subnet connecting the web server accelerator to the web server and WebBench controller.
Figure 6: With five Intel EtherExpress PRO/100 Server Adapters operating in polled mode (without interrupts), high-end Intel-architecture servers with four public interfaces can scale to over 40 MBps throughput.
The LAN included five Bay Networks Fast Ethernet hubs, with the Novell BorderManager server acting as router between the four client subnets and the web server and controller on the fifth subnet.
The web server, running Microsoft Internet Information Server (IIS), was a Compaq ProLiant 5000 (dual 200MHz Intel Pentium Pro processors with 256MB RAM, 2GB system partition, 10GB RAID 0 web partition) running Windows NT Server 4.0 with Service Pack 3.
Copyright 1997 by Novell, Inc. All rights reserved. No part of this document may be reproduced or transmitted in any form or by any means, electronic or mechanical, including photocopying and recording, for any purpose without the express written permission of Novell.
All product names mentioned are trademarks of their respective companies or distributors.
* Originally published in Novell AppNotes
Disclaimer
The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.