Apache 2.0: All New and Engineered for the NetWare Environment

Brad Nicholes
Senior Software Engineer
Novell, Inc.
bnicholes@novell.com

01 Apr 2003

This AppNote covers the new features of the popular open source Apache 2.0 web server, as they relate to the NetWare environment.

Introduction
Architectural Enhancements in Apache 2.0
New Features Made Possible by LibC
Site Acceleration (Caching) with Mod_Cache
Load Balancing with Mod_Proxy
Mass Virtual Hosting with Mod_Vhost_Alias
Other Reasons to Migrate to Apache 2.0
Conclusion

Topics	Apache web server, web server acceleration, configuration guidelines
Products	Apache 2.0 for NetWare
Audience	web server administrators
Level	beginning
Prerequisite Skills	familiarity with web server basics
Operating System	NetWare 6.0 and above
Tools	none
Sample Code	no

Introduction

Since its introduction in 1995, the Apache open-source web server has proven to be the most stable, best performing, and most cost-efficient web server on the market. It is no surprise that Apache is currently the most widely used web server on the Internet. But in April 2002, a new contender appeared on the horizon that is proving to be more efficient, just as cost effective, and every bit as stable as Apache 1.3. You might wonder, what could be better than good quality free software built by an organization of developers who have proven themselves to be the best at what they do? The answer is Apache 2.0.

Apache 2.0 was built by the same group of developers that brought you the web server you now know and love as Apache 1.3. Now in its seventh public release, this new version has so many compelling new features and bug fixes that the Apache Software Foundation is strongly encouraging users to migrate from Apache 1.3 to Apache 2.0.

This AppNote summarizes what's new in Apache 2.0 and explains why you, as a NetWare web services administrator, should consider the upgrade.

Architectural Enhancements in Apache 2.0

This section summarizes the major architectural enhancements made to the Apache 2.0 web server.

Apache Portable Runtime (APR) Layer

Apache 2.0 was rewritten from the ground up with two primary goals: to be more portable across different platforms, and to leverage the strengths that each platform has to offer. These goals were achieved via the new Apache Portable Runtime (APR) layer. APR defines a common set of APIs, and each platform implements those APIs in the way that works best on that platform. The Apache Web Server is then built on top of the APR APIs, which allows its source code to be completely common and portable across all platforms.

Multi-Processing Modules (MPM)

Another new component in the redesign of Apache 2.0 is the Multi-Processing Module or MPM. The purpose of the MPM is to define the process and threading model that will control how Apache operates on the platform. There is no limit to the number of MPMs that can be implemented for a given platform; however, only one MPM can be used at a time.

Currently, for Unix/Linux, there are several different MPMs that allow Apache to be run in a pure process model, a pure threading model, or a combination of processes and threads. These MPMs can be swapped out and replaced with the one that best suits the needs of the platform and the work that the web server will be doing.

Since the NetWare operating system utilizes only threads, at this time there is only one NetWare MPM which allows Apache to be run in a pure threading model.

Dynamic Thread Allocation

With the new architecture of the Apache Web Server and the incorporation of the NetWare MPM, there are several new configuration directives that you must be aware of when configuring your web server. These directives allow you to specify how many worker threads your web server will use, as well as how and when those worker threads will become active.

ThreadStackSize - the amount of stack memory assigned to each worker thread
StartThreads - the number of worker threads launched at server startup
MinSpareThreads - the minimum number of idle threads allowed before additional worker threads are created
MaxSpareThreads - the maximum number of idle threads allowed before excess worker threads are destroyed
MaxThreads - the maximum number of worker threads allowed
MaxRequestsPerChild - the maximum number of requests that a single worker thread will handle before being restarted

Under the old architecture of Apache 1.3, the configuration file specified a number of worker threads to be instantiated at startup time. This initial number of worker threads was also the total number of worker threads available to handle requests for the lifetime of the Apache Web Server instance.

Apache 2.0 employs a dynamic threading model, which means that when Apache starts, a certain number of worker threads are automatically set to an active state. As the web server traffic increases, Apache automatically starts up additional worker threads to handle the excess traffic. As the load becomes lighter, Apache terminates idle worker threads to allow the system to reclaim the unused resources.

For example, under normal traffic 50 active threads may be all that the web server needs to handle the load. But during a high traffic period, Apache may be required to spin off as many as 250 or more worker threads to handle the increased load. Then, as the traffic returns to a normal state, Apache will automatically terminate the idle worker threads to allow the system to reclaim the unused resources. This feature allows Apache to be much more scalable when it needs to be, while still being conservative with the resources available on the machine.
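
The scenario above might be expressed with the NetWare MPM directives listed earlier. This is only a sketch: the numeric values are illustrative assumptions chosen to mirror the 50-thread and 250-thread figures in the example, not tuning recommendations, and the block assumes the NetWare MPM can be tested for with the name mpm_netware.c.

```apache
<IfModule mpm_netware.c>
    # Stack memory given to each worker thread (illustrative value, assumed bytes)
    ThreadStackSize      65536
    # Worker threads launched at startup: the "normal traffic" level
    StartThreads         50
    # Create more workers when fewer than this many are idle
    MinSpareThreads      10
    # Destroy excess workers when more than this many are idle
    MaxSpareThreads      50
    # Hard ceiling on worker threads during peak load
    MaxThreads           250
    # 0 is assumed here to mean "no request limit"
    MaxRequestsPerChild  0
</IfModule>
```

With settings along these lines, the server would start with 50 workers, grow toward 250 as idle threads run short, and shrink again once more than 50 threads sit idle.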

New Features Made Possible by LibC

The implementation of Apache 1.3 for NetWare has proven itself to be very stable and reliable, just like on other platforms. But unlike other platforms, Apache 1.3 for NetWare was missing some features. Two of the more significant missing features were binary CGI support and standard log rotation. These features were missing simply because the NetWare CLib libraries did not include an implementation of pipes. A pipe is a means by which two independent NetWare Loadable Modules (NLMs) are able to talk to one another.

Apache 2.0 is based on the LibC C runtime library, rather than CLib. LibC is a much more robust C runtime library that provides not only needed features like pipes, but also enhanced multiprocessor support and a POSIX API layer. The enhanced feature set of LibC allows applications such as Apache to be ported to NetWare much more easily. Other open source projects such as PHP, Perl, and even MySQL and PostgreSQL have also taken advantage of the increased portability of LibC's POSIX layer.

Binary CGI Support

Because CGI is based on pipes, Apache 2.0 for NetWare is able to provide standard CGI capabilities. This allows Apache to spawn external NLMs to handle requests that Apache itself is not equipped to handle. A CGI NLM can be invoked to interpret a script or to perform some other specialized function, such as selecting a random string from a text file. CGI NLMs come in handy when you want to execute a specialized piece of functionality from an SSI (Server-Side Include) page.

Standard Log Rotation

With the availability of pipes in LibC, Apache 2.0 for NetWare also has the ability to rotate logs in the same standard manner as other platforms. This standard log rotation functionality is implemented as a separate binary that is spawned by Apache at startup time. Apache then creates a pipe between itself and the log rotation NLM (ROTLOGS.NLM) so that error messages or other logging information can be passed to the NLM. This allows the rotation NLM to handle rotating the log file at the appropriate time without interrupting the normal operation of the web server.

Log Rotation Based on Size

A little-known feature that has been added to the standard log rotation module is the ability to rotate the logs based on file size rather than time. This allows a system with limited disk space to rotate logs whenever they reach a certain file size rather than rotating them every night at midnight, for example. Not only does this decrease the total number of logs, it also allows a single log to contain more information.

Setting up log rotation based on size is similar to setting up time-based rotation. The main difference is that you specify a file size rather than a rotation time. When configuring a custom log that rotates after a certain period of time, the second parameter to the log rotation utility is the time period. When configuring a size-based rotation, simply replace the time period with the file size in megabytes. Since both time periods and file sizes are specified as integers, the size value must be terminated with an "M" (denoting megabytes) to distinguish it from a time value. (For further information about the CustomLog directive, see http://httpd.apache.org/docs-2.0/mod/mod_log_config.html#customlog.)

The following example configures Apache to log common access entries through ROTLOGS.NLM to the file "logfilexxxx". When the file size reaches 5 megabytes in size, it will automatically be rotated and a new file will be created.

CustomLog "|sys:/bin/rotlogs.nlm sys:/apache2/logs/logfile 5M" common

In the case of the access log, this would be a good solution for sites that might not receive a high number of requests. On the other hand, for a high traffic site this solution might result in the log being rotated too frequently. Depending on the web server configuration, a combination of both time-based logging and size-based logging would probably produce the best results.
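
For comparison, one way to mix the two approaches is to rotate the access log by time and another log by size. The sketch below reuses the utility and directory from the example above. It assumes the rotation period is given in seconds, as with the standard rotatelogs utility on other platforms (86400 seconds is 24 hours). The error log file name and the 1M threshold are hypothetical.

```apache
# Access log: start a new file every 24 hours (86400 seconds).
# This line is an alternative to the size-based CustomLog line, not an addition to it.
CustomLog "|sys:/bin/rotlogs.nlm sys:/apache2/logs/logfile 86400" common

# Error log: start a new file whenever the current one reaches 1 megabyte
ErrorLog "|sys:/bin/rotlogs.nlm sys:/apache2/logs/errorlog 1M"
```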

One thing to keep in mind is that log rotation only rotates the log file; it does not move or archive the rotated log file. Therefore, to avoid consuming all of the disk space on a particular volume for high traffic web sites or web sites that log a significant amount of information, an additional process must be implemented. The web administrator should implement a process, either manually or through the use of an automated tool or script, for moving and/or archiving the rotated log files.

Site Acceleration (Caching) with Mod_Cache

One of the missing features in Apache 1.3 on all platforms was caching. To be truthful, caching wasn't really missing, it was just well hidden and not very well developed. Most of the caching features of Apache 1.3 were included in the Mod_Proxy module. If you configured your Apache server as a proxy server, you could also configure it to do some basic proxy request caching. This allowed you to cache page requests that would normally have been sent to the origin server either through a proxy request or as a reverse proxy.

Mod_Cache

With Apache 2.0, all of the caching functionality was removed from Mod_Proxy and implemented as its own module, called Mod_Cache. Mod_Cache implements an RFC 2616 compliant HTTP content cache that can be used to cache either local content or proxied content. It is also implemented using pluggable modules that allow it to handle different methods of caching.

In the current implementation, mod_cache includes two pluggable modules. The first is mod_disk_cache which implements a disk-based storage manager. The second and most interesting is mod_mem_cache which implements a memory-based storage manager. In this AppNote we will only discuss the memory-based manager. (For more information on mod_disk_cache, see the documentation at http://httpd.apache.org/docs-2.0/mod/mod_cache.html.)

Mod_Cache, along with mod_mem_cache, exposes a number of directives for configuring Apache to serve cached pages. It allows Apache to serve pages from a dynamic memory cache rather than having to retrieve the documents from disk or from a proxied server. The following illustrates a simple set of directives for loading and configuring mod_cache:

LoadModule cache_module modules/mod_cache.nlm
<IfModule mod_cache.c>
    LoadModule mem_cache_module modules/mod_mem_cache.nlm
    <IfModule mod_mem_cache.c>
        CacheEnable mem /
        MCacheSize 4096
        MCacheMaxObjectCount 100
        MCacheMinObjectSize 1
        MCacheMaxObjectSize 2048
    </IfModule>
</IfModule>

The above configuration first loads both the Mod_Cache and Mod_Mem_Cache modules. Most of the configuration is handled through the directives that are supplied by the Mod_Mem_Cache module. The CacheEnable directive tells Mod_Cache to turn on caching and also specifies what type of caching it intends to use, as well as which pages to cache. In the "CacheEnable mem /" line, "mem" indicates that we want to use memory-based caching and "/" indicates that we want to cache all requests made to the web server.

The next set of MCachexxx directives configures the way the memory caching will be handled. MCacheSize specifies how much memory the caching module is allowed to use before older cached entries are removed. The MCacheMaxObjectCount, MCacheMinObjectSize, and MCacheMaxObjectSize directives define how many objects are allowed to be stored in the cache at any one time and what the maximum and minimum size of a cached object can be. Any object that does not fit within these parameters will not be cached.

Caching web pages can significantly increase the performance of the Apache Web server, but at the cost of memory resources. According to our performance testing (see Figure 1), installing Mod_Cache shows an approximate 33% gain in the number of requests per second the Apache web server is able to handle.

Figure 1: Performance comparison between caching and non-caching.

A more specialized configuration tailored to a specific site could produce better results. For example, if you have a web server that has limited memory resources, it may work better to cache only the most frequently accessed pages and limit the total cache memory size. Or perhaps limiting the cache to only larger files or files within a specific directory or location would do the trick. These are all parameters that can be customized for each implementation of Apache.
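
As a rough sketch of such a tailored setup, the fragment below restricts caching to a single location and to larger files while capping the memory used. It assumes mod_cache and mod_mem_cache are already loaded as in the earlier listing. The /images location and all numeric values are hypothetical, and the units noted in the comments (KBytes for MCacheSize, bytes for object sizes) are stated as assumptions to be checked against the mod_mem_cache documentation.

```apache
<IfModule mod_mem_cache.c>
    # Cache only requests under /images (hypothetical location)
    CacheEnable mem /images
    # Cap the total cache memory (assumed to be in KBytes)
    MCacheSize 1024
    # Keep at most 50 objects in the cache
    MCacheMaxObjectCount 50
    # Skip small files: cache only objects between these sizes (assumed bytes)
    MCacheMinObjectSize 4096
    MCacheMaxObjectSize 65536
</IfModule>
```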

Load Balancing with Mod_Proxy

Mod_Proxy is one of the standard modules that has been rearchitected for Apache 2.0. Some of the differences since Apache 1.3 include:

Removal of the caching functionality so that caching can be used in more situations than just proxying
Redesign for compliance with HTTP/1.1, including KeepAlive requests
Implementation of pluggable protocol handlers such as HTTP and FTP
Use of the new Apache 2.0 filters to accurately filter the data as it flows through the web server

One of the more interesting uses of Mod_Proxy is its reverse proxy feature. Basically, a reverse proxy allows the web server to act as a mirror to one or more remote servers. It provides the ability for a remote server to be mapped into the space of the local or proxy server. Configuring your Apache Web server as a reverse proxy allows it to handle mundane tasks like authentication, SSL encryption, and caching while passing the more specialized or compute-intensive requests to back-end servers.

One implementation of a reverse proxy environment might be a front-end Apache web server that sits outside of the firewall and passes requests to other servers inside the firewall. The servers inside the firewall can be configured to handle CGI requests, database requests, web applications, or other types of requests that require more processing time. This implementation is illustrated in Figure 2.

Figure 2: Reverse proxy implementation.

Passing the process-bound requests to the back-end servers leaves the proxy server free to handle encryption, authentication, caching, or serving of simple static pages.

Configuring Apache as a reverse proxy server is easily done. No additional configuration needs to be done for the back-end servers, since all they know is that a request has been received that needs to be satisfied. However, the reverse proxy server needs to be told, through the ProxyPass and ProxyPassReverse directives, how and where to send specific requests. An example configuration is shown below:

LoadModule proxy_module modules/proxy.nlm
<IfModule mod_proxy.c>
    LoadModule proxy_http_module modules/proxyhtp.nlm
    ProxyRequests Off

    # Reverse proxy to expense reporting web application server
    ProxyPass /expense/ http://www.expense.com:53080/expense/
    ProxyPassReverse /expense/ http://www.expense.com:53080/expense/

    # Reverse proxy to my general web application server
    ProxyPass /webapps/ http://www.webapps.com:53080/webapps/
    ProxyPassReverse /webapps/ http://www.webapps.com:53080/webapps/

    # Reverse proxy to other applications; redirects to the back-end are
    # allowed because no ProxyPassReverse is specified
    ProxyPass /directapps/ http://www.directapps.com/
</IfModule>

The first thing that this configuration does is turn off all forward proxy requests with the ProxyRequests Off directive. This is a security measure to ensure that client browsers do not attempt to use your reverse proxy server as a forward proxy server. You don't want to give browsers the ability to use your web server as a proxy to other web servers which you don't control.

Each ProxyPass directive maps a local URL path to a back-end server. When a request for that path arrives, the reverse proxy opens a connection to the back-end server and forwards the request over the new connection.

The ProxyPassReverse directive allows the reverse proxy server to alter the Location, Content-Location, and URI headers of the response. This forces any redirect issued by the back-end server to point to the reverse proxy server instead. This prevents a browser from bypassing your reverse proxy server and attempting to directly connect to one of your back-end servers. If you want to allow the browser to be redirected to the back-end, do not specify a ProxyPassReverse value. This will allow the original header generated by the back-end server to remain intact, which will contain the original location of the back-end server rather than the reverse proxy server. When the browser receives the response, it will automatically redirect to the back-end location.
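
To make the header rewriting concrete, the fragment below repeats the expense mapping from the example configuration and traces one redirect through it in comments. The host name proxy.mycompany.com and the /login path are hypothetical and serve only to illustrate the effect.

```apache
ProxyPass        /expense/ http://www.expense.com:53080/expense/
ProxyPassReverse /expense/ http://www.expense.com:53080/expense/

# Suppose the back-end answers a request with a redirect such as:
#     Location: http://www.expense.com:53080/expense/login
# With ProxyPassReverse in place, the browser instead receives:
#     Location: http://proxy.mycompany.com/expense/login
# Without ProxyPassReverse, the first header would reach the browser unchanged.
```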

In addition, since your back-end servers are behind a firewall, there is no need for them to require client authentication or SSL connections because the only client that is accessing these servers from outside the firewall is the trusted reverse proxy server. The reverse proxy server itself can be configured to challenge clients for authentication credentials and to encrypt each response via SSL. It acts as the security front-end outside your firewall. Nothing gets access inside the firewall without passing through the reverse proxy server first.

Another great use of a reverse proxy server is as a front-end to all of your commercial non-Apache web servers. If you think about it, this is a great way to secure an otherwise unsecure commercial web server with a trusted Apache 2.0 web server.

Mass Virtual Hosting with Mod_Vhost_Alias

One of the basic features of all web servers is their ability to make a single instance of the web server appear to be multiple instances; in other words, the ability for a single Apache web server to handle requests for multiple domains. This is basically the opposite of the reverse proxy environment where multiple web servers appear to their clients as a single server.

One way to configure a virtual host within the Apache Web server is through the configuration file itself. The HTTPD.CONF file includes examples of defining a <VirtualHost> block that includes all of the directives necessary for a single virtual host. The downside to configuring a virtual host in this manner is that with each new host, new configuration directives need to be added to the HTTPD.CONF file and the Apache server needs to be restarted.
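
A minimal sketch of such a block is shown here for a name-based virtual host. The host name www.example.com, the directory, and the log file names are hypothetical placeholders.

```apache
# Enable name-based virtual hosting on port 80
NameVirtualHost *:80

<VirtualHost *:80>
    # Requests whose Host: header matches this name are served by this block
    ServerName   www.example.com
    DocumentRoot /www/hosts/www.example.com/docs
    ErrorLog     logs/example-error_log
    CustomLog    logs/example-access_log common
</VirtualHost>
```

Each additional host needs another block of this kind, which is the maintenance cost described above.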

With both Apache 1.3 and Apache 2.0, the process for setting up a virtual host becomes much easier through the use of Mod_Vhost_Alias. This module allows an administrator to dynamically add or remove virtual hosts by simply altering the directory structure on the file system.

For example, suppose that an Apache server is currently serving pages for the domain www.mycompany.com and a new company named Bayline Inc. decides that it also wants an online presence. To configure the web server to serve pages for the Bayline domain using the HTTPD.CONF file method, the administrator would have to add a <VirtualHost> block to the configuration file and then restart the web server. With Mod_Vhost_Alias, the new domain can be added by simply creating the subdirectory /www/hosts/www.bayline.com. The new domain is immediately online and ready to start serving pages. The following configuration illustrates this setup:

LoadModule vhost_alias_module modules/vhost.nlm
<IfModule mod_vhost_alias.c>
    # get the domain name from the Host: header
    UseCanonicalName Off

    # this log format can be split per-virtual-host based on
    # the first field
    LogFormat "%V %h %l %u %t \"%r\" %s %b" vcommon
    CustomLog logs/access_log vcommon

    # include the server name in the filenames used to
    # satisfy requests
    VirtualDocumentRoot /www/hosts/%0/docs
    VirtualScriptAlias /www/hosts/%0/cgi-bin
</IfModule>

The first thing to remember when configuring Mod_Vhost_Alias is to set the UseCanonicalName directive to Off. This instructs the Apache server to use the host name as it was sent by the client browser, rather than attempt to reconstruct the host name from values set in the configuration file. Mod_Vhost_Alias works by constructing a physical directory path using the value specified by the directive VirtualDocumentRoot and substituting the host name in the path. If the server is allowed to canonicalize the host name, the resulting path may not match, because the physical document location is derived from the host name submitted by the client browser.

Using the example configuration above, a new virtual host can be configured by simply creating a new directory with the domain name under /www/hosts and then adding the subdirectories /docs and /cgi-bin. Mod_Vhost_Alias provides a lot of flexibility when specifying how the actual path is constructed. The physical path can be constructed using all or part of the host name, using all or part of the host IP address, or by mixing and matching any of the above. (For more information, see the Mod_Vhost_Alias documentation at http://httpd.apache.org/docs-2.0/mod/mod_vhost_alias.html.)

In this example, a request for www.bayline.com would result in the document being retrieved from the /www/hosts/www.bayline.com/docs directory.
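
The fragment below sketches a few of the other ways the path can be built, using the interpolation specifiers described in the Mod_Vhost_Alias documentation. Only one VirtualDocumentRoot style would normally be active at a time, so the alternatives are commented out. The resulting directories and the IP address are hypothetical.

```apache
# Whole host name (as in the example above):
#   www.bayline.com -> /www/hosts/www.bayline.com/docs
VirtualDocumentRoot /www/hosts/%0/docs

# Alternative: use only the second dot-separated part of the host name:
#   www.bayline.com -> /www/hosts/bayline/docs
#VirtualDocumentRoot /www/hosts/%2/docs

# Alternative: build the path from the IP address the request arrived on:
#   10.0.0.5 -> /www/hosts/10.0.0.5/docs
#VirtualDocumentRootIP /www/hosts/%0/docs
```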

The downside to creating virtual hosts through Mod_Vhost_Alias is that all of the virtual hosts must be configured in exactly the same way. The only way around this is to supply additional configuration directives through a .htaccess file contained in each virtual host's document root directory. This would be a good way to add security for some domains but not for others.

Other Reasons to Migrate to Apache 2.0

There are many other reasons to migrate to Apache 2.0. Most of them have to do with the additional modules or module features that have been included with Apache 2.0.

Mod_DAV

Mod_DAV is a module that existed for Apache 1.3 as an external third-party module. It is now one of the standard modules shipping with Apache 2.0. Not only is it a standard module, it has also been redesigned to be more flexible in how it can be used. The previous version of Mod_DAV only allowed for file system access. The 2.0 version of Mod_DAV includes the ability to plug in additional access modules to provide DAV access to other types of technologies in addition to the file system.

Mod_SSL

Mod_SSL is another third-party module that is now part of the Apache 2.0 release on most platforms. It is not part of the NetWare release because all SSL encryption on NetWare is handled by the operating system, rather than having to employ an external module such as Mod_SSL.

Mod_Ext_Filter

Mod_Ext_Filter is a new module that was built originally to show off the new filtering features of Apache 2.0. It can be thought of as an alternative way of doing CGI. Mod_Ext_Filter allows any external program which reads from stdin and writes to stdout to be used as a filter. This would be similar to using the DOS "sort" command to sort the output of the "dir" command. In the command "C:\>dir | sort", the output from the "dir" command is passed through the "sort" program before being displayed on the screen. In a similar fashion, Mod_Ext_Filter allows the output of any HTTP request to be passed through an external program before being sent back to the client browser.
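
As a rough illustration, the fragment below defines an output filter and applies it to one location. It assumes Mod_Ext_Filter is already loaded. The filter name, the /filtered location, and the program sys:/bin/myfilter.nlm are hypothetical; the program stands in for any NLM that reads stdin and writes stdout.

```apache
# Define an output filter that pipes HTML responses through an external program
ExtFilterDefine myfilter mode=output intype=text/html cmd="sys:/bin/myfilter.nlm"

<Location /filtered>
    # Pass every response served from this location through the filter
    SetOutputFilter myfilter
</Location>
```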

Mod_Logio

Mod_Logio is another new module that is only available with Apache 2.0. Its purpose is to count all input and output bytes on a per-request basis. It provides additional logging directives to enable the byte counts to be included in a standard log entry. So by loading Mod_Logio, all access log entries could also include the number of bytes received and the number of bytes sent.
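
A sketch of such a log entry format follows. It extends the common log format with the %I (bytes received) and %O (bytes sent) format strings that Mod_Logio provides. The nickname "commonio" is a made-up label for this example.

```apache
<IfModule mod_logio.c>
    # %I = bytes received, including request and headers
    # %O = bytes sent, including headers
    LogFormat "%h %l %u %t \"%r\" %>s %b %I %O" commonio
    CustomLog logs/access_log commonio
</IfModule>
```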

Mod_Rewrite

Mod_Rewrite is a powerful way to manipulate URLs on the fly. It allows the administrator an unlimited number of ways to rewrite and redirect HTTP requests as they pass through the Apache web server. Rules are written and executed using standard regular expression syntax, and can be based either on the URL itself or on other external information contained in the request header or in environment variables defined by the web server. The downside to Mod_Rewrite is that it is very complex; the upside is that its complexity allows it to be extremely powerful.
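
Two small rules give a flavor of the syntax, one based on the URL and one based on a request header. This is a sketch that assumes Mod_Rewrite is loaded and the rules are placed in the main server configuration. All paths and the browser name are hypothetical.

```apache
RewriteEngine On

# URL-based: permanently redirect anything under /oldapp/ to the
# same path under /newapp/
RewriteRule ^/oldapp/(.*)$ /newapp/$1 [R=301,L]

# Header-based: when the User-Agent header starts with "Lynx",
# serve a simplified home page instead of the default one
RewriteCond %{HTTP_USER_AGENT} ^Lynx
RewriteRule ^/$ /text-only/index.html [L]
```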

Mod_Auth_LDAP

Mod_Auth_LDAP is another module that started out as an external third party module. It was brought into Apache 2.0 as a standard module, later removed into its own Apache Software Foundation project, and then once again brought back into Apache 2.0. It is currently included as an experimental module pending additional testing and SSL capability.

On the NetWare platform, Mod_Auth_LDAP has been tested extensively and will be used to provide authentication services for the Apache Manager. When used in combination with Mod_eDir (a third-party module provided by Novell), it provides Apache 2.0 for NetWare with a full featured authentication and authorization service. Mod_eDir ties Apache in with Novell's eDirectory to provide authentication, remote file system access, and home directory services.

Conclusion

Over the years Apache 1.3 has proven itself to be the "best" web server on the market. Its stability, its reliability, and the fact that it is free of charge have made it the most widely used web server on the Internet. Apache 2.0, which was released in April 2002, has already surpassed Apache 1.3 in performance, especially on non-Unix platforms such as NetWare. With the addition of many new modules and features, as well as the ability to leverage the strengths of the NetWare platform, Apache 2.0 is the best web server on the NetWare platform; in fact, it is the only web server that will be shipping with NetWare 6.5.

* Originally published in Novell AppNotes
