GW Sizing Recommendations (scalability and tuning)

(Last modified: 20Aug2003)

This document (10016883) is provided subject to the disclaimer at the end of this document.

goal

How can I determine hardware and other requirements for my GroupWise system?

fact

Novell GroupWise 6.5

Novell GroupWise 6.0

Novell GroupWise 5.5

Novell GroupWise 5.2

fix

GroupWise Sizing Document

This document gives guidelines for planning and implementing a GroupWise 5.5 deployment. Most recommendations can also be applied to GroupWise 5.2, but Novell suggests using them with GroupWise 5.5. These recommendations are based on a mixture of in-house testing, third-party lab performance studies, and the experience of Novell technical support.

As a preface to the recommendations, a list of assumptions is given below. These assumptions can be used as guidelines for planning the growth of an existing GroupWise system or the installation of a new GroupWise system.

These recommendations should be used with discretion, based on each specific implementation and the circumstances surrounding it.

This is a living document and will change as determined by experience, new technology and noteworthy feedback.

Assumptions
***************
Version of GroupWise:
   - 5.5 or higher
   - Any patch level of 5.5
Server:
   - Pentium Pro 200 or better
   - Client to Server LAN speed of 10 Mbps
   - Server to Server LAN speed of 100 Mbps
   - Only handles E-Mail*
*Note - Users are not logging into this server for applications, NDS authentication, or file/print services. Likewise, this server is not being used for NDS replication.

Domain (data store):
   - One per server
   - One MTA per Domain
Message Transport Agent (MTA):
   - One per server
   - Run on same server as the Domain
   - Run with IP Links to other Domains and Post Offices
   - Run on separate server from the Post Office (data store) or POA*
*Note - This recommendation is specific to Domains that act as high-volume mail traffic hubs. In most circumstances, running a Domain and its MTA on the same server as one (1) Post Office and its POA is not a problem.

Post Office (data store):
   - One per Server
   - One POA per Post Office*
*Note - When running Document Management with a moderately to heavily used Library, it is recommended to run a second POA on a separate server to perform the POA's indexing tasks.

Post Office Agent (POA):
   - One per Server
   - Run on the same server where the Post Office (data store) resides
   - TCP/IP (client/server) connection is used for Client access

Recommendations
**********************
Based upon the above mentioned assumptions, here are the recommendations for sizing a GroupWise system.

Domains per System -
There is no limit to the number of Domains a System can contain. However, Novell Technical Services recommends not letting this number get too high, because of synchronization: the more Domains in a System, the more databases have to be synchronized. It would be inaccurate to give a fixed number that defines "too high"; this needs to be determined case by case, keeping in mind that more is not necessarily better. Here are some guidelines to help determine if and when a Domain is needed:

1) The Primary Domain should not have any Post Offices and should have a direct TCP/IP link to all Secondary Domains where possible. This ensures that administration traffic is isolated from message delivery routes. It also ensures that administration changes are replicated as quickly as possible throughout the system. However, the Primary Domain should not be used as a routing Domain.

2) Routing Domains should be used at WAN to LAN connection points and service the remote Domains that come through these points.

3) If a "routing" Domain is going to be used, it should exist on its own server; depending on LAN/WAN traffic and speed, it can service 60+ links.

4) If the system contains multiple high-traffic gateways, all gateways should be put on a separate "gateway" Secondary Domain. Additional Domains for Internet traffic (primarily GWIA and Web Access) can be placed at WAN/LAN hub locations to reduce message delivery time and cost and to save network bandwidth. Other high-traffic gateways can likewise be placed on separate Domains on separate servers to increase processing power.

5) For Web Access and Remote Async connections, it is sometimes cheaper to have users connect locally instead of through an 800 (toll-free) line. In this case, placing a Domain at the remote location lets the gateway run locally to the users, reducing long-distance and/or 800-number costs.

6) If communication lines are poor and/or leased lines are expensive, an Async link may be preferred. A Secondary Domain at the remote location would be required to allow the Async-to-Async connection.

7) Other situations include the use of Dial-up Routing, where link scheduling can reduce costs, and External Synchronization with other GroupWise 5.x systems when there is a high volume of administrative traffic.

Post Offices per Domain -
Again, there is no limit to the number of Post Offices a Domain can contain. In realistic terms, the number of Post Offices per Domain is less important than the number of messages the Domain's MTA handles. For example, a Domain with 50 Post Offices of 30 users each would be acceptable if the LAN/WAN speed were sufficient to handle the traffic these users generate. At Novell, the Orem MTA handles an average of 100,000 messages per day without problems. This MTA runs on its own NetWare server, with no other processes running on that server.

*Note - Other factors should be taken into consideration as well, such as:
   - LAN/WAN speed
   - Processing power of the server the MTA is running on
   - Average number of administration messages per hour in the system
   - Busy search demand
   - Remote async requests (remote requests put a greater strain on the system than direct users)
   - The number of gateways in this Domain
   - Average user's activity on the GroupWise System
   - The number of rules users have active

Number of POAs (Post Office Agents) per Post Office -
The recommendation is one (1) POA per Post Office*, running on the same server that houses the data store. In this configuration the POA can use its built-in "Load Balancing" to determine which threads and/or requests get priority over others. When a second POA is loaded on the same server, neither process can control the other's threads, so the two processes end up competing for processor time.

*Note - If a second POA is needed for QuickFinder indexing or for faster processing, it is recommended to run it on a different server. That server can be connected over the LAN (preferably on the same network segment as the first server), or a second NIC can be installed and connected with a crossover cable to the NetWare or NT server where the second POA will run.

Number of Users per Post Office -
This recommendation requires a tighter definition of both "users" and "Post Office".

First, the term "users" is loosely used throughout the groupware industry. Novell defines a "user" as a client who is actively sending, reading, or otherwise using their mailbox. Other vendors use this number to specify the total number of users regardless of activity. This document refers to Novell's definition as "active users". It is therefore up to each administrator or integrator to determine the maximum percentage of "Total Users" that will be actively using mail at any given time.

Second, the term "Post Office" has different meanings in different groupware packages. Historically, GroupWise has defined it only as a "grouping of users in a given data store location". As the groupware industry grows, it is becoming more common for a "Post Office" to mean a "grouping of users in a given data store on a dedicated server", in other words, mailboxes per server. Although Novell doesn't require this, it is the preferred method of implementation.

Given the second definition, Novell recommends that the number of active users per Post Office not exceed the range of 500 to 700.* Assuming that on average only 60% to 70% of users are logged in and using mail at any given time, GroupWise could easily support a Post Office of 1000+ "Total Users". Novell's intent is to help the reader identify the values that realistically affect the system, not just to quote an impressively large number. Here are some things to consider when deciding on an acceptable number:

- LAN/WAN speed and topology. The slower the network and the more hops a packet has to make, the slower the performance will be. This gets worse as the GroupWise client and the GroupWise POA start flooding the network with TCP/IP packets.
- Clean-up policies. Without standard clean-up policies, there is no way to control the size of the GroupWise databases or of users' mailboxes. The larger the databases get, the longer maintenance routines take. In addition, as the number of messages the POA must query for Finds and/or indexing increases, the POA slows down. The larger a user's mailbox (database) gets, the longer the client takes to display its items, resulting in a dissatisfied user.
- The number and size of attachments the average user is expected to have. In Novell Technical Services, the Post Office has fewer than 300 users because of the size of the attachments each user keeps, which include large databases and core dumps mailed in by customers. In this case, disk space forced a reduction in the number of users on the Post Office.
- Backup and Restore. This can be an issue whether the problem is too many users or too much mail, and the decisions it requires are very specific to each situation. Solutions include limiting the number of users per Post Office, adding disk space, implementing an e-mail policy, restricting the size of attachments, limiting the number of messages, and billing users when limits are exceeded.

*Note - This number is based on "active users" concurrently using GroupWise over a direct TCP/IP connection. The total number of users is limited only by how many of those users will be active at any given time. Novell has seen, and expects, this number to change as technologies change. In addition, the performance of the Client, Agent, and Server can vary dramatically as POA and server settings are adjusted. These performance settings are listed in the appendices of this document.
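The link between the active-user ceiling and the total number of mailboxes is simple arithmetic. The following is a minimal Python sketch, assuming the concurrency share is known as a whole percentage; the function name total_users_supported is hypothetical and not part of GroupWise. It divides the active-user ceiling by the share of users expected to be active at once and rounds down to stay conservative.

```python
def total_users_supported(active_limit: int, active_percent: int) -> int:
    """Estimate how many mailboxes ("Total Users") a Post Office can hold.

    active_limit   -- ceiling of concurrently active users (500-700 in the text)
    active_percent -- expected share of users active at once, in percent
    Integer arithmetic keeps the result exact and rounds down (conservative).
    """
    if not 0 < active_percent <= 100:
        raise ValueError("active_percent must be between 1 and 100")
    return active_limit * 100 // active_percent


if __name__ == "__main__":
    # The ranges quoted in the text: 500-700 active users, 60%-70% active at once.
    for limit in (500, 700):
        for pct in (60, 70):
            print(f"{limit} active at {pct}% -> {total_users_supported(limit, pct)} total users")
```

With 700 active users at 70% concurrency the sketch gives 1,000 total users, in line with the 1000+ figure above; the factors listed above (network, clean-up policies, attachments, backup) can still lower the practical number.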

Number of Web Access users per Post Office -
The bottom-line recommendation is the same as above: 500 to 700 active users. Basic service to the client is handled through the POA just as for a direct TCP/IP client (provided the Web Access Gateway is set up for Client/Server). However, this recommendation can vary for several reasons: slower Web performance is acceptable (people expect the Internet to be slow), there are typically fewer active users at any given time, and because of the slower speeds the average user avoids sending large attachments. With this in mind, the number of Web Access users per Post Office can be greater than the number of direct-connect users. For ISPs the number can be even greater, because the number of modems the ISP supports limits concurrent access (for example, if an ISP has 500 modems, then at most 500 active connections can be supported, so the total number of users per Post Office is effectively unlimited).

*Note - This section provides recommendations for installations that will offer Web-based access as their primary means of client access. Web Access is a gateway and doesn't have users specific to it. However, the gateway does provide Web-based access to a GroupWise mailbox, which has to be defined in a specific Post Office within the GroupWise system.

MTA Memory Requirements -
These memory requirements are in addition to memory needed for the operating system.

Minimum required (small system with 3 to 5 direct links) - 10 MB
For larger systems (more than 3 to 5 direct links), add the following per link to the 10 MB minimum required:
       Light to moderate traffic (less than 50,000 messages routed per day) + .2 MB
       Moderate to heavy traffic (greater than 50,000 messages routed per day) + .5 MB

*Note - These numbers were gathered from and compared against live systems, not lab environments. Additionally, they are based on averages. Novell recommends exceeding these numbers, not just meeting them.
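The MTA figures above reduce to a base amount plus a per-link increment. The following minimal Python sketch assumes two readings the text leaves open: that the 10 MB base already covers the first 5 direct links, and that exactly 50,000 messages per day counts as heavy traffic. The function name mta_memory_mb is hypothetical.

```python
BASE_MB = 10.0          # minimum for a small system (3 to 5 direct links)
INCLUDED_LINKS = 5      # assumption: the 10 MB base already covers the first 5 links
HEAVY_TRAFFIC = 50_000  # messages routed per day


def mta_memory_mb(direct_links: int, messages_per_day: int) -> float:
    """Estimate MTA memory (MB) needed on top of the operating system's memory."""
    # 0.2 MB per link below 50,000 messages/day, 0.5 MB above. Exactly 50,000
    # is not specified in the text, so this sketch treats it as heavy traffic.
    per_link_mb = 0.5 if messages_per_day >= HEAVY_TRAFFIC else 0.2
    extra_links = max(0, direct_links - INCLUDED_LINKS)
    # round() hides binary floating-point noise from multiplying by 0.2
    return round(BASE_MB + extra_links * per_link_mb, 1)


if __name__ == "__main__":
    print(mta_memory_mb(4, 10_000))    # small system: base only
    print(mta_memory_mb(20, 30_000))   # 15 extra links, light traffic
    print(mta_memory_mb(60, 100_000))  # routing hub, heavy traffic
```

Because the note above recommends exceeding these figures rather than just meeting them, the result is best treated as a floor.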

POA Memory Requirements - The memory requirements for a POA (Post Office Agent) will vary based upon the number of "active" users. The total number of users in the Post Office is irrelevant.
Again, these numbers are based upon the above mentioned assumptions.
   100 active users - 94 MB additional memory
   250 active users - 208 MB additional memory
   500 active users - 232 MB additional memory
   700 active users - 274 MB additional memory
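The POA table can be used as a lookup. The following minimal Python sketch assumes that an active-user count falling between two rows should round up to the next row, since the table gives no interpolation rule, and treats counts above 700 as outside the recommended range. The function name poa_memory_mb is hypothetical.

```python
import bisect

# (active users, additional MB) taken from the table above
POA_MEMORY_TABLE = [(100, 94), (250, 208), (500, 232), (700, 274)]


def poa_memory_mb(active_users: int) -> int:
    """Return the additional POA memory (MB) for a number of active users.

    Counts between two rows round UP to the next row (an assumption).
    Counts above 700 exceed the recommended range and raise ValueError.
    """
    tiers = [users for users, _ in POA_MEMORY_TABLE]
    index = bisect.bisect_left(tiers, active_users)  # first row >= active_users
    if index == len(tiers):
        raise ValueError("more than 700 active users: consider another Post Office")
    return POA_MEMORY_TABLE[index][1]


if __name__ == "__main__":
    for users in (80, 250, 400, 700):
        print(users, "active users ->", poa_memory_mb(users), "MB")
```

Rounding up rather than interpolating keeps the estimate on the safe side, which matches the advice to exceed rather than just meet the published numbers.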

Common Questions and Answers
***************************************
Q. How do I plan for an increase in "Active Users" as my users become more familiar with the product?
A. The best way is to first determine the current usage or expected current usage. Then project the expected growth in business need (what will the company's groupware needs be over the next several years?). Finally, examine the collected data and set numbers based on these expectations. Also keep in mind that as technology expands, these recommendations will change and the demands on the groupware system will probably increase.

Q. How do I determine the number of Post Offices that my server can house?
A. The answer is ONE! This does not mean that having more than one Post Office on a server, as some systems currently do, is wrong. Historically, Novell has given many different answers to this question, and as a result many different configurations have been implemented. Unfortunately, most of those answers were based on the success or failure of specific experiences and then generalized to answer this broad question. In giving this answer, Novell will support it with facts that point to the BEST solution, not the only solution.

When multiple POAs run on the same server, each one initializes threads for anticipated workloads. During idle periods these threads still have to be maintained and controlled by the owning process (the POA in this case), which consumes resources.

Example: there are three POAs on a given server; two are relatively idle and the third is very busy. The busy POA cannot use the other two POAs' threads, yet those threads still consume resources just to be maintained. In addition, if one of the idle POAs hits a trigger that starts QuickFinder indexing or a GWCheck, the busy POA has no way to prioritize that thread against its active Client/Server threads. In either case, the recommendation of ONE Post Office per server and one POA per server is the BEST solution.

Q. How do I determine the maximum number of Agents that can effectively run on one server?
A. The memory requirements are listed above, and they are the most important information to keep in mind. However, other things factor into this as well:
- Single point of failure - if the server goes down, how many users will be affected?
- Workload - are users doing busy searches, cross-Post Office proxies, remote requests, sending large attachments, using document management, and using the Find feature?
- Other duties - what else is the server responsible for: NDS authentication, NDS replication, file/print services, or other applications?
- User access - what level of user access will the server have?

Novell's base recommendation is to have only one Agent per server. However, we realize that this is asking a lot in most situations, so a more practical recommendation is two Agents per server. In most cases this will be one MTA and one POA; the exception is an MTA acting as a routing Domain that is responsible for 50+ links (at least half of them direct links).

Q. What factors were considered when coming up with the recommendation of users per Post Office?
A. There were four main considerations:
- Performance - A server's ability to handle the POA's requirements decreases as these recommendations are exceeded. This includes TCP/IP (client/server) requests, thread usage, processor demands, and disk I/O.
- Manageability - The ease of managing users, libraries, distribution lists, and NDS rights.
- Maintenance - The time it takes to run GWCheck, back up and restore Post Office data, and perform general maintenance routines.
- Disk space - The amount of space required for database growth and attachment blob areas.

Q. What could be causing problems in my system when the recommendations have been followed?
A. When the recommendations are followed, problems most probably come from the tuning of the server and/or agents, or from underestimating user activity. For the server and agents, the appendices following this document contain recommended settings and their descriptions; sometimes these settings need to be adjusted to fit specific situations and circumstances. For user activity, the best solution is to plan for growth and increased use of the e-mail system. This includes adding Post Offices, adding hardware, implementing e-mail policies, and not filling Post Offices to their maximum before considering a growth plan.

Appendix A
**************
Performance, Tuning and Optimization - TID 2943356 covers the various areas to optimize your server's performance. It also looks into areas of pro-active/preventive maintenance on your server
and how to achieve the best results. These actions can also prevent the possibility of server ABENDs and crashes.

High Utilization - This is a downloadable document from the Novell Support Connection, URL HTTP://www.support.novell.com. The file name is highutl1.exe and can be found with the File Finder or on
the Current Patch List page.

Appendix B
*************
Server settings NTS has found to directly affect the Agents' performance (the defaults are based on NetWare 4.11):

1. Directory Cache buffer Nonreferenced Delay = 30 sec (default = 5.5 sec)
This setting decreases processor overhead and I/O traffic. It determines how often the Directory Cache buffer is refreshed; every refresh requires a new disk read and a write to memory. By increasing the value to 30 seconds, the administrator decreases how often the refresh takes place. There is little danger of losing data: as new files are added to a directory structure, the buffer is updated dynamically. This feature is a safeguard for the rare cases in which a file did not get added to the buffer for some reason.

2. Minimum Directory Cache Buffers = 2-3 per connection (default = 20)
By increasing this value, the buffer is already established and no additional resources are required to allocate more buffer space on the fly. This can eliminate processor and I/O bottlenecks.

3. Maximum Directory Cache Buffers = 4000 (default = 500)
This setting protects the system from using too much memory for Directory Cache Buffers, but the default does not give the system enough room to grow. Setting it at 4000 gives the system some
leeway, but may require the addition of physical memory.

4. Read Ahead Enabled (default = On)
The Read Ahead feature significantly increases performance on NetWare servers. It predicts which files will be required next and loads them into memory, ready for access.

5. Read Ahead LRU sitting Time Threshold = 60 sec (default = 10 sec)
This setting is part of the Read Ahead mechanism. It specifies that if the LRU (Least Recently Used) sitting time is below the specified time, the Read Ahead feature is not used. LRU is an algorithm used for memory block/page replacement. An "LRU list" identifies the least recently used cache blocks (blocks that have been in cache the longest without being accessed) and flags those to be reused first, which makes for a more efficient caching implementation. The reason for the above setting is that if there is not enough memory to serve data from available cache, Read Ahead takes up memory and processor time without increasing performance; if Read Ahead is not helping, it makes sense not to spend resources on it. This setting can be configured up to 1 hour. In general terms, if the LRU sitting time is 20 minutes or more, the system probably has sufficient memory. This setting could be effective anywhere from 1 minute to possibly 5 minutes. Be aware that when this threshold is triggered it disables Read Ahead, which is usually not recommended. If this happens often, it is probably time to add more memory.

6. Maximum File Locks = 20,000 (default = 10,000)
Although GroupWise does much more record locking than file locking, if there are a lot of users on the system it is wise to allocate enough file locks. This does require memory, so it should not be overused.

7. Maximum Record Locks = 100,000 (default = 20,000)
GroupWise performs a lot of record locking. If there are a lot of users on the system, it is wise to allocate enough record locks. This does require memory, so it should not be overused.

8. Minimum Service Processes = 100 (default = 10)
Service Processes are dynamic; pre-allocating them reduces the overhead of allocating them on the fly. As long as there is sufficient memory, this number can be increased. A good rule of thumb is to monitor the server during peak times and set Minimum Service Processes to the number of service processes in use at those times.

9. Maximum Service Processes = 1000 (default = 40)
This also takes up resources. Monitor service processes with MONITOR.NLM; if the current number approaches the maximum, increase Maximum Service Processes.

10. New Service Process Wait Time = .3 sec (default = 2.2 sec)
This setting can increase performance drastically, because a new Service Process can be created quickly when one is required. The default of 2.2 seconds is based on the theory that if the system waits long enough, a process will become free. If there is sufficient memory, there is no harm in creating a process immediately upon the initial request.

11. Minimum Packet Receive Buffers = 2-3 per user* (default = 50)
Any request that is processed uses a Packet Receive Buffer, including all NCP requests, SAPs, RIPs, TCP packets, etc. If the server is bombarded with requests and there are not enough packet receive buffers, the system becomes bottlenecked and starts dropping requests. The result is lost user connections, lost server-to-server connections, slowness, etc. Monitor the current packet receive buffers during peak times and set the minimum to that value so that there are enough packet receive buffers at all times. Remember that this also takes up memory, so be sure the server has sufficient memory.

*Note - A server hosting the Web Access gateway should set this to 2000.

12. Maximum Packet Receive Buffers = 4000 (default = 100)
This protects the server against packet receive buffers allocating too much memory.

13. New Packet Receive Buffer Wait Time = .1 Sec (default = 1 sec)
If the server has sufficient memory, this setting can significantly increase performance. As with service processes, the server immediately spawns a new buffer instead of waiting to see whether one becomes available first.

14. TCP Defend Land Attack = Off (default = On)
This feature protects the TCP/IP stack against LAND attacks: packets sent to the server with the same source and destination address. Such packets get into a loop and can bring the server down. If the server in question has no access to the outside world, the chance of a packet doing this is extremely small. Turning this unneeded feature off reduces overhead so that IP packets can be processed faster.

*NOTE: Many of the options above warn about the need for sufficient memory. Each additional buffer allocated takes up about 4 KB, and each service process requires about 16 KB. The best way to determine whether memory is sufficient is to watch the LRU sitting time and the Available Cache Buffers. If these numbers drop (LRU below 20 minutes, Cache Buffers below 40%), more memory is probably required.
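The per-item costs in the note above make it easy to estimate what a set of changes will consume. The following minimal Python sketch uses the approximate 4 KB per buffer and 16 KB per service process from the note; it assumes that either threshold being crossed (LRU below 20 minutes, or Cache Buffers below 40%) is a warning sign, since the note does not say whether both must drop. The function names are hypothetical.

```python
BUFFER_KB = 4            # approximate cost of each additional buffer
SERVICE_PROCESS_KB = 16  # approximate cost of each service process


def extra_memory_kb(extra_buffers: int, extra_service_processes: int) -> int:
    """Approximate memory (KB) consumed by raising buffer and process counts."""
    return extra_buffers * BUFFER_KB + extra_service_processes * SERVICE_PROCESS_KB


def probably_needs_memory(lru_minutes: float, cache_buffer_percent: float) -> bool:
    """Assumption: crossing EITHER threshold is treated as a warning sign."""
    return lru_minutes < 20 or cache_buffer_percent < 40


if __name__ == "__main__":
    # Raising Minimum Packet Receive Buffers from 50 to 2000 (Web Access server)
    # and Minimum Service Processes from 10 to 100:
    kb = extra_memory_kb(2000 - 50, 100 - 10)
    print(f"about {kb} KB (~{kb / 1024:.1f} MB)")
    print(probably_needs_memory(lru_minutes=15, cache_buffer_percent=55))
```

In this example the two changes cost roughly 9 MB, which is small next to the POA figures, but the LRU and cache checks remain the real test of whether the server has enough memory.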

Appendix C
********
POA Settings

The POA object has many configurable settings, several of which can increase or decrease the performance of the POA. They are available so that each administrator can adjust them to their needs for optimal performance and stability.

1. Enable Automatic Database Recovery = On (default = On)
This setting enables the POA to detect database problems and, in most cases, fix them. In the long run this improves performance, because it keeps problems from getting really bad before they are detected and corrected. It does take resources to run GWCheck if and when problems are encountered, but Novell recommends leaving this On: the gain in stability is worth the possible loss in performance.

2. Enable Caching = On (default = On)
Having this set On improves the performance of the POA. It allows the POA, at the software level, to handle caching of the databases it is working with. Novell recommends that this setting be set to
On. However if there are problems with database corruption, it would be a good idea to turn this setting off until the source of the problem is located and fixed.

3. Enable SNMP = Off (default = On)
Only turn this off if SNMP is not being used to manage the Agent. This feature requires quite a bit of I/O and processor time. If SNMP is not being used (through ManageWise or some other SNMP manager), turning this unneeded feature off could help performance.

4. TCP Handler Threads = 20 - 30 users per thread (default = 6)
These are the TCP/IP threads that handle Client/Server requests. NCS has found that the appropriate setting depends on how active the users are. If the users are less active, averaging only 5 to 10 items sent and received per day, then 30 users per thread is sufficient. However, if the users are more active (more than 30 items sent and received each day, in addition to Finds and busy searches), then 20 users per thread is recommended. It is important to adjust this number for each situation, because each allocated thread takes memory and resources. On the other hand, if not enough threads are allocated, requests will be left pending, which means slower performance for end users.
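The users-per-thread rule translates directly into a thread count. The following minimal Python sketch uses 30 users per thread for light users and 20 for heavy users; for activity between 10 and 30 items per day the text gives no figure, so the sketch assumes the conservative 20. The function name tcp_handler_threads is hypothetical, and the result is rounded up so no user is left without thread capacity.

```python
import math


def tcp_handler_threads(users: int, items_per_day: float, heavy_searching: bool = False) -> int:
    """Suggest a TCP Handler Threads value from the 20-30 users-per-thread rule.

    Light users (about 5-10 items/day, no heavy Find or busy-search use): 30 per thread.
    Heavier users (30+ items/day, or heavy Find/busy-search use): 20 per thread.
    Between 10 and 30 items/day the text gives no figure; this sketch assumes 20.
    """
    light = items_per_day <= 10 and not heavy_searching
    users_per_thread = 30 if light else 20
    return math.ceil(users / users_per_thread)


if __name__ == "__main__":
    print(tcp_handler_threads(600, items_per_day=8))                         # 20
    print(tcp_handler_threads(600, items_per_day=40, heavy_searching=True))  # 30
```

Since each thread costs memory, the count is worth revisiting against the POA log once real activity levels are known.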

5. QuickFinder Indexing = On (default = on)
QuickFinder indexes are created for both libraries and user databases. They speed up query results for a Find as well as for a document search. Note that QuickFinder indexing takes up a lot of server resources and will often cause utilization to peak until indexing is complete. GroupWise 5.5 added a setting that lets indexing start at a specified offset from midnight, giving the administrator more control to run QuickFinder during the server's slowest time.

6. QuickFinder Interval = ? (default = 4 hrs)
There is no recommended number for this setting. It will vary depending on whether there is a library, how often users rely on the Find feature and how important it is to them, and whether a POA has been created to handle QuickFinder indexing. If a Find (query) is made for a specific mailbox item that has not been indexed, the item will still be found, but the search will take longer. If a query is made for a document that has not been indexed, the document will not be found.

7. Max App Connections = 4 per user (default = 2048)
Application connections are virtual connections; they are the workhorse for IP traffic between client and POA. Whenever new communication between client and POA is required, a new application connection is spawned. After 5 seconds without use, an application connection times out and terminates. An average user uses approximately 4 connections at any one time during a session. Each connection takes up about 8 KB of memory. When application connections hit the maximum, the oldest connection is bumped to handle the new request. If that old connection was still in use, the client requests a new one, causing a vicious circle. If users complain of slow performance, this setting may be too low.

8. Max Physical Connections = 1 per user (default 512)
A physical connection must exist before application connections can be generated, and a user can have multiple physical connections. In general, one physical connection per user is sufficient, because not all users are going to be active at one time. If GroupWise hits the maximum number of physical connections, users will receive an error saying they cannot connect to GroupWise at that time. Increasing the maximum for physical or application connections does not pre-allocate memory; these settings are there to keep GroupWise from using too many resources on the server.
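Settings 7 and 8 are both per-user multipliers. The following minimal Python sketch applies them (4 application connections and 1 physical connection per user, about 8 KB per application connection) and flags when the defaults of 2048 and 512 would be too low. The function name poa_connection_settings is hypothetical; because the limits do not pre-allocate memory, the KB figure is only a worst case for when every connection is in use.

```python
APP_CONNECTIONS_PER_USER = 4       # Max App Connections guideline
PHYSICAL_CONNECTIONS_PER_USER = 1  # Max Physical Connections guideline
APP_CONNECTION_KB = 8              # approximate memory per application connection

DEFAULT_MAX_APP = 2048
DEFAULT_MAX_PHYSICAL = 512


def poa_connection_settings(users: int) -> dict:
    """Suggest connection limits for a Post Office with the given number of users.

    worst_case_app_memory_kb assumes every allowed application connection is open;
    the limits themselves do not reserve this memory.
    """
    app = users * APP_CONNECTIONS_PER_USER
    physical = users * PHYSICAL_CONNECTIONS_PER_USER
    return {
        "max_app_connections": app,
        "max_physical_connections": physical,
        "worst_case_app_memory_kb": app * APP_CONNECTION_KB,
        "defaults_too_low": app > DEFAULT_MAX_APP or physical > DEFAULT_MAX_PHYSICAL,
    }


if __name__ == "__main__":
    print(poa_connection_settings(700))
```

For a 700-user Post Office the sketch suggests 2800 application and 700 physical connections, both above the defaults, with at most about 22 MB in use if every application connection were open at once.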

9. CPU Utilization = 85% (default = 85%)
10. Delay Time = 100 millisec (default = 100 millisec)
These two settings go together, and the defaults are the recommendations. They are designed to maintain peak performance on the server. If the processing load on the server is too heavy, the POA delays the launch of new threads by 100 millisec. This allows the server to continue processing current requests without neglecting its other responsibilities. The POA is also designed to load balance its requests as the threshold is approached: Client/Server threads become the highest priority, and the POA starts to terminate other threads, such as GWCheck and QuickFinder, to free up resources for client requests.

11. Another way to improve server performance is to flag the WPCSIN and WPCSOUT directories (and their subdirectories) for Immediate Purge. These directories exist below each Post Office, Domain, and Gateway in the GroupWise system. In addition, each MTA has an MSLOCAL directory structure that should also be flagged for Immediate Purge. For more information, see TID# 2920356.

These directories have many files written to and deleted from them, and Immediate Purge helps keep the volumes clean. If the administrator is running suballocation on the volume, at least 30% of the disk space should be available at all times, and that means space that is truly free, not merely purgeable. If the space is free but exists as "purgeable blocks", utilization will be affected dramatically. With Immediate Purge set on high-traffic directories, cleanup is not left to the server's purgeable-block algorithm.

12. Finally, TID# 2939577 discusses POA log file interpretation. It provides ideas on how to read the POA log file to help optimize these settings.

10016883  (Formerly TID 2946201).

document

Document Title: GW Sizing Recommendations (scalability and tuning)
Document ID: 10016883
Solution ID: 1.0.16194835.2325036
Creation Date: 17Sep1999
Modified Date: 20Aug2003
Novell Product Class: Groupware, NetWare

disclaimer

The Origin of this information may be internal or external to Novell. Novell makes all reasonable efforts to verify this information. However, the information provided in this document is for your information only. Novell makes no explicit or implied claims to the validity of this information.
Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark information.