GroupWise QuickFinder Indexing

(Last modified: 19Dec2002)

This document (10066601) is provided subject to the disclaimer at the end of this document.

goal

GroupWise QuickFinder Indexing

fact

GroupWise 5.x, 6.x

note

Indexing

GroupWise automatically indexes the property sheet information and the full text of all documents stored in libraries, in addition to items found in users' mailboxes.  This document explains the indexing and searching processes, Post Office Agent (POA) issues, guidelines for optimizing these processes, and various physical configuration options.

Some file formats are not indexed.  The indexer opens each file and examines the header to determine the file type.  Common desktop application formats such as Word and WordPerfect have filters that are applied before indexing.  This prevents headers and other data that users would never search on from being indexed.  If the file format is not recognized, the indexer determines whether or not the file is ASCII text.  Other file types, such as executables (EXE) and images, are not indexed.
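The header check described above can be pictured with a short sketch.  This is illustrative only: the byte signatures are well-known file magic numbers, but the list, the category names, and the function name classify are assumptions, not the checks the GroupWise indexer actually performs.

```python
# Illustrative only: the signatures and category names below are assumptions,
# not the checks the GroupWise indexer actually performs.
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "filter"),  # OLE2 container (e.g. Word .doc)
    (b"\xffWPC", "filter"),                           # WordPerfect document
    (b"MZ", "skip"),                                  # DOS/Windows executable
    (b"GIF8", "skip"),                                # GIF image
    (b"\xff\xd8\xff", "skip"),                        # JPEG image
    (b"\x89PNG", "skip"),                             # PNG image
]


def classify(header: bytes) -> str:
    """Return 'filter', 'ascii', or 'skip' for the first bytes of a file."""
    for magic, action in SIGNATURES:
        if header.startswith(magic):
            return action
    # Unknown format: treat as plain text only if every byte is ASCII text.
    if header and all(b in (9, 10, 13) or 32 <= b < 127 for b in header):
        return "ascii"
    return "skip"
```

A file whose header matches a known document format would be passed through its filter, a file of plain ASCII bytes would be treated as text, and everything else would be skipped.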


Files Relating to the Indexing Process

To understand the indexing process, it helps to know the files involved.  QuickFinder indices are constructed for each library.  All files involved can be found in the main Post Office \GWDMS\LIB* directory or its subdirectories.

DMDD*.DB:        These files contain the property sheet information for each document, an indexing queue, and a record for each document that keeps track of which users have a reference to that document.  In order to increase document management capacity, the library is divided into ten partitions.  As such, there are ten DMDD*.DB files.

INDEX\*.IDX:        These files contain the actual QuickFinder indices.  This includes all words found in each field of the property sheets and full-text of documents, along with pointers to documents containing those words.  This is also divided into ten partitions.  However, during the indexing process, these files are backed up on a regular basis. This causes the actual number of *.IDX files to vary from time to time.  Generally speaking, the active number of *.IDX files at any one time is ten.

INDEX\*.INC:        *.INC files are essentially the same format as the *.IDX files.  During the indexing process, the POA indexes documents at regular intervals.  Rather than compressing all new information into each *.IDX (index) file, this information is written to a *.INC (incremental) file.  During the next indexing cycle, each *.INC file is backed up and new information is appended to the original file.  Every night at midnight, the *.INC files are compressed into the *.IDX files.  Note: Due to the backup *.INC files, the compression does not necessarily remove all *.INC files from the INDEX directory.  The oldest (and no longer in use) *.INC files are automatically deleted.

DOCS\*.*:        Documents and their corresponding "wordlist" files reside as BLOBs (Binary Large OBjects) in this directory structure.  More information on "wordlist" files is contained in the next section.
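The relationship between the *.IDX and *.INC files can be modeled conceptually as follows.  This is a sketch under stated assumptions: the real files are binary QuickFinder structures, so plain Python dictionaries stand in for them, and the class and method names are hypothetical.

```python
# Conceptual model only: real *.IDX and *.INC files are binary QuickFinder
# structures; plain dicts stand in for them here.
from collections import defaultdict


class LibraryIndex:
    def __init__(self):
        self.idx = defaultdict(set)  # stands in for the *.IDX files
        self.inc = defaultdict(set)  # stands in for the *.INC files

    def index_cycle(self, new_docs):
        """Regular indexing cycle: new words go to the incremental index."""
        for doc_id, text in new_docs.items():
            for word in text.lower().split():
                self.inc[word].add(doc_id)

    def nightly_compress(self):
        """Midnight job: fold the incremental index into the main index."""
        for word, doc_ids in self.inc.items():
            self.idx[word] |= doc_ids
        self.inc.clear()

    def search(self, word):
        """A search consults both the main and the incremental index."""
        word = word.lower()
        return self.idx.get(word, set()) | self.inc.get(word, set())
```

The point of the split is that each regular cycle only touches the small incremental structure, while the more expensive merge into the main index happens once a night.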


The Indexing Process

All information pertaining to documents in a GroupWise library is completely indexed.  This information includes property sheet information (subject, author, version descriptions, custom fields, etc.) and the actual full text of documents.  All indexing is performed by the Post Office Agent (POA) running on either NetWare or Windows NT.  The following steps describe the indexing process:

1.        When a document is created or imported into a GroupWise library, pointers to the property sheet and document are placed in an indexing queue. The queue itself is stored within the DMDD*.DB files found in the main library directory.  

2.        The POA will check this queue at regular intervals and index all items found.  The actual indexing process for new documents involves physically reading through each document, creating an index for each word found, and generating a "wordlist" file.  Note: When reading in the document for indexing, the POA does not decrypt or decompress the document. The BLOB file is simply read piece by piece into server cache where the POA can "look into" the BLOB for indexing.  There is no temporary directory used by the POA for the reading of documents.

3.        The index for each library is found in the PO\GWDMS\LIB*\INDEX subdirectory.  Within this directory there are *.IDX files, and *.INC files.  These files are explained in greater detail in the section entitled "Files Relating to the Indexing Process".

4.        The wordlist file simply contains a list of all words found in the indexed document.  It is stored in the external BLOB storage area with the document itself.

5.        There is a record in the DMDD*.DB files which is responsible for maintaining pointers to mailboxes which contain references to specific documents. A copy of the wordlist is replicated to all mailboxes found in that record.  This is primarily for full-text searching of documents in your mailbox without connecting to the library in which the document belongs.

6.        After modifying a document, a pointer is again placed in the indexing queue.  

7.        During the next indexing cycle, the POA will locate the document and its corresponding wordlist file and reindex the document.  The QuickFinder technology will also remove all index entries for words which have been removed from the document.

8.        The original wordlist file is deleted and a new file is created.

9.        This new wordlist is then passed to all mailboxes which contain a reference to the document.  The update process occurs behind the scenes and does not require any user intervention.

10.        After completing the indexing cycle, the POA will wait the specified interval and begin the process again.
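The steps above can be sketched as a toy model.  All class and attribute names are hypothetical, and the in-memory structures are simplifications rather than GroupWise's storage format; the sketch only shows how the queue, the wordlist, the index, and the mailbox copies relate to each other.

```python
# Toy model of the ten steps; all names are hypothetical and the data
# structures are simplifications, not GroupWise's storage format.
from collections import defaultdict, deque


class ToyLibrary:
    def __init__(self):
        self.queue = deque()            # indexing queue (kept in DMDD*.DB)
        self.docs = {}                  # doc_id -> current text (the BLOB)
        self.wordlists = {}             # doc_id -> set of words (wordlist file)
        self.index = defaultdict(set)   # word -> doc_ids (*.IDX / *.INC)
        self.references = defaultdict(set)  # doc_id -> mailboxes referencing it
        self.mailbox_wordlists = defaultdict(dict)  # mailbox -> doc_id -> words

    def save(self, doc_id, text):
        """Steps 1 and 6: creating or modifying a document queues it."""
        self.docs[doc_id] = text
        self.queue.append(doc_id)

    def index_cycle(self):
        """Steps 2-5 and 7-9: drain the queue and refresh wordlists."""
        while self.queue:
            doc_id = self.queue.popleft()
            new_words = set(self.docs[doc_id].lower().split())
            old_words = self.wordlists.get(doc_id, set())
            for word in old_words - new_words:   # drop stale index entries
                self.index[word].discard(doc_id)
            for word in new_words:
                self.index[word].add(doc_id)
            self.wordlists[doc_id] = new_words   # replace the old wordlist
            for mailbox in self.references[doc_id]:
                self.mailbox_wordlists[mailbox][doc_id] = new_words
```

After a document is modified and the next cycle runs, words that no longer appear in it stop matching, and every mailbox holding a reference receives the new wordlist.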


Post Office Agent Administration

Prior to discussing proper POA configuration in an indexing environment, the recommended method of POA administration must be explained.  Post Office Agents can be configured via a startup file, startup switches, or from within NWAdmin.  With a single POA, using a startup file for configuration is a very simple and quick way to invoke the POA using the necessary settings.  However, in a multiple POA environment, using startup switches and files can require a large amount of administration.  Using the POA objects in NWAdmin, centralized administration across multiple servers is possible and will lower the cost of ownership.  Also, in order to utilize the ability to schedule tasks for the POA, administration through NWAdmin is required.  The following steps demonstrate the proper procedure for setting up single or multiple POAs in NWAdmin:

1.        From within NWAdmin, select the Post Office to which the POAs belong.

2.        Select the File|Create menu item, or right-click and choose "Create".  

3.        Select "GroupWise Agent".

4.        Enter an appropriate name for the new agent.  Note: For ease of administration, it is recommended that this name reflects the type of agent being created.  For example, when creating a POA dedicated to indexing, an appropriate name may be "IndexPOA".

5.        IMPORTANT: Select the agent type in the drop down box.  Note: In order to create a Post Office level ADA or MTA, the appropriate type must be selected in the drop down box.  For example, selecting a type of "Post Office" does not make a Post Office level configuration for an ADA if you named the object "ADA" in step 4.

6.        On the command line of the appropriate POA, use the "/NAME" switch and specify the name used in step 4.  The POA will use the configuration for the appropriate POA as created in NWAdmin.  Note: If two or more agents exist in NWAdmin, the agent will produce a message indicating that the "/NAME" switch is required so the appropriate configuration is used.


Post Office Agent Settings

There is one main configuration option available for the Post Office Agent with respect to indexing.  The "QuickFinder Interval" setting, found on the "Agent Settings" tab of the POA object in NWAdmin, or the "/QFINTERVAL" startup switch is used to specify the number of hours between indexing cycles of the POA.  This setting defaults to four hours through the NDS object of the POA.  If the value is set to zero, the POA will continually repeat the indexing cycle.  One additional setting is the /NOQF switch.  This setting disables the QuickFinder process for the POA.

In situations where relatively few documents are being created and managed by GroupWise 5, the default indexing interval of every four hours may be adequate.  However, if mass document creation and accurate up-to-the-minute searches are required, using a POA for dedicated indexing may be useful.  In order to facilitate this, a combination of the following configuration options is available:


NWAdmin Settings

QuickFinder Interval:        Specifies the number of hours between each indexing cycle. Note: In an early revision of GroupWise 5, the arrows beside this box could not be used to set this value to zero.  To accomplish this, simply select the number and type "0".

Enable QuickFinder Indexing:        Causes the POA to perform QuickFinder indexing.

Enable TCP/IP:                Allows the POA to process client TCP/IP requests.  Deselect this check box to prevent the POA from processing TCP/IP requests.

Message File Processing:        Determines which messages will be processed by the POA. Setting this value to "Off" prevents the POA from handling message flow.

Startup Switch/File Settings

/QFINTERVAL-0                Specifies the number of hours between each indexing cycle.  If you enter "0" it will index continually.  The interval is based on when indexing starts not after it finishes.  Therefore, if the interval is "1" and the indexer starts and runs for 55 minutes, it will then start again in 5 minutes.

/QFBASEOFFSET-0        The number of hours from midnight to start using the /qfinterval value.  The range for this switch is 0 - 23 and the default is 0.  The interval is in hours.  Use this setting to prevent the indexer from running during other critical processes such as backup.

/NOQF:                Disables QuickFinder indexing.

/NOTCPIP:        Prevents the POA from processing client TCP/IP requests.

/NOMF:                Prevents the POA from handling message flow for the Post Office.

These options are particularly useful when loading multiple POAs on multiple servers.  The indexing process can be processor intensive.  If a second server is available for use, a POA dedicated to indexing could be loaded with all of the above settings and it would simply index documents and user mailboxes.  Multiple POAs can be very useful and will be discussed further in the Optimization section of this document.
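The timing rule for /QFINTERVAL (the interval is counted from the start of a cycle, not its end) can be checked with a small calculation.  This is a sketch with hypothetical function names.  Two points are assumptions, not statements from this document: that a cycle which overruns the interval is followed immediately by the next one, and that /QFBASEOFFSET simply anchors the first cycle of the day at that many hours after midnight.

```python
from datetime import datetime, timedelta


def next_start(last_start: datetime, last_finish: datetime,
               interval_hours: int) -> datetime:
    """Return when the next indexing cycle begins.

    The interval is counted from the start of the previous cycle. An
    interval of 0 means continual indexing. Assumption: if a cycle runs
    longer than the interval, the next one starts as soon as it finishes.
    """
    if interval_hours == 0:
        return last_finish
    scheduled = last_start + timedelta(hours=interval_hours)
    return max(scheduled, last_finish)


def first_start(day: datetime, base_offset_hours: int) -> datetime:
    """Assumed reading of the base offset: hours after midnight (0-23)."""
    if not 0 <= base_offset_hours <= 23:
        raise ValueError("base offset must be between 0 and 23")
    midnight = day.replace(hour=0, minute=0, second=0, microsecond=0)
    return midnight + timedelta(hours=base_offset_hours)
```

With an interval of 1 and a cycle that starts at midnight and runs for 55 minutes, next_start returns 01:00, which is 5 minutes after the cycle finished, matching the example given for /QFINTERVAL.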


The Find (Search) Process

The GroupWise client has the ability to perform extremely complex searches on the information which the POA has indexed.  This feature is available under the Tools|Find menu item.  At this point, a user could simply enter a word to search on, or perform an advanced search and specify where to look for the word (such as the author or subject field).  From an engine standpoint, the search request process differs, depending on where the library is located and whether the client is accessing the document via drive mappings (file sharing) or client/server.  The possible scenarios will now be explained.

Find in Local Library - File Sharing
1.        User performs search.  This search request contains formatting information which determines how the search results will be displayed.

2.        The GroupWise client opens the index files located in the INDEX subdirectory beneath the PO and performs the search.

3.        The GroupWise client then displays the results to the user.

Find in Local Library - Client/Server
1.        User performs search.  This search request contains formatting information which determines how the search results will be displayed.

2.        The search request is submitted to the POA via TCP/IP packets.

3.        The POA opens the index files located in the INDEX subdirectory beneath the PO and performs the search.

4.        The search results are then passed back to the client via TCP/IP packets.

5.        The GroupWise client displays the results to the user.

Find in Secondary Library (library located on a different post office) - File Sharing or Client/Server
1.        User performs search.  This search request contains formatting information which determines how the search results will be displayed.

2.        The search is placed in the WPCSIN\0 directory.  This is the high-priority message queue of the MTA. (Note: The placement of the search request in the WPCSIN\0 directory may be performed by the client itself, through a store and forward process of the POA, or through a TCP/IP connection to the POA depending on your system settings.)

3.        The MTA scans the high-priority message queue in set intervals.  During the next scan cycle, the search request is detected and passed to the appropriate post office through the store and forward process of a high-priority message.

4.        The POA of the secondary post office which owns the library scans the WPCSOUT\0 directory in set intervals.  During the next scan cycle, the search request is detected.

5.        The POA opens the index files for the appropriate library and processes the search.

6.        The POA places the search results in the WPCSIN\0 directory.

7.        During the next scan cycle of the MTA, the search results are detected and passed back to the originator's post office through the store and forward process as a high priority message. The search results are placed in the WPCSOUT\0 directory.

8.        During the next scan cycle of the local POA, the search results are detected and passed to the user.  (Note: The passing of the search results to the user may be performed through a store and forward process of the POA, or through a TCP/IP connection from the POA to the client depending on your system settings.)

9.        The GroupWise client then displays the results to the user.

Note:        Searching a library in a secondary post office always goes through the store and forward process.  However, document retrieval from a library in a secondary post office is only available through TCP/IP via client/server.  This means users can search on secondary libraries if client/server is not being used, but documents will be inaccessible.  GroupWise Remote is the exception to the rule, as it always uses the store and forward process.  Please see the "Additional Issues" section for more information on GroupWise Remote.
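The three scenarios and the note above can be summarized in a short sketch.  The function names and the wording of each hop are hypothetical; the routing itself follows the steps listed in this section.

```python
def search_path(library_is_local: bool, client_server: bool) -> list:
    """Return the hops a search request takes, per the scenarios above."""
    if library_is_local and not client_server:
        return ["client opens INDEX files", "client displays results"]
    if library_is_local:
        return ["client -> POA (TCP/IP)", "POA opens INDEX files",
                "POA -> client (TCP/IP)", "client displays results"]
    # Secondary library: always store and forward, whatever the access mode.
    return ["request -> WPCSIN\\0", "MTA -> secondary post office",
            "secondary POA reads WPCSOUT\\0", "secondary POA opens INDEX files",
            "results -> WPCSIN\\0", "MTA -> originating post office",
            "local POA reads WPCSOUT\\0", "client displays results"]


def can_retrieve_document(library_is_local: bool, client_server: bool,
                          remote_mode: bool = False) -> bool:
    """Secondary-library documents need client/server, except in Remote."""
    return library_is_local or client_server or remote_mode
```

The second function captures the asymmetry in the note: a file sharing user can search a secondary library but cannot open the documents it returns unless GroupWise Remote is used.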


Find in Multiple Libraries

1.        User performs search.  This search request contains formatting information which determines how the search results will be displayed.

2.        The local POA processes any search requests for local libraries.  The POA also passes search requests for secondary libraries to the MTA for processing through the store and forward process as a high-priority message.

3.        As search results are sent back to the local POA, the user's search results dialog box will be updated.  The total number of results is updated and displayed in the lower right corner.


Find in Desktop

1.        User performs search.  This search request contains formatting information which determines how the search results will be displayed.

2.        If the client is accessing the post office in file sharing mode, the GroupWise client will spawn a QuickFinder searching process and search all items found in the user's mailbox.  This search initially opens the *.IDX and *.INC files found in an INDEX subdirectory of the user's mailbox. These index files contain a complete index of all items found in the user's mailbox during the last QuickFinder indexing cycle of the POA.  If new items have been added to the mailbox since the last indexing cycle, the QuickFinder search will also perform a scan search of all new items.  If the client is accessing the post office in client/server mode, the POA will process the QuickFinder search in the same manner, and pass the results back to the user.
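The combination of an index lookup with a scan of items added since the last indexing cycle can be sketched as follows.  The function name, parameters, and whole-word matching are assumptions for illustration; the actual QuickFinder scan is not described in this document.

```python
def find_in_mailbox(term, index, indexed_ids, items):
    """Search a mailbox: index lookup first, then scan unindexed items.

    index       -- word -> set of item ids, built at the last indexing cycle
    indexed_ids -- ids of the items covered by that index
    items       -- item id -> text for everything currently in the mailbox
    """
    term = term.lower()
    hits = set(index.get(term, set()))
    # Items added since the last cycle are not in the index: scan their text.
    for item_id, text in items.items():
        if item_id not in indexed_ids and term in text.lower().split():
            hits.add(item_id)
    return hits
```

The scan is what keeps results complete between cycles, but its cost grows with the number of unindexed items, which is one reason a shorter QuickFinder interval can improve search responsiveness.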


Optimization

Optimizing the GroupWise system for searching and indexing can greatly improve performance and usefulness of libraries in an organization.  However, a proper balance of needs and resources must be attained.  Determining an organization's needs and resources will greatly affect the actual implementation.  In order to address all issues, this section has been divided into three aspects of the optimization process.


Search Results Performance

By reviewing the searching process as explained in the previous section of this document, it is obvious that searching libraries on a remote post office is an intensive process.  Whether you are accessing your post office in file sharing or client/server mode, the search request must pass through a store and forward process to the secondary post office.  At that point, the search is performed and the results are passed back to the user via the same store and forward process.  In order to reduce the effects of this situation, a library should exist on each post office in which users create documents.

Generally speaking, users need frequent access to documents which pertain to their department or location. The users in these departments or locations are usually on the same post office.  If a library is created for each post office, document accessibility will be very fast for local users.  Of course, access to documents in secondary libraries is still possible - searching through the store and forward process and document retrieval through client/server.  However, most document activities will be in local libraries and greater performance will be experienced.

There are some administrative advantages to having libraries only exist in a central post office.  Most of these issues are discussed in the document, "Planning a GroupWise 5 System with the Addition of Document Management Services."  However, from a performance standpoint, distributed libraries are the preferred configuration.


Single POA

In a GroupWise system in which there is one POA for each post office, index optimization must be based on one question: "How important are accurate, up-to-the-minute search results to each user?"  If document creation is primarily performed by a few people, chances are they do not perform much searching for documents.  Therefore, documents and the property sheets may only need to be indexed every few hours.  If this is the case, the QuickFinder Interval value should be set to an appropriate number such as the default of four.

However, if document creation is a major part of the corporation, up-to-the-minute search results may be more important.  In this case, the QuickFinder Interval setting should be set to a lower value such as zero or one.  If setting this value to zero places too much of a burden on your production server, refer to the next section dealing with multiple POAs.


Multiple POAs

Environments which produce many documents and rely on up-to-the-minute searching should consider using multiple POAs to handle document indexing, client/server access, and message flow.  In this environment, the POA on the production server could be used to handle client/server access to the post office and possibly message flow, while a second POA could be dedicated to indexing documents.

Having a dedicated POA is beneficial because it offloads the indexing process from the production server to a second, less used server.  In this situation, the dedicated POA should be started with the QuickFinder Interval set to zero, Message File Processing set to "Off" and TCP/IP requests disabled.  The POA running on the production server should use the /NOQF setting to eliminate indexing on the main POA.
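The two configurations described above can be written out as startup switches with a small helper.  This is a sketch: the function name and the agent names used in the example are hypothetical, and the "/NAME-<value>" form is an assumption modeled on the "/QFINTERVAL-0" notation used earlier, so the exact syntax should be confirmed against the agent documentation.

```python
def startup_switches(name, indexing_only):
    """Build startup-file lines for one POA from the switches listed above.

    The "/NAME-<value>" form is an assumption modeled on "/QFINTERVAL-0";
    check the agent documentation for the exact syntax on your platform.
    """
    lines = ["/NAME-" + name]
    if indexing_only:
        # Dedicated indexer: index continually, no clients, no message flow.
        lines += ["/QFINTERVAL-0", "/NOTCPIP", "/NOMF"]
    else:
        # Production POA: leave indexing to the dedicated agent.
        lines += ["/NOQF"]
    return lines
```

For example, startup_switches("IndexPOA", True) yields the switches for the dedicated indexing agent, while startup_switches("MainPOA", False) yields those for the production agent.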

There are a couple of key issues which should be taken into consideration when determining which machine should handle indexing and which should process client/server and message flow requests.  These considerations are as follows:

Server Utilization:        The indexing cycle is very processor intensive.  If your production server is currently under high utilization, the indexing POA should be loaded on a different server.

Network Traffic:        If a POA is loaded on a server other than the server which physically holds the post office, all requests of the POA must travel across the wire to the server which does hold the post office.  This will increase network traffic.  During the indexing cycle of a POA, this traffic could be very significant as the documents must travel to the indexing POA for the read process.  The indices must also travel back across the wire to be written to disk.

Given the above considerations, the ideal scenario for multiple POAs would involve a POA loaded on the production server for client/server and message flow processing.  A second POA could then be loaded on a second server for the indexing process.  In order to keep the indexing traffic off the main network, this second server should be attached directly to the production server via a separate network segment.  This will allow for optimal performance of all POAs without hindering user access to the post office and libraries.


Index Rebuild

In the event the indices of the documents and property sheets no longer contain accurate information, a rebuild of the index files can be performed.  To start the rebuild, press "Ctrl-Shift-Q" at the POA screen.  This will cause the POA to remove all index information from the library's \INDEX\*.IDX files and queue all documents and property sheets for indexing.  This process starts immediately and does not display a confirmation dialog box.  Do not press "Ctrl-Shift-Q" on the POA if you do not intend to reindex everything (all library and user databases).  The next time the POA begins the indexing cycle, all documents, property sheets and user databases will be reindexed.  Due to the nature of this process, this step should only be performed after consulting with Novell's GroupWare Technical Support Engineers.


GroupWise Remote

GroupWise Remote always accesses documents via the store and forward method.  If a direct TCP/IP connection to the remote POA cannot be established, the user can switch to GroupWise Remote and still access the document.

document

Document Title: GroupWise QuickFinder Indexing
Document ID: 10066601
Solution ID: NOVL65467
Creation Date: 06Dec2001
Modified Date: 19Dec2002
Novell Product Class:Groupware
Management Products

disclaimer

The Origin of this information may be internal or external to Novell. Novell makes all reasonable efforts to verify this information. However, the information provided in this document is for your information only. Novell makes no explicit or implied claims to the validity of this information.
Any trademarks referenced in this document are the property of their respective owners. Consult your product manuals for complete trademark information.