NetWare 4 Compression: The Incredible Shrinking Data
John Epeneter
01 Feb 1998
NetWare 4 compression allows you to manage files more efficiently and store more data without purchasing additional storage devices. Designed as part of a large file management system, NetWare 4 compression provides high performance by identifying the most frequently used files and storing them on the fastest media, identifying the less frequently used files and storing them on slower and usually cheaper media, and compressing files to make maximum use of all storage space.
Understanding how to use NetWare 4 compression helps you manage your company's network better. This article explains how NetWare 4 compression works, how to customize it, and how to avoid possible problems.
PROVIDING FAST FILE ACCESS AND DATA INTEGRITY
Because Novell designed the NetWare 4 compression algorithm for a server environment, it offers two important advantages for your company's network: it delivers optimal compression and decompression times, and it maintains data integrity.
To give users fast access to files while making optimum use of storage space, NetWare 4 compression runs during off-peak hours, when the server has spare processor cycles. By default, NetWare 4 compression is configured to run from midnight to 6 a.m. This prevents the performance degradation that would occur if the server were compressing a large file during peak hours. NetWare 4 compresses files during peak hours only when a user requests that a file be compressed immediately or when a network administrator changes the default compression settings.
NetWare 4 compresses only files that are not frequently used, selecting files that have not been accessed for a specified period of time. In addition, the NetWare 4 compression algorithm provides a high compression ratio and low-latency decompression time. On a Pentium processor, NetWare 4 compresses about 10 KB of data per second during idle hours.
To ensure data integrity, NetWare 4 includes an integrity check with each block of compressed data. Before decompressing data, NetWare 4 first performs an integrity check, which can detect but not recover corrupted data. If the data has been corrupted, NetWare 4 notifies the user who is trying to access this data.
Each integrity check uses 4 bytes of hard drive space per 4,092 bytes of data, for a total of 4,096 bytes (one 4-KB block). During the compression process, NetWare 4 caches files in 4-KB blocks, regardless of the volume's block allocation size. The block allocation size ranges from 4 KB (one 4-KB block) to 64 KB (16 4-KB blocks). Each file occupies at least one allocation block. So if the volume's block allocation size were 64 KB and a file were one byte, the file would use 64 KB of hard drive space.
If you enable compression on a NetWare 4 server, you should also enable suballocation, another NetWare 4 feature that allows the server to store data in 512-byte blocks. If you do not enable suballocation and the volume uses a large block allocation size, such as 64 KB, compressing a file might not save any hard drive space. In this case, the server leaves the file uncompressed.
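The block layout described above can be sketched in Python. This is a minimal illustration only. The article does not say which checksum NetWare 4 uses, so CRC-32 (via `zlib.crc32`) is an assumed stand-in, and the function names are hypothetical.

```python
import struct
import zlib

BLOCK_SIZE = 4096                        # one 4-KB block
CHECK_SIZE = 4                           # bytes reserved for the integrity check
PAYLOAD_SIZE = BLOCK_SIZE - CHECK_SIZE   # 4,092 bytes of compressed data


def pack_blocks(data: bytes) -> list[bytes]:
    """Split compressed data into 4,092-byte payloads, each followed by a
    4-byte check value. CRC-32 is an assumed stand-in for NetWare's check."""
    blocks = []
    for start in range(0, len(data), PAYLOAD_SIZE):
        payload = data[start:start + PAYLOAD_SIZE]
        blocks.append(payload + struct.pack("<I", zlib.crc32(payload)))
    return blocks


def verify_block(block: bytes) -> bool:
    """True if the stored check matches the payload. Like the check in the
    text, this detects corruption but cannot repair it."""
    payload, check = block[:-CHECK_SIZE], block[-CHECK_SIZE:]
    return struct.unpack("<I", check)[0] == zlib.crc32(payload)


blocks = pack_blocks(b"x" * 10_000)
print([len(b) for b in blocks])   # [4096, 4096, 1820]
```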
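To see why suballocation matters, the following sketch estimates how much disk space a file occupies with and without 512-byte suballocation. It is a simplification with hypothetical function names. It ignores directory entries and other metadata, and it assumes that only a file's final partial block is stored in 512-byte units.

```python
SUBALLOC_UNIT = 512  # suballocation stores a file's tail in 512-byte units


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def disk_usage(size: int, block_size: int, suballoc: bool) -> int:
    """Approximate bytes of disk space used by a file of `size` bytes."""
    if not suballoc:
        # Any partial block consumes a whole allocation block.
        return ceil_div(size, block_size) * block_size
    full_blocks, tail = divmod(size, block_size)
    # Assumed: only the tail is suballocated, in 512-byte units.
    return full_blocks * block_size + ceil_div(tail, SUBALLOC_UNIT) * SUBALLOC_UNIT


def space_saved(original: int, compressed: int, block_size: int, suballoc: bool) -> int:
    return (disk_usage(original, block_size, suballoc)
            - disk_usage(compressed, block_size, suballoc))


# A 40,000-byte file that compresses to 12,000 bytes on a 64-KB volume:
print(space_saved(40_000, 12_000, 65_536, suballoc=False))  # 0
print(space_saved(40_000, 12_000, 65_536, suballoc=True))   # 28160
```

Without suballocation, both versions of the file fill one 64-KB block, so compression saves nothing. This is the case in which the server leaves the file uncompressed.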
COMPRESSING DATA
NetWare 4 begins the compression process by scanning the mounted volumes on which compression is enabled. NetWare 4 first scans the deleted files, which are stored as salvageable files, to determine which files can be compressed. NetWare 4 then scans the undeleted files to determine which files can be compressed.
NetWare 4 compresses files that meet the following criteria:
The file is not marked with the Don't Compress attribute.
The file has not been accessed for the amount of time specified by the Days Untouched Before Compression parameter. (The default setting is seven days.)
The file can be compressed by at least the percentage specified by the Minimum Compression Percentage Gain parameter. (The default setting is 2 percent.)
NetWare 4 can immediately determine whether a file meets the first two criteria. However, it cannot determine whether a file meets the third criterion until it actually compresses the file, which involves the following steps:
First, NetWare 4 scans the entire file to ensure the data is intact. NetWare 4 then builds a table of values in the file based on the number of times each value occurs in the file. This table is based on 100 percent of the file's data, giving NetWare 4 a good compression ratio. Most compression algorithms sample only the first portion of a file and assume that the rest of the file contains the same patterns. These compression algorithms often result in bit codes that are not optimized for the entire file, thus sacrificing a higher compression ratio for a faster compression time.
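The difference between a full-file table and a sampled one can be shown with a few lines of Python. This sketch only counts byte values. The article does not document how NetWare 4 turns those counts into bit codes, so the example stops at the table, and the function names are illustrative.

```python
from collections import Counter


def full_table(data: bytes) -> Counter:
    """Count every byte value across 100 percent of the file."""
    return Counter(data)


def sampled_table(data: bytes, sample_size: int) -> Counter:
    """Count only the first `sample_size` bytes, the shortcut that the
    text attributes to many other compression algorithms."""
    return Counter(data[:sample_size])


# A file whose first part differs from the rest:
data = b"A" * 1_000 + b"B" * 3_000
print(full_table(data)[ord("B")])            # 3000
print(sampled_table(data, 1_000)[ord("B")])  # 0: the sample never sees "B"
```

A code assignment built from the sampled table would treat "B" as absent, even though it makes up three-quarters of the file.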
While scanning the file, NetWare 4 also looks for duplicate strings, which can be up to 8 KB long, and encodes them. (Duplicate strings are repeated data, such as a person's name in a business letter.) Because a duplicate string can be up to 8 KB long, NetWare 4 can encode as much as 8 KB of data in just a few bits. This is another reason NetWare 4 has a better compression ratio than other compression algorithms.
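Duplicate-string encoding in general can be illustrated with a greedy LZ77-style parse. This is a generic textbook technique, not NetWare 4's documented algorithm. Only the 8-KB maximum length comes from the article. The 3-byte minimum match, the search window and the token format are assumptions.

```python
MAX_MATCH = 8 * 1024  # duplicate strings can be up to 8 KB long, per the text
MIN_MATCH = 3         # assumption: shorter repeats are stored as literals


def find_duplicates(data: bytes, window: int = 4096) -> list[tuple]:
    """Greedy LZ77-style parse into ("lit", byte) and
    ("dup", distance, length) tokens. Generic illustration only."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Try earlier start positions, nearest first.
        for j in range(i - 1, max(0, i - window) - 1, -1):
            length = 0
            while (length < MAX_MATCH and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
                if best_len == MAX_MATCH:
                    break  # cannot do better than the 8-KB cap
        if best_len >= MIN_MATCH:
            tokens.append(("dup", best_dist, best_len))
            i += best_len
        else:
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def expand(tokens: list[tuple]) -> bytes:
    """Rebuild the original bytes. Copies go byte by byte so that a
    duplicate may overlap the text it repeats."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)


print(find_duplicates(b"abcabcabcabc"))
# [('lit', 97), ('lit', 98), ('lit', 99), ('dup', 3, 9)]
```

Each `dup` token stands for up to 8 KB of output, which is why long repeats compress so well.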
After identifying duplicate strings, NetWare 4 processes the file's data, writing bit-map output to the compressed file. While producing this output, NetWare 4 ensures that the compressed file size is smaller than the original file size. If the compressed file is larger than the original file, NetWare 4 quits compressing the file and retains the original version of the file.
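The three criteria and the early abort can be combined into one sketch. Python's `zlib` stands in for NetWare 4's own compressor, so the actual sizes will differ. The `FileInfo` type and `try_compress` function are hypothetical, and the defaults mirror the settings mentioned above (7 days, 2 percent).

```python
import zlib
from dataclasses import dataclass
from typing import Optional

DAY = 24 * 60 * 60  # seconds


@dataclass
class FileInfo:
    data: bytes
    last_access: float           # seconds since the epoch
    dont_compress: bool = False  # the Don't Compress (Dc) attribute


def try_compress(f: FileInfo, now: float, days_untouched: int = 7,
                 min_gain_pct: float = 2.0) -> Optional[bytes]:
    """Return the compressed data if the file qualifies, else None
    (meaning: keep the original, uncompressed file)."""
    # Criteria 1 and 2 are checked without reading the data.
    if f.dont_compress or now - f.last_access < days_untouched * DAY:
        return None
    # Criterion 3 is known only by compressing (zlib as a stand-in).
    comp = zlib.compressobj(9)
    out = bytearray()
    for start in range(0, len(f.data), 4096):
        out += comp.compress(f.data[start:start + 4096])
        if len(out) >= len(f.data):
            return None  # output already as large as the original: give up
    out += comp.flush()
    if len(out) >= len(f.data):
        return None
    gain_pct = 100 * (len(f.data) - len(out)) / len(f.data)
    return bytes(out) if gain_pct >= min_gain_pct else None
```

Returning `None` corresponds to NetWare 4 retaining the original file. The streaming loop mirrors the idea of quitting as soon as the output stops being smaller.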
The degree to which a file can be compressed depends on the randomness of the file's data. The more random the data, the more difficult it is to store this data in a smaller format. Some files, such as ZIP, JPG, and GIF files, are already compressed and cannot be significantly compressed any further. Most files stored on servers can benefit from compression, including EXE and COM executable files, word-processing files, spreadsheet files, database files, text and log files, and user profiles. Many of these files, especially text files, can be compressed by as much as 90 percent. The maximum file size that NetWare 4 can compress is 256 MB.
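The link between randomness and compressibility is easy to demonstrate. This example uses Python's `zlib` module only as a stand-in for NetWare 4's algorithm, so the exact ratios will differ. Random bytes play the role of already-compressed data such as ZIP or JPG files.

```python
import random
import zlib


def ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original size (lower is better)."""
    return len(zlib.compress(data, 9)) / len(data)


text = b"Dear Ms. Smith, thank you for your order. " * 500  # repetitive, like a form letter
noise = random.Random(42).randbytes(len(text))           # random, like already-compressed data

print(f"text:  {ratio(text):.2%}")
print(f"noise: {ratio(noise):.2%}")
```

The repetitive text shrinks to a small fraction of its size. The random data does not shrink at all and typically grows slightly because of format overhead.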
If NetWare 4 fails to write a block of data to the hard drive or if the compression process is interrupted, NetWare 4 aborts the attempt to compress the file and rolls the file back to its original state. If the entire file is successfully compressed, NetWare 4 replaces the original file with the compressed file.
After a file is compressed, NetWare 4 generates a compression header and modifies the directory entry, showing that the data is compressed. The compression header, which is the first 32 bytes of the file, has its own integrity check that does not interfere with the first block's integrity check. This compression header contains information about the original file size, the compressed file size, and the type of compression used. This header also includes the correct counts for the bit codes used to decompress the file and other important information about the way the original file was laid out in the NetWare file system, including information about the "holes," or sparse areas of the file.
NetWare 4 records the file access times, the compression flag information, the date and time the file was compressed, and the location of the compressed data in the directory entry. NetWare 4 then releases the hard drive space used by the original file as free blocks.
NetWare 4 continues scanning files until all files have been scanned or the preset Stop time (6 a.m. by default) has been reached. (See the Compression Daily Check Stop Hour parameter below.) If NetWare 4 begins scanning a file before the Stop time, it completes the compression process even if this process continues past the Stop time. Therefore, you should set the Stop time conservatively so that compression does not continue into peak hours.
NetWare 4 compression does not interfere with Novell Directory Services (NDS). All NDS data is marked with the Don't Compress attribute.
DECOMPRESSING DATA
When a user accesses a compressed file, NetWare 4 begins the decompression process by reading the compression header and checking its integrity. If this compression header has been corrupted, a message appears on the server console, listing the filename and the station number that was accessing the file and explaining that the file is corrupted. If the compression header is intact, NetWare 4 allocates the hard drive space to store the decompressed file.
If the volume doesn't have enough space for the decompressed file, an "out of space" error occurs. You can use several SET parameters to ensure the volume has enough space to hold decompressed files.
For example, the Decompress Percent Disk Space Free to Allow Commit SET parameter specifies how much of a volume must remain free before NetWare 4 will commit a decompressed file to it. This SET parameter prevents newly decompressed files from filling up the volume.
By default, NetWare 4 commits a decompressed file only if at least 10 percent of the volume is free. Depending on the size of the volume, however, you may decide that 10 percent is too much or too little. You can set this threshold anywhere from 0 to 75 percent.
When a volume runs out of free space, a warning message appears at the server console and the attached workstations. If the message appears frequently, you can reset the Decompress Free Space Warning Interval SET parameter to between 0 and 29 days. Zero turns the message off; 29 sets the message to appear once every 29 days. The default setting is 31 minutes, 18.5 seconds. We recommend 12 hours as a comfortable setting.
If the volume has sufficient space, NetWare 4 reads the first block of the file into memory and checks its integrity. If the first block is corrupted, a message with the station number requesting this block appears at the server console. If the block passes both integrity checks, NetWare 4 calculates the number of blocks required and requests these blocks from the NetWare file system.
The amount of RAM required to decompress a file is the amount needed to hold the tables that map compressed codes to decompressed byte values. Before interpreting a compressed file, NetWare 4 must build three such tables of up to 3,072 bytes each. Each file being decompressed therefore requires up to 9,216 bytes of RAM. The number of files that NetWare 4 decompresses simultaneously determines the total amount of RAM required for the decompression process. If memory allocation fails, the user or application requesting access to the compressed file receives one of the following errors: ERR_NO_ALLOC_SPACE or ERR_OUT_OF_MEMORY.
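The arithmetic can be written out directly. This sketch counts only the three code tables described above. It ignores buffers and other per-request overhead, which the article does not quantify. The function name is hypothetical.

```python
TABLES_PER_FILE = 3
MAX_TABLE_BYTES = 3_072


def decompression_table_ram(concurrent_files: int) -> int:
    """Upper bound, in bytes, on code-table memory for the given number
    of simultaneous decompressions (tables only; other overhead ignored)."""
    return concurrent_files * TABLES_PER_FILE * MAX_TABLE_BYTES


print(decompression_table_ram(1))   # 9216
print(decompression_table_ram(50))  # 460800, i.e., 450 KB
```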
With sufficient RAM allocated, NetWare 4 begins mapping the compressed codes to the decompressed byte values and writing duplicate string identifiers. This portion of the decompression process is handled by a separate "work-to-do" thread, which repeats the procedure of reading a block of compressed data, converting the data to its original form, and writing the original form to the decompressed file. If NetWare 4 finds a corrupted block of compressed data during the decompression process, an error appears at the server console showing that the compressed file is corrupted but not showing the station numbers.
Users who request a compressed file do not have to wait for the entire file to be decompressed to receive the first byte of data. While the file is being decompressed, NetWare 4 continues to accept NetWare Core Protocol (NCP) read requests for it. It services each request as soon as the requested data is available in uncompressed form.
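The idea of serving data before decompression finishes can be illustrated with a Python generator built on `zlib.decompressobj`. `zlib` is only a stand-in here, and `stream_decompress` is a hypothetical name. A server would hand each piece to waiting NCP read requests rather than to a Python loop.

```python
import zlib
from typing import Iterator


def stream_decompress(compressed: bytes, chunk: int = 4096) -> Iterator[bytes]:
    """Yield decompressed data piece by piece as soon as it is available,
    instead of waiting for the whole file to be decompressed."""
    d = zlib.decompressobj()
    for start in range(0, len(compressed), chunk):
        piece = d.decompress(compressed[start:start + chunk])
        if piece:
            yield piece
    tail = d.flush()
    if tail:
        yield tail


original = bytes(range(256)) * 4_000
pieces = stream_decompress(zlib.compress(original), chunk=64)
first = next(pieces)            # available after reading only a little input
rest = b"".join(pieces)
```

The first piece is ready long before the rest of the file has been processed, which is what lets the first byte reach the user quickly.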
After the entire file is decompressed, NetWare 4 checks the last access date. If the file has been accessed twice within a specified period, NetWare 4 saves the decompressed version of this file on the volume. Otherwise, NetWare 4 returns the decompressed file to the volume's free blocks and retains the compressed version of this file. In either case, the last access date is updated for future reference.
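The decision of whether to keep a decompressed copy can be sketched by combining this rule with the free-space threshold described earlier. The parameter values mirror the Convert Compressed to Uncompressed setting (0, 1, 2) and the Decompress Percent Disk Space Free to Allow Commit setting. Checking the percentage before the commit, and also requiring that the file fit, are assumptions. The names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Volume:
    size_mb: int
    free_mb: int


def keep_decompressed(convert_setting: int, reads_in_period: int,
                      decompressed_mb: int, vol: Volume,
                      min_free_pct: int = 10) -> bool:
    """Decide whether to commit a decompressed copy of a file to disk."""
    if convert_setting == 0:
        wanted = False                   # always keep the compressed version
    elif convert_setting == 1:
        wanted = reads_in_period >= 2    # read more than once in the period
    elif convert_setting == 2:
        wanted = True                    # always keep the decompressed version
    else:
        raise ValueError("convert_setting must be 0, 1, or 2")
    if not wanted:
        return False
    # Assumed: the free-space percentage is checked before committing,
    # and the decompressed file must also fit in the free space.
    enough_pct = vol.free_mb * 100 >= min_free_pct * vol.size_mb
    return enough_pct and vol.free_mb >= decompressed_mb


vol = Volume(size_mb=500, free_mb=100)
print(keep_decompressed(1, 2, 30, vol))  # True: read twice, space available
print(keep_decompressed(1, 1, 30, vol))  # False: read only once
```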
Because compressed files are smaller, you can copy them more quickly than uncompressed files, so you may want to manage files in compressed form. The NCOPY /R command copies files in their compressed form. This is quicker than copying uncompressed files and reduces network traffic.
SPEEDING UP FILE ACCESS
Novell designed NetWare 4 compression to make compressing and decompressing files fast, easy, and flexible. Users can access compressed files quickly, compress files immediately, avoid compressing selected files, copy compressed files, and view information about compressed files.
Thanks to the design of NetWare 4 compression, users are seldom aware that the files they are accessing are compressed. However, users may experience a minor delay the first two times a compressed file is accessed. After a compressed file is accessed twice within a specified period of time, NetWare 4 stores the file in its decompressed form. If a user modifies a compressed file, the write request waits while NetWare 4 decompresses the file before writing to this file.
You can save hard drive space by compressing files at the earliest possible time. You simply use the FLAG /IC command to mark individual files or directories for compression. If you mark a directory in this way, all of the files users save to that directory will be compressed.
You can also prevent NetWare 4 from compressing files in selected directories by marking these directories with the FLAG /DC command. Many files stored on a server are accessed occasionally, but not often enough to keep them from being compressed. As a result, these files may be compressed and decompressed repeatedly. To eliminate the minor delays that this cycle can cause, you can instruct users to save these files in a directory marked with the FLAG /DC command.
You can use the NCOPY command to copy compressed files between various storage media. For example, you can use the NCOPY /R command to copy compressed files between storage media that support compression. If you want to copy a compressed file to storage media that do not support compression, you can use the NCOPY /R/U command. You must also use this command to copy the compressed files back to a NetWare 4 server. Otherwise, NetWare 4 marks the files as uncompressed, and users who access these files receive invalid data.
MANAGING NETWARE 4 COMPRESSION
The first step in managing NetWare 4 compression is viewing information about compressed files. To view internal volume block statistics, you can use the NDIR /VOL command. To view information about compressed files, you can use other NDIR commands. For example, the NDIR /COMP command displays a table showing original file size, compressed file size, the time of the last update, and the number of blocks that were saved. (See "Viewing Information About Compressed Files".)
Novell's NetWare Administrator (NWADMIN) utility also provides information about NetWare 4 compression. In the NWADMIN utility, you highlight a Volume object, select Details from the Object menu, and click the Statistics button. A statistics page appears, indicating whether compression is enabled for the volume and, if so, what the volume's average rate of compression is.
However, the NWADMIN utility shows only that a file is flagged for compression. The NWADMIN utility does not show other compression information and reports only the original file size, rather than the compressed file size.
Third-party utilities also enable you to monitor NetWare 4 compression. For example, the CompMon utility from Midnight Technologies Inc. reports statistics about the performance of the compression and decompression process.
To determine if compression is enabled on a volume, you use the INSTALL utility at the server console. Under Volume Operations, press Enter for each volume to see whether compression is enabled.
Once enabled, compression cannot be disabled using the INSTALL utility. Disabling compression requires setting global parameters at the server console. The best place to set these parameters is in the Server Parameters, File System Parameters section of Novell's SERVMAN utility.
You can use the following parameters to manage NetWare 4 compression, and you can modify these parameters using the SERVMAN utility:
Enable File Compression
This parameter allows compression to occur on all compression-enabled volumes. If this parameter is set to Off, NetWare 4 queues requests for immediate compression until this parameter is set to On. The default setting is On.
Compression Daily Check Start Hour
This parameter specifies the time at which the compression process begins each day. The range is 0 to 23; the default setting is 0.
Compression Daily Check Stop Hour
This parameter specifies the time at which the compression process ends each day. If the Compression Daily Check Stop Hour parameter is equal to the Compression Daily Check Start Hour parameter, NetWare 4 ends the compression process when compression is completed. The range is 0 to 23; the default setting is 6.
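The Start and Stop hour rules can be expressed as a small predicate. This is a sketch with a hypothetical function name. Support for a window that wraps past midnight (for example, 22 to 4) is an assumption the article does not address. A file already being compressed at the Stop hour is finished regardless.

```python
def may_scan_next_file(hour: int, start: int = 0, stop: int = 6) -> bool:
    """True if a running compression pass may begin scanning another file
    at `hour` (0-23). A file already being compressed is always finished."""
    for h in (hour, start, stop):
        if not 0 <= h <= 23:
            raise ValueError("hours must be in the range 0 to 23")
    if start == stop:
        return True                      # no Stop time: run until all files are done
    if start < stop:
        return start <= hour < stop      # same-day window, e.g., 0 to 6
    return hour >= start or hour < stop  # assumed wrap past midnight, e.g., 22 to 4


print(may_scan_next_file(3), may_scan_next_file(13))  # True False
```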
Minimum Compression Percentage Gain
This parameter specifies the minimum percentage a file must compress to qualify for compression. The range is 0 to 50; the default setting is 2.
Maximum Concurrent Compressions
This parameter specifies the number of compressions that can occur simultaneously. (Compressions can run simultaneously only on different volumes.) The range is 1 to 8; the default setting is 2.
Deleted Files Compression
This parameter specifies how NetWare 4 compresses deleted files. The range is 0 to 2; the default setting is 1. If you use the 0 setting, NetWare 4 does not compress deleted files. If you use the 1 setting, NetWare 4 compresses deleted files the next day. If you use the 2 setting, NetWare 4 compresses deleted files immediately.
Days Untouched Before Compression
This parameter specifies the number of days a file must go unaccessed before NetWare 4 compresses it. The range is 0 to 100,000; the default setting is 7.
Convert Compressed to Uncompressed
This parameter specifies how NetWare 4 saves a file. The range is 0 to 2; the default setting is 1. If you use the 0 setting, NetWare 4 always saves the compressed version of the file. If you use the 1 setting, NetWare 4 saves the compressed version if the file is read only once within the time specified by the Days Untouched Before Compression parameter; NetWare 4 saves the decompressed version if the file is read more than once. If you use the 2 setting, NetWare 4 always saves the decompressed version of the file.
Decompress Percent Disk Space Free to Allow Commit
This parameter specifies the percentage of hard drive space that must be free on a volume before NetWare 4 can save a compressed file in its decompressed form. This parameter prevents newly decompressed files from filling the volume. The range is 0 to 75; the default setting is 10.
Decompress Free Space Warning Interval
This parameter specifies the amount of time between warnings indicating that NetWare 4 cannot change compressed files to decompressed files due to insufficient hard drive space. The range is 0 seconds to 29 days, 15 hours, 50 minutes, 3.8 seconds; the default setting is 31 minutes, 18.5 seconds. Setting this parameter to 0 disables the warnings.
RECOVERING CORRUPTED FILES
Just as some people are afraid to fly, even though flying is safer than driving, some network administrators are afraid of NetWare 4 compression, even though it is safer for data. When you open an uncompressed file, NetWare 4 retrieves this file's data without checking to ensure that the data has not been corrupted during storage. Disk drivers and controllers check for corruption, but if the data is already corrupted when it reaches the disk drivers and controllers, they return the corrupted data. Although NetWare 4 utilities, such as the VREPAIR utility, can correct problems in the NetWare 4 file system, these utilities cannot repair a file's data.
NetWare 4 compression, on the other hand, checks the integrity of each block of data before requesting that the data be written to the volume. If this data is corrupted, NetWare 4 notifies the user who requested the data.
Many conditions can cause data to become corrupted, including bad hard drive sectors, server crashes, and backup restoration errors. Any of these conditions can permanently corrupt data, and corruption can occur anywhere in a file, from the first block of data to the last block.
Detecting corruption in compressed files is especially important because compressed files include many dependencies. Since a compressed file is a bit map of data, NetWare 4 interprets a compressed file one bit at a time during the decompression process. Each bit indicates a state transition from the previous bit. If one bit changes, the entire direction of the decompression process changes.
Because the bits in a compressed file indicate values or duplicate strings, corruption can propagate throughout the decompressed version of the file. For example, suppose a corrupted bit increased a duplicate string's length by one. The decompressed file would grow by one byte, and every subsequent string would shift by one position. As a result, the positions of all the following duplicate strings would be incorrect.
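A toy decoder makes this propagation visible. The token format (literals plus distance/length copies) is a generic LZ77-style assumption, not NetWare 4's on-disk format. In this example a single length field is off by one.

```python
def lits(text: str) -> list[tuple]:
    """Literal tokens for each character of `text`."""
    return [("lit", ord(c)) for c in text]


def expand(tokens: list[tuple]) -> bytes:
    """Decode ("lit", byte) and ("dup", distance, length) tokens."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            for _ in range(length):
                out.append(out[-dist])  # byte by byte so overlapping copies work
    return bytes(out)


good = lits("cat ") + [("dup", 4, 4)] + lits("dog ") + [("dup", 8, 4)]
bad = lits("cat ") + [("dup", 4, 5)] + lits("dog ") + [("dup", 8, 4)]  # one length off by one

print(expand(good))  # b'cat cat dog cat '
print(expand(bad))   # b'cat cat cdog at c'
```

The corrupted length adds one byte. Because the later duplicate is located by distance, it now copies "at c" instead of "cat ", even though its own token is intact.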
Several options exist for recovering data from corrupted, compressed files. Most network administrators rely on tape backups to recover data. Tape backups work well if you monitor each backup session and if the backup software warns you of corrupted files. For example, Cheyenne Software Inc.'s ARCserve 6 makes a note in the log file if a compressed file is corrupted. You should immediately restore the original file from a tape session that occurred just prior to the warning.
You may not notice such warnings, however, and even if you do, you must then navigate through all of the backup sessions to find the most recent uncorrupted file. Not only does this process take time, but it also does not restore any changes made since the last backup session occurred.
Third-party utilities can help you recover data from corrupted, compressed files. For example, the FixCFile utility from Midnight Technologies Inc. analyzes a file's bit pattern and finds the corrupted block of data. This utility then shortens the file to the block of data preceding the corrupted block and recalculates the internal compression structures to reflect the correct compressed and decompressed size. A user can then open this file, which is successfully decompressed to the state it was in just prior to the corruption.
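The core idea of keeping everything before the first bad block can be sketched as follows. This is not FixCFile's actual method, which the article does not describe in detail. The CRC-32 check is an assumed stand-in, the names are hypothetical, and the step of recalculating the compression header's sizes is omitted.

```python
import struct
import zlib


def make_block(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 (an assumed stand-in for NetWare's check)."""
    return payload + struct.pack("<I", zlib.crc32(payload))


def block_ok(block: bytes) -> bool:
    payload, check = block[:-4], block[-4:]
    return struct.unpack("<I", check)[0] == zlib.crc32(payload)


def salvage(blocks: list[bytes]) -> list[bytes]:
    """Keep every block up to, but not including, the first corrupted one."""
    good = []
    for block in blocks:
        if not block_ok(block):
            break
        good.append(block)
    return good


blocks = [make_block(bytes([i]) * 4092) for i in range(5)]
blocks[3] = b"\x00" + blocks[3][1:]  # simulate damage to the fourth block
print(len(salvage(blocks)))          # 3
```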
BACKING UP COMPRESSED FILES
Backing up compressed files can be troublesome if you use outdated backup software. Good backup software detects when NetWare 4 compression is enabled on a volume. This backup software should then back up the compressed file and reset the last access date on this file, thus preventing the file from being decompressed when a user accesses it the next day. The backup software should also detect when it is backing up a file that is already compressed.
Poor backup software drives server utilization high because it causes compressed files to be decompressed during the backup process. If you run nightly backups with such software, files may never be compressed at all. If the NDIR /VOL command displays a low number of compressed files, the backup software may be decompressing these files or accessing them so often that compression never occurs. Third-party utilities, such as the CompMon utility, can also show whether a large number of files are being decompressed during a backup session.
Novell maintains a list of tested and approved backup software for NetWare 4. You should check with each backup vendor to determine whether its backup software offers advanced compression features.
OTHER CONSIDERATIONS
The following factors also affect NetWare 4 compression:
Excessive Compression
If a user has been restricted to using a particular amount of hard drive space, the user can optimize this space by flagging some or all files for immediate compression. However, compressing files during working hours uses more processor cycles, thus degrading performance for every user.
Flagging log files for immediate compression can also cause excessive compression. Each time an application writes to a log file, which is often large, the file must be decompressed and compressed.
In addition, the Convert Compressed to Uncompressed SET parameter can cause excessive compression. If you select the 0 setting for this parameter, all data on the server is decompressed and compressed each time users access this data. Similarly, if you select the 2 setting for the Deleted Files Compression SET parameter, deleted files are compressed immediately, even during working hours.
Large Compressed Files
If you are low on hard drive space, large compressed files can create problems. For example, suppose that you had a 100 MB file on a 500 MB volume. Further suppose that NetWare 4 compressed the file to 10 MB and that the volume then filled until the default 10 percent of hard drive space, or 50 MB, remained free.
If a user tried to access the compressed file, the volume would not have enough space to save this file to the hard drive. Because NetWare 4 could decompress the file in memory, the user would still receive the decompressed file. However, an error message would appear at the server console. A good rule of thumb is to maintain enough free space to decompress the largest file on the volume.
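The example and the rule of thumb reduce to a simple comparison. The file sizes passed in are uncompressed (original) sizes. The function name is hypothetical.

```python
def free_space_check(volume_mb: float, free_mb: float, file_sizes_mb: list[float],
                     min_free_pct: float = 10) -> dict:
    """Compare free space with the largest file's uncompressed size."""
    largest = max(file_sizes_mb)
    return {
        "commit_threshold_mb": volume_mb * min_free_pct / 100,
        "largest_file_mb": largest,
        "largest_fits": free_mb >= largest,
        "shortfall_mb": max(0, largest - free_mb),
    }


# The 500-MB volume from the example, with 50 MB free and a 100-MB file:
print(free_space_check(500, 50, [100, 20, 5]))
```

Here 50 MB is free but the file needs 100 MB when decompressed, a 50-MB shortfall.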
In addition, a large file could degrade performance if NetWare 4 started to compress it just before the Stop time. NetWare 4 would continue to compress this file while users were logging in to the network.
Simultaneous Decompressions
If the settable compression parameters allow, the number of simultaneous decompressions that occur on a server can match the number of users on the server. However, using a high setting for these parameters can degrade performance because each user receives only a small portion of decompressed data at a time. Although using a low setting results in a longer queue of users waiting for decompressed data, they receive this data more quickly.
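A back-of-the-envelope model makes this trade-off concrete. It assumes, purely for illustration, that every request needs the same amount of decompression work and that the server shares its capacity evenly among active decompressions with no switching overhead. The function name is hypothetical.

```python
def completion_times(n_users: int, work: float, concurrency: int) -> list[float]:
    """Finish time for each of n_users identical jobs when at most
    `concurrency` run at once and share the CPU equally."""
    times = []
    for k in range(n_users):
        wave = k // concurrency                            # which wave the job runs in
        in_wave = min(concurrency, n_users - wave * concurrency)
        # Each earlier (full) wave took concurrency * work; this wave takes in_wave * work.
        times.append(wave * concurrency * work + in_wave * work)
    return times


for c in (4, 2, 1):
    t = completion_times(4, 1.0, c)
    print(c, t, sum(t) / len(t))
```

In this model the last user always finishes at the same time. With lower concurrency, though, most users get their data sooner: the average drops from 4.0 to 2.5 units when concurrency goes from 4 to 1.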
Bit-Map Files
Bit-map files, which are usually graphics files, are especially susceptible to corruption because these files are parsed one bit at a time. The compressed version of a bit-map file is a series of bits; the file's data depends on every previous bit being accurate, and one bad bit affects the way the entire bit-map image is interpreted. With other file types, however, a single bad bit affects only a portion of the file.
Virus-Scanning Software
Performance problems can occur if you use virus-scanning software with compressed files. Because virus-scanning software looks for patterns in a file, this software works only with uncompressed files. NetWare 4 must decompress and compress every file that is scanned for viruses. We are unaware of any virus-scanning software that does not decompress compressed files.
CONCLUSION
Because the amount of data that users store on today's networks is growing rapidly, hard drive space is at a premium. You can save an average of 50 percent of hard drive space when you take advantage of NetWare 4 compression. By learning how to use NetWare 4 compression, you can make optimal use of hard drive space.
Kyle Unice is president of Midnight Technologies Inc. (http://www.midnighttech.com). John Epeneter is director of Technical Services for Vinca Corp. (http://www.vinca.com). You can reach Kyle at kyle@midnighttech.com and John at john.epeneter@vinca.com.
Viewing Information About Compressed Files
You can use the following commands to access information about compressed files:
Command | Description
NDIR /COMP | Shows all files in the current directory with their compression statistics
NDIR /VOL | Shows summary information about the volume, including compression statistics
NDIR /Co | Shows only currently compressed files
NDIR /Cc | Shows only files that cannot be compressed
NDIR /Ic | Shows files and directories marked Immediate Compress
NDIR /Dc | Shows files and directories marked Don't Compress
NDIR VOL1:*.* /S /Co /COMP | Shows compressed files on the volume, including subdirectories, with compression statistics
* Originally published in Novell Connection Magazine