Compression and Suballocation in NetWare 4
Corporate Integration Specialist
Systems Engineering Division
01 Jun 1994
It often seems that no amount of server disk space is sufficient for the storage demands of today's networks. Dramatic increases in the size of both applications and data files have taken their toll on disk storage resources. NetWare 4 offers several features to help alleviate this problem. This AppNote discusses two of these features: file compression and disk block suballocation. It explains how they work and presents some test results to give you an idea of how much money you can expect to save by using these features on your NetWare 4 servers.
- Introduction
- Compression: An Overview
- NetWare 4 Server Compression
- Disk Block Suballocation
- Compression and Suballocation: Determining Your Savings
- Summary
Introduction
With the release of NetWare 4, Novell has introduced three new server technologies to more effectively utilize disk storage on your NetWare server:
File compression
Disk block suballocation
Data migration
This AppNote explains the internal workings of the first two methods and discusses the impact of using these technologies on your servers.
Compression: An Overview
Even though the price of disk storage has dramatically declined over the past 10 years, the need for data compression has risen due to two main factors.
First, today's applications require more storage space than in the past. Remember the days when an entire application would fit on one floppy disk? Now some vendors (including Novell) are shipping their software on CD-ROMs for easier and faster installation.
The second reason that compression is needed today is for efficiency. It's doubtful that most of us really use all data that's available to us on file servers or on our local hard disks. So rather than having data always expanded and taking up valuable disk storage, it makes sense to have infrequently-used data be readily available, but in a compressed format.
Also, some file types are inefficient. For example, a simple, black-and-white Windows Paint file (BMP) I created took up 812,086 bytes of disk space. When compressed, the size went down to 1,948 bytes.
How Compression Works
The goal of compression algorithms is to rearrange or encode data in such a way that the resulting data is a fraction of the original data's size.
A common data compression method, called the duplicate string encoding algorithm, analyzes data for redundant patterns and then assigns a code to each pattern. Depending on the implementation, the process might search the data by bits, bytes, or double-bytes. The resulting compressed file is actually an encoded file of nearly random data, together with a list of patterns which, when pieced back together, form the original file. Data that is nearly random usually cannot be further compressed.
To illustrate how duplicate string encoding compression works, let's use the following famous quote and see how the resulting compressed file might look. Remember that this is a very basic example - real compression algorithms are much more complex.
"Ask not what your country can do for you; ask only what you can do for your country." (84 bytes)
Obviously, most files don't contain such repetitive data, but using a basic encoding scheme might result in the following header for the coded file (the underscore [_] represents a space):
1=sk 2=_not 3=_what 4=_you 5=_country 6=_only 7=_can 8=_do 9=_for
The resulting data within the coded file would be:
A1234r57894; a16347894r5. (25 bytes)
So the size of the resulting encoded file would be 25 bytes plus the header (decoding) information. Compressing small files usually results in negligible savings due to header overhead. For example, using PKZIP on the 84-byte file above resulted in a 176-byte file - 92 bytes larger than the original file. For this reason, NetWare 4 only compresses files larger than 512 bytes.
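The substitution idea can be tried out with a few lines of Python. This is only a toy sketch of the example above, not the NetWare algorithm: the code table is the one given in the header, while the function names and the longest-pattern-first replacement order are assumptions made for illustration. The header overhead is not counted.

```python
# Toy dictionary substitution, mirroring the example in the text.
# The code table comes from the article ("_" there stands for a space);
# the function names and replacement order are illustrative assumptions.
CODES = {
    "1": "sk", "2": " not", "3": " what", "4": " you", "5": " country",
    "6": " only", "7": " can", "8": " do", "9": " for",
}

def encode(text: str) -> str:
    """Replace each known pattern with its one-character code, longest first."""
    for code, pattern in sorted(CODES.items(), key=lambda kv: -len(kv[1])):
        text = text.replace(pattern, code)
    return text

def decode(coded: str) -> str:
    """Expand every code character back into its pattern."""
    return "".join(CODES.get(ch, ch) for ch in coded)

QUOTE = ("Ask not what your country can do for you; "
         "ask only what you can do for your country.")

coded = encode(QUOTE)
print(len(QUOTE), len(coded))   # prints: 84 25
print(decode(coded) == QUOTE)   # prints: True
```

The sketch only works because the sample text contains no digits; a real encoder needs a way to tell codes apart from literal data.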
Better compression ratios are the result of finding as many patterns as possible. This may mean keeping extensive pointers to data already processed (these are known as "back pointers"). The way in which patterns are discovered directly affects the time it takes to process the data. Therefore, a happy medium must be found for background compression to take place on NetWare servers so that decompression times are quick while compressed size is small (see Figure 1).
There are many other compression algorithms in the computer world, and some work best on specific data types. For instance, certain algorithms are designed solely for video and sound; others are for scanned images. There also exist "lossy" compression routines, which offer greater compression ratios but do so by discarding extraneous data that won't be relevant when decompressed. These routines are only acceptable for video (where human interpretation makes up for lost information), pictures (where colors are reduced because the eye cannot discern them), and sound (where our ears will not discern background sounds). Most computer data, however, must be restored to its exact original state, which requires "lossless" compression.
Figure 1: NetWare server compression strikes a happy medium between high compression ratios and quick compression times.
Common Compression Implementations
Before we get into NetWare 4 server compression, let's briefly discuss some other ways that compression is used in today's PC environments.
File Compression. Anyone who has dialed into a bulletin board system or online service such as CompuServe has worked with compressed files. Utilities such as PKZIP (from PKWare, Inc.) and Stuffit (from Aladdin Systems, Inc.) are used to compress individual files to save on download time or to place more data onto a floppy disk. Some files, such as those in CompuServe's GIF format, are by definition always compressed and require specialized utilities to access the data.
File compression utilities work by reading each file and compressing it into a new file (usually called an archive file). They also have the option of creating a self-extracting archive (SEA), so that the decompression code is built into the file. SEAs can then be decompressed by people who don't own the compression utility. However, SEAs are applications that only operate in their native environment (DOS, Macintosh, or Unix).
File compression utilities are great for online systems and for sending files via e-mail. But they require a lot of user intervention and seldom operate automatically. Thus they are used only in specific situations.
Disk Compression. Recent versions of MS-DOS and Novell DOS include disk compression programs which go by such names as DoubleSpace and Stacker. (Other software companies also offer similar programs.) These programs work by creating a new disk volume (also called a virtual disk) on an existing drive, and then compressing and moving all data over to the new compressed volume. From then on, all data on the volume is compressed and decompressed in real-time; you don't get to choose which files get compressed and which ones don't.
Compression is performed as data is written to or read from the volume without user intervention. This method of transparent data compression is extremely easy to use and gives the user about 50 percent more storage than before, depending upon the data stored.
For more information on disk compression, see "Exploring Hard Disk Compression" in the August 1993 NetWare Application Notes.
The main drawback to disk compression is that it slows down the system because every disk access requires a compression process to be executed. In recent industry tests of MS-DOS and PC-DOS disk compression, the results indicate that - depending upon the application - disk compression slowed performance from a little to over fifty percent (see InfoWorld, January 24, 1994.)
Compression in Communications. A more recent application of compression is in the area of data communications involving modems and LAN/WAN links. Here, both hardware and software work to increase throughput on various types of topologies and media.
The current wave of modems incorporates a chip that compresses data in real-time during an active communication session. Using these compression methods (which go by such names as MNP5 and V.42 bis), a 9600-baud link can send and receive data several times faster than without compression. However, both modems must use the same technology for compression to work.
For more information on compression over WAN links, see "Optimizing NetWare Wide Area Networks" in the May 1994 Novell Application Notes.
Another new trend is to enable software compression for remote communications in products such as NetWare Remote Node and NetWare Connect. Here, the remote user and local server use V.42 bis implemented at the network level to enable compression for all applications.
Compression is also being used by routers to increase throughput on a wide area network. For instance, with the NetWare Multiprotocol Router and appropriate hardware, network packets can be compressed.
NetWare 4 Server Compression
Novell engineers developed a new compression method specifically for NetWare 4. It allows the network manager to maximize disk storage using intelligent criteria. The other methods discussed above either work on specified files or on all files, but NetWare 4 allows you to better control the compression environment.
NetWare 4 server compression operates in two ways. The manual method is to mark a file or directory as "Immediate Compress" with the FLAG command (see section below). But the best way to handle file server compression is to let the server itself determine which files to compress and when. This is done by adjusting several SET commands on the server console.
NetWare compression works for all name spaces, including Macintosh and Unix file systems, and therefore works seamlessly on all platforms. It is enabled by default when a NetWare volume is created, and can be enabled on an existing volume using the INSTALL utility. Keep in mind that once compression is installed on a volume, it cannot be removed. However, the compression routines can be enabled and disabled using a SET parameter, as explained later.
Before we get into how server compression works, let me review the NetWare file attribute called "Last Accessed." Novell DOS and MS-DOS only keep track of when a file was last modified. The Macintosh file system also records file creation dates. But for intelligent file compression to occur, the operating system must keep track of the last time a file was accessed (read from or written to). With this piece of information, NetWare can compress files when they have not been touched after a specified length of time. (Unix also records last access time.)
How NetWare 4 Compression Works
In very simplified terms, the NetWare 4 compression algorithm works as described above: duplicate strings are encoded and the compressed file becomes a coded image of the original file.
Compression begins when one of the following conditions occurs:
The Immediate Compress (IC) bit is set on a file or directory.
A file has been deleted and the salvageable file system has been enabled for immediate compression.
The nightly search thread has discovered an untouched file or salvageable file and queued it for compression.
These three conditions are illustrated in Figure 2.
Figure 2: Events that cause files to be queued for compression.
As a file is queued for compression, the original file is analyzed and a temporary file is built with translation information. This temporary file is kept in the disk cache by the caching subsystem, as long as the temporary file doesn't consume more than half of the cache buffers.
For you techies out there, the compression thread runs at low priority. After processing 128 bytes of the original or temporary file, the thread reschedules itself with the RescheduleLastLowPriority call.
Although the compression routines have been optimized for size, they are queued as secondary processes so that ordinary disk requests will be executed first.
After the temporary file has been built and the file size for the compressed version has been calculated, NetWare determines whether any disk sectors will be saved by having compressed the file. The value of the SET parameter "Minimum Compression Percentage Gain" (default is 2%) is compared to the calculated savings. If the savings is greater than the "Minimum Compression Percentage Gain," and no errors have occurred during the process, the compression thread will begin creating the compressed version of the file by processing the temporary file.
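The decision described here can be sketched as a small check. The article does not spell out exactly how NetWare measures the savings, so the sector size, the function name, and the way the percentage is computed below are all assumptions for illustration.

```python
import math

SECTOR_SIZE = 512  # assumed sector size in bytes, for illustration only

def worth_compressing(original_bytes: int, compressed_bytes: int,
                      min_gain_percent: float = 2.0) -> bool:
    """Return True if the estimated savings exceed the minimum gain.

    min_gain_percent stands in for "Minimum Compression Percentage Gain".
    """
    original_sectors = math.ceil(original_bytes / SECTOR_SIZE)
    compressed_sectors = math.ceil(compressed_bytes / SECTOR_SIZE)
    if compressed_sectors >= original_sectors:
        return False  # no disk sectors would be saved
    gain = 100.0 * (original_bytes - compressed_bytes) / original_bytes
    return gain > min_gain_percent

print(worth_compressing(32_078, 15_360))   # True: roughly half the size
print(worth_compressing(10_000, 9_900))    # False: same number of sectors
```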
Only after the compressed file has been completely written will a controlled swap of the original and newly compressed file take place. If any errors - including power failure - occur during this process, the original file is left intact.
Compression routines are CPU intensive and thus are best left to execute during off-hours. This is why the default for the "Compression Daily Check Starting Hour" parameter is set to 0 (midnight) and the default for the "Compression Daily Check Stop Hour" parameter is set to 6 (6 a.m.). The related parameter is "Days Untouched Before Compression," which has a default of 7 days. Using these defaults, the server will scan the file system from 12 a.m. to 6 a.m. and look for files left unaccessed for at least seven days before it schedules compression on those files.
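As a rough model of these three parameters, the following sketch checks whether the scan may run at a given time and whether a file has been untouched long enough. The function names are hypothetical, and the handling of a window that wraps past midnight (or of equal start and stop hours) is an assumption; the article does not describe those cases.

```python
from datetime import datetime, timedelta

def in_scan_window(now: datetime, start_hour: int = 0, stop_hour: int = 6) -> bool:
    """True if 'now' falls between the daily check start and stop hours."""
    if start_hour <= stop_hour:
        # Equal hours give an empty window in this sketch (an assumption).
        return start_hour <= now.hour < stop_hour
    # Assumed behavior when the window wraps past midnight (e.g. 22 to 4).
    return now.hour >= start_hour or now.hour < stop_hour

def untouched_long_enough(last_accessed: datetime, now: datetime,
                          days_untouched: int = 7) -> bool:
    """True if the file has not been accessed for at least days_untouched days."""
    return now - last_accessed >= timedelta(days=days_untouched)

now = datetime(1994, 6, 1, 3, 0)
print(in_scan_window(now))                                   # True
print(untouched_long_enough(datetime(1994, 5, 20), now))     # True
print(untouched_long_enough(datetime(1994, 5, 30), now))     # False
```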
If a compressed file is accessed, one of three things happens after the file has been decompressed, depending on the setting of the "Convert Compressed to Uncompressed Option" parameter:
If set to 0, the file will always remain compressed on the disk.
If set to 1, the file will remain compressed, provided that it is not accessed again within the time specified by the "Days Untouched Before Compression" parameter.
If set to 2, the file will always be left uncompressed on the disk.
For example, suppose that "Convert Compressed to Uncompressed Option" is set to 1 and "Days Untouched Before Compression" is set to 5 days. If I access a compressed file on Monday, the file will be decompressed into memory and sent to me, but will remain compressed on the disk. If I access the file again on Thursday, it will be permanently decompressed to disk as it is being retrieved and sent to me. If I had waited until Saturday to access the file, the file would still remain compressed.
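The Monday/Thursday/Saturday example can be expressed as a small decision function. This is a sketch of the behavior as described in this AppNote, not NetWare code; the function and argument names are hypothetical, and option 2 follows the description in Figure 3.

```python
from typing import Optional

def stays_compressed(option: int, days_since_previous_access: Optional[float],
                     days_untouched: int = 7) -> bool:
    """Decide whether a just-accessed compressed file stays compressed on disk.

    option mirrors "Convert Compressed to Uncompressed Option":
      0 = always leave compressed, 1 = leave compressed unless accessed
      again within the untouched period, 2 = always leave uncompressed.
    days_since_previous_access is None if no earlier access is known.
    """
    if option == 0:
        return True
    if option == 2:
        return False
    if days_since_previous_access is None:
        return True
    # Option 1: a second access inside the untouched period decompresses to disk.
    return days_since_previous_access >= days_untouched

# The example from the text: option 1, untouched period of 5 days.
print(stays_compressed(1, None, 5))  # Monday, first access: True
print(stays_compressed(1, 3, 5))     # Thursday, 3 days later: False
print(stays_compressed(1, 5, 5))     # Saturday, 5 days later: True
```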
Another intelligent use of server compression is found with "Deleted Files Compression Option." This parameter determines when and if deleted files should be compressed. If set to 1, then the deleted files will be compressed the next day (between the start and stop compression hours). If set to 2, the files will be compressed immediately.
The chart in Figure 3 summarizes the compression options settable via the SET command on the server. If you need to change any of the defaults, put the appropriate commands in the server's AUTOEXEC.NCF file so they will take effect whenever the server is booted.
Figure 3: SET parameters for file compression on a NetWare 4 server.
SET Parameter | Explanation | Default | Notes
Enable File Compression | Set to ON to allow compression on compression-enabled volumes. When set to OFF, compression will pause. | ON | When OFF, files flagged IC are queued until compression is allowed.
Minimum Compression Percentage Gain | If the compressed file won't be this much smaller than the uncompressed size, it is not compressed. | 2 (%) |
Compression Daily Check Starting Hour | Specifies when to start the search for files that have not been accessed. | 0 | Hours are specified in military time (0=midnight).
Compression Daily Check Stop Hour | Specifies when to stop the search for files that have not been accessed. | 6 | Hours are specified in military time (0=midnight).
Days Untouched Before Compression | Specifies how many days a file must remain unaccessed before it can be queued for compression. | 7 |
Convert Compressed to Uncompressed Option | Specifies how the server stores a compressed file after uncompressing it. 0=always leave the file compressed; 1=leave it compressed after a single access within the "untouched" period; 2=always leave it uncompressed. | 1 |
Deleted Files Compression Option | Specifies how the server handles unpurged deleted files. 0=don't compress; 1=compress during the next search interval; 2=compress immediately. | 1 |
Maximum Concurrent Compressions | Specifies how many compression operations can be performed simultaneously. Concurrent compressions can occur only on multiple volumes. | 2 | Increasing this setting may slow server performance.
Uncompress Percent Disk Space Free to Allow Commit | As a disk volume runs out of space, a compressed file which will be permanently decompressed may require too much disk storage. This parameter prevents newly decompressed files from using too much valuable free space. | 10 (%) |
Uncompress Free Space Warning Interval | If files cannot be decompressed due to lack of free disk space, a warning is sent on the console. This parameter determines how often the warning is sent. | 31 min. 18.5 sec | To prevent unnecessary temporary decompressions due to lack of disk space, the network administrator should monitor the server's free disk space to ensure that it exceeds the "Uncompress Percent Disk Space Free to Allow Commit" setting.
File and Directory Attributes Related to Compression
The following attributes are set using the FLAG utility or from other NetWare utilities.
Immediate Compress (IC). Changeable by client for files and directories. When set for a file, indicates that the file should be queued for compression. When set for a directory, indicates that all files placed into the directory should be queued for compression unless the DC or CC attributes are set for the file (see below).
If a file is copied into a directory flagged as IC, the file's attributes won't change. If a compressed file in an IC-flagged directory is moved to another directory, the file will remain compressed.
Don't Compress (DC). Changeable by client for files and directories. This attribute indicates that the file or directory should never be compressed. If set on a file that is already compressed, then the file will remain compressed until accessed. So setting a compressed file to DC won't immediately decompress it, but setting an uncompressed file to IC will immediately compress it.
If set on a directory, then all files in that directory will not be compressed unless a file is flagged IC. A file's attributes will not be changed if it is copied into a DC-flagged directory.
Can't Compress (CC). Set by the NetWare file system and unchangeable by the client. Indicates that the server has tried unsuccessfully to compress the file. Once this bit is set, it will not be reset until the file is written to. This attribute indicates that the file is less than 512 bytes in length, that the file is already compressed by another compression algorithm, or that the data within the file is nearly random.
File Compressed (CO). Set by the NetWare file system and unchangeable by the client. Indicates that NetWare has successfully compressed the file.
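How these attributes interact can be summarized in a short lookup. The sketch below only restates the rules given above; the function name and the returned labels are hypothetical, and the precedence used when a file carries conflicting flags (for example, CC together with IC) is an assumption, since the text does not cover that case.

```python
def compression_policy(file_flags: set, dir_flags: set) -> str:
    """Return 'skip', 'never', 'immediate', or 'nightly' for a file.

    file_flags and dir_flags hold attribute abbreviations such as
    "IC", "DC", and "CC".
    """
    if "CC" in file_flags:
        return "skip"        # server already failed; retried only after a write
    if "DC" in file_flags:
        return "never"       # file-level Don't Compress
    if "IC" in file_flags:
        return "immediate"   # file-level IC overrides a DC directory
    if "IC" in dir_flags:
        return "immediate"   # directory-level Immediate Compress
    if "DC" in dir_flags:
        return "never"       # directory-level Don't Compress
    return "nightly"         # left to the nightly search and the SET parameters

print(compression_policy({"IC"}, {"DC"}))   # immediate
print(compression_policy({"DC"}, {"IC"}))   # never
print(compression_policy(set(), set()))     # nightly
```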
How Decompression Happens on the Server
File decompression is initiated anytime a file is opened, except when it is to be specifically accessed in its compressed state (see "NetWare Compression Tips" below).
For techies: The decompression thread relinquishes the CPU when 256 bytes are read from the compressed file, and when 4KB of decompressed data has been generated. The decompression work-to-do thread and its associated read-ahead thread both run at normal priority.
The server reads the compressed file in 4KB chunks. As each chunk is retrieved, it is verified and decompressed into memory. The decompression routine notifies NetWare when 4KB of data is ready to be transmitted to the client. This has the effect of sending data as quickly as possible rather than having the client wait for the entire file to be decompressed.
If a file is set for "Immediate Compress" and is opened for reads or writes, the file is decompressed to disk. After the file is closed, it is queued for immediate compression.
Note: Keep in mind that, when saving data, some applications create new files instead of updating existing files. In that situation, the new file will not have the IC attribute. Thus it is best to flag the directory, rather than individual files, to be compressed.
Compression Ratios
The following table shows typical compression ratios that can occur on a NetWare 4 server.
Figure 4: Typical NetWare 4 compression ratios.
File Type | Original Size (bytes) | Compressed Size (bytes) | Compression Ratio
Bitmap image (BMP) | 32,078 | 15,360 | 52%
Windows DLL | 451,280 | 227,840 | 50%
Text file | 43,524 | 16,384 | 62%
Text file, table | 111,947 | 22,528 | 80%
Windows executable (EXE) | 1,039,904 | 514,560 | 50%
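The percentages in Figure 4 appear to be the share of space saved, that is, 1 minus (compressed size / original size). A quick way to reproduce them from the byte counts is sketched below; the dictionary and function names are illustrative, and small differences from the table come down to rounding.

```python
# Byte counts taken from Figure 4: (original size, compressed size).
FIGURE_4 = {
    "Bitmap image (BMP)": (32_078, 15_360),
    "Windows DLL": (451_280, 227_840),
    "Text file": (43_524, 16_384),
    "Text file, table": (111_947, 22_528),
    "Windows executable (EXE)": (1_039_904, 514_560),
}

def space_saved_percent(original: int, compressed: int) -> float:
    """Percentage of the original size saved by compression."""
    return 100.0 * (1 - compressed / original)

for name, (original, compressed) in FIGURE_4.items():
    print(f"{name}: {space_saved_percent(original, compressed):.1f}%")

total_original = sum(o for o, _ in FIGURE_4.values())
total_compressed = sum(c for _, c in FIGURE_4.values())
overall = space_saved_percent(total_original, total_compressed)
print(f"All five files together: {overall:.1f}%")
```

Taken over all five files together, the saving rounds to 53%, which appears consistent with the NetWare 4 figure reported in Figure 5.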
How does NetWare compression stack up against other popular file compression utilities? The following table gives you a general idea of how much the most popular utilities compressed the five various files listed above (BMP, DLL, text, text table, and Windows EXE).
Figure 5: Comparison of compression ratios of five file types by various utilities.
Compression Utility | Compression Ratio (%)
NetWare 4 | 53%
PKZIP 2.04g (DOS) | 56%
Stuffit Deluxe 2.01 (Macintosh) | 51%
Compress (Unix) | 40%
The time it takes to complete any compression routine is directly dependent upon the speed of the CPU and whether the routine is performed in the foreground or background. Therefore, I have not included the time it took to complete the routine.
Note that while NetWare lets all other disk requests supersede files being compressed, it must be able to quickly decompress files in real-time while performing other network tasks and without user intervention. So NetWare reads compressed files faster and more easily than any of these compression utilities.
NetWare Compression Tips
Following are some tips for effectively using NetWare 4 compression.
To copy compressed files in their compressed format, use NCOPY with the /R or /RU parameters.
The /R parameter will keep files compressed only if the destination volume supports compression. This speeds up the copying of already compressed files.
The /RU parameter will keep a NetWare-compressed file in that state even if the destination volume (such as your local hard disk) doesn't support NetWare compression. Be aware that copying NetWare-compressed files to a non-NetWare volume means that the files cannot be decompressed unless they are first copied back to a NetWare volume which supports compression.
The NetWare 4 version of NDIR has a /COMPressed parameter that displays file compression information. A sample screen is shown below:
Files       = Files contained in this path
Size        = Number of bytes in the file
Comp Size   = Number of bytes in the compressed file
Last Update = Date file was last updated
Saved       = Space saved by having the file compressed
Other       = Compression/Migration attributes and status

TEST1/SYS:USERS\AMARK\*.*
Files          Size   Comp Size   Last Update       Saved    Other
----------   ------   ---------   --------------   ------   ---------
01-ALAN.AI   46,160      14,848   4-07-94 12:52p   67.83%   [----]Co-
02-ALAN.AI   27,849       9,216   4-07-94 12:59p   66.91%   [----]Co-
03-ALAN.AI   27,574       9,216   4-07-94 12:43p   66.58%   [----]Co-
04-ALAN.AI   71,915      17,920   4-07-94  2:03p   75.08%   [----]Co-
APPNOTE.WP   29,379      12,800   4-15-94  1:44p   56.43%   [----]Co-
MEMORY.DOC    9,859       4,096   3-29-94 10:01a   58.45%   [----]Co-
EXTRA.AI     25,171       8,192   4-07-94 12:12p   67.45%   [----]Co-
...
674,656 bytes (950,272 bytes in 29 blocks allocated, not compressed)
266,240 bytes (950,272 bytes in 29 blocks allocated, compressed)
60.54% Space saved by file compression
The NDIR /VOL command also displays compression information, as shown below:
Statistics for fixed volume TEST1/APPS:
Space statistics are in KB (1024 bytes).

Total volume space:                        512,000   100.00%
Space used by 437 entries:                  75,712    14.79%
Deleted space not yet purgeable:                 0     0.00%
                                          --------  --------
Space remaining on volume:                 436,288    85.21%
Space available to AMARK:                  436,288    85.21%

Maximum directory entries:                   8,704
Available directory entries:                 2,587    29.72%

Space used if files were not compressed:   194,189
Space used by compressed files:             82,770
                                          --------
Space saved by compressing files:          111,419    57.38%

Uncompressed space used:                    18,094
An undocumented compression screen can be enabled on the server. Although the screen is designed for debugging purposes, it nonetheless lets the administrator see which files are currently being compressed. The server console command is SET COMPRESS SCREEN=ON, which displays information similar to the following sample:
The screen shows (a) the file being processed (an asterisk means decompression); (b) the compression ratio; (c) bytes/second processed into the compression engine; (d) bytes/second processed out of the compression engine; (e) the file's compressed size; (f) the file's original size; and (g) debug information having to do with how the file was processed, whether or not it could be compressed, and other codes useful to Novell programmers.
There are significant issues concerning the backup of NetWare compressed files. If the backup program isn't NetWare 4-aware, each compressed file may be first decompressed before being sent to the backup device. If that same program then restores those files, they will be restored as uncompressed and require much more storage than before. Also, most backup programs restore the last accessed date to the original value.
If the backup program is NetWare 4-aware and uses Novell's Storage Management Services (SMS), then compressed files will be backed up and restored as compressed.
Disk Block Suballocation
Many of you may be familiar with the term "cluster size," which refers to the minimum file allocation unit for local hard disks. On NetWare servers, the similar term is "disk block size."
NetWare 4 allows disk block sizes to be set to 4KB, 8KB, 16KB, 32KB or 64KB. Using the 64KB block size results in very fast disk operations. The reason is simple: The larger the disk block, the more data which can be transferred to and from the disk in a single request. So a 100KB file using 64KB blocks can be read in two disk reads, whereas it takes 25 disk reads with a 4KB block size. Also, larger blocks require less memory and the read-ahead operation operates faster. (Read ahead is a background task that reads sequential files into cache in advance of the request.)
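The read counts quoted above follow from a ceiling division of the file size by the block size. A minimal sketch, assuming one disk request per block and ignoring caching and read-ahead (the function name is illustrative):

```python
KB = 1024

def disk_reads(file_bytes: int, block_bytes: int) -> int:
    """Number of block-sized reads needed to fetch the whole file."""
    return -(-file_bytes // block_bytes)  # ceiling division

print(disk_reads(100 * KB, 64 * KB))  # 2
print(disk_reads(100 * KB, 4 * KB))   # 25
```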
However, there is a major disadvantage of using a large disk block size: Since at least one disk block is allocated per file, files that don't completely fit into a disk block will leave much unused space. With 64KB blocks, that can be a lot of wasted storage.
For instance, suppose that the disk block size is 4KB (4,096 bytes). Any file that is not an exact multiple of 4KB will have unused space in the amount of:
4096 - (File size MOD 4096)
(The MOD function returns the remainder of the division of the two numbers.) So, for example, a 3KB file will have 1KB wasted, and a 4,097-byte file will have 4,095 bytes wasted. This methodology can make it impossible to utilize disk space completely. In other words, even though storage space is available, it cannot be used.
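The two examples can be checked with a short calculation. This is a sketch of the arithmetic only; the function name is illustrative, and a file that is an exact multiple of the block size is taken to waste nothing.

```python
def wasted_bytes(file_size: int, block_size: int = 4096) -> int:
    """Unused bytes in the last block allocated to a file."""
    remainder = file_size % block_size
    return 0 if remainder == 0 else block_size - remainder

print(wasted_bytes(3 * 1024))  # 1024 (a 3KB file wastes 1KB)
print(wasted_bytes(4097))      # 4095
print(wasted_bytes(8192))      # 0 (exact multiple of the block size)
```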
Figure 6 shows how files are stored in a disk block. This scheme works the same for DOS, Macintosh, and Unix systems and for NetWare servers without suballocation.
Figure 6: Traditional file storage schemes waste disk space.
As shown in Figure 6, smaller files usually create the most wasted space as a percentage of their size. For instance, both 1KB and 5KB files have 3KB of wasted space, but the percentage is larger for the 1KB file (75% versus 38%).
Our discussion of suballocation will revolve around the following scenario: A NetWare 3.12 disk partition was divided into three volumes, each 23,000,000 bytes in size with a 64KB block size. (The same results would be reached if using NetWare 3.11.) The server was then upgraded to NetWare 4 and suballocation was enabled.
The Directory Entry Table and Its Role in Space Utilization
The maximum number of files which can be stored on a volume depends on the number of directory entries and blocks available. In NetWare 3.x and 4.x, the initial size of the directory entry table is determined according to the volume size, and then expands as needed. On our 23MB volumes, 512 directory entries were created.
Note: Remember that directory entries are used by subdirectories as well as files, and the directory entry table is itself a file.
Since each volume had a 64KB block size, the maximum number of disk blocks was 350, of which 347 were available (approximately 23MB / 64KB). (The precise calculation includes accounting for hidden files used by NetWare.)
Since each file requires at least one disk block, the most files that can be stored on a volume without suballocation is determined by the number of available blocks. Therefore, without suballocation, at most 347 files can be created on our 23MB volume. But with suballocation enabled, as we will see, significantly more files can be created because each file doesn't necessarily use one complete block.
Life Without Suballocation
How many files did it take to fill our NetWare 3.12 volumes?
On SYS2, I created a subdirectory and proceeded to fill it with 1KB files until the disk was filled. After the 347th file was created, a "Disk Full" message appeared on the screen.
On SYS3, I followed the same procedure, but this time with 32KB files. Once again, the 347th file caused a "Disk Full" error.
On SYS4, I created two directories and stored fifty 1KB files in \DIR1 and fifty 32KB files in \DIR2.
The results of these tests are shown graphically in Figure 7.
Figure 7: Amount of data that fit on the NetWare 3.12 volumes (without suballocation).
The numbers behind this graphic are given below. Notice the slack for each volume. Slack is the percentage of wasted space that could otherwise be used for storage. For SYS2, 98 percent of the disk space was wasted because each 1KB file took up the space of a 64KB file.
Volume | Block Size | Dir. Ent. Used | Data Stored | Storage Allocated | Wasted Storage | Slack
SYS2 | 64KB | 347 | 347.0KB | 23.0MB | 22.6MB | 98%
SYS3 | 64KB | 347 | 11.3MB | 23.0MB | 11.7MB | 50%
SYS4 | 64KB | 102 | 1.7MB | 6.6MB | 4.9MB | 74%
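The slack percentages can be approximated from the file counts and sizes used in the three tests. The sketch below assumes every file occupies whole 64KB blocks and ignores directory entries and NetWare's hidden files, so it reproduces the rounded percentages rather than the exact megabyte figures; the names are illustrative.

```python
KB = 1024
BLOCK = 64 * KB  # block size used on all three test volumes

def slack_percent(file_sizes, block_size=BLOCK):
    """Wasted space as a percentage of the space allocated to the files."""
    stored = sum(file_sizes)
    allocated = sum(-(-size // block_size) * block_size for size in file_sizes)
    return 100.0 * (allocated - stored) / allocated

sys2 = [1 * KB] * 347                    # 347 files of 1KB
sys3 = [32 * KB] * 347                   # 347 files of 32KB
sys4 = [1 * KB] * 50 + [32 * KB] * 50    # fifty of each

for name, files in (("SYS2", sys2), ("SYS3", sys3), ("SYS4", sys4)):
    print(name, round(slack_percent(files)))
# prints SYS2 98, SYS3 50, SYS4 74
```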
How NetWare 4's Suballocation Works
NetWare 4's suballocation feature is designed to solve the dilemma of how to increase server performance by using large disk block sizes, but not be penalized by the wasted storage that large blocks cause. By dividing a block into 512-byte chunks, the most wasted space you'll ever have per file is 511 bytes, even with 64KB blocks.
The suballocation routines in NetWare 4 are quite complex, but neither users nor administrators need to worry about them. There are no utilities or console commands to contend with; you merely enable suballocation from the INSTALL program.
Files are traditionally controlled using the File Allocation Table (FAT), and in all versions of NetWare the FAT is hashed and indexed for faster data retrieval. NetWare 4 adds suballocation. When suballocation is enabled on a volume, the server designates blocks for a specific range of file sizes or ending data fragments (that is, data that cannot nearly or completely fill a block). Each "Suballocation Reserved Block," or SRB, is specific to a narrow range of file sizes or ending data fragments based on multiples of 512 (1 to 512 bytes, 513 to 1024 bytes, etc.).
Figure 8 illustrates this concept.
Figure 8: Suballocation reserved blocks.
The number of SRBs is determined by the formula:
SRB = (block size / 512) - 1
For a 64KB block size, at most 127 blocks are reserved. Files that are within 511 bytes of the block size (e.g. from 65,025 to 65,536 bytes for 64KB blocks) are not suballocated.
So for 64KB blocks, the SRB designated to hold 1KB files can hold up to 64 files or ending data fragments, while the 20KB SRB can hold just over 3 files or fragments.
Once an SRB is full, another block is dynamically created and chained for suballocation use. Hence there can be chains of blocks that are used specifically for suballocation. Also, a file or fragment will be stored in two SRBs if the data cannot be completely stored within one SRB. For example, after a 20KB SRB contains three files or fragments, 4KB of the fourth file is stored in the current SRB and the remaining 16KB is stored in another SRB. This method optimizes the use of suballocation reserved blocks.
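The spill-over behavior can be modeled with a small Python sketch. This is a simplified illustration, not the actual NetWare algorithm: it assumes fragments are packed back to back into a chain of blocks and that every fragment is smaller than a block. The function name `place_fragments` is hypothetical.

```python
def place_fragments(sizes, block_size=64 * 1024):
    """Pack fragments sequentially into a chain of blocks.

    Returns, for each fragment, a list of (block_index, bytes_stored) pieces.
    Assumes every fragment is smaller than block_size, so a fragment
    spans at most two blocks in the chain.
    """
    placements = []
    offset = 0  # running byte offset within the chain
    for size in sizes:
        block_index, block_offset = divmod(offset, block_size)
        # Store as much as fits in the current block.
        first_piece = min(size, block_size - block_offset)
        pieces = [(block_index, first_piece)]
        if size > first_piece:
            # The remainder spills into the next block in the chain.
            pieces.append((block_index + 1, size - first_piece))
        placements.append(pieces)
        offset += size
    return placements

# Four 20KB fragments in a chain of 64KB blocks.
for pieces in place_fragments([20 * 1024] * 4):
    print(pieces)
```

In this model the fourth 20KB fragment is split into a 4KB piece in the first block and a 16KB piece in the second, matching the example in the text.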
How does a file get suballocated? First, the server computes the fragment data size with the formula:
Fragment data size = file size MOD block size
For example, with a 64KB block, a 65,885-byte file will have a fragment data size of 65,885 MOD 65,536 = 349. If the fragment is within 512 bytes of the block size, the file is not suballocated.
Second, the server determines whether an SRB exists for the range that the fragment falls into. If not, it reserves a block. If one does exist but is full or would overflow, then a new SRB is created and the chain is extended. For a 65,885-byte file, the SRB used is the one for the range from 1 to 512 bytes.
Finally, if the file is larger than one disk block, it is written into two sections of the disk: the primary disk block and the SRB.
All files with fragments in the same range will have their ending data stored in the same SRB chain. So a 65,885-byte file and a 65,886-byte file will have their ending data stored in the same chain because their fragments (349 and 350 bytes, respectively) fall within the same 512-byte range.
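The selection of an SRB range can be sketched in Python as follows. The function name `srb_range` is hypothetical, and two details are assumptions on my part: a file whose size is an exact multiple of the block size is treated as having no fragment, and the cutoff for "not suballocated" follows the earlier statement that files within 511 bytes of the block size are excluded.

```python
SUBALLOC_UNIT = 512  # suballocation granularity in bytes

def srb_range(file_size, block_size):
    """Return the (low, high) byte range of the SRB that would hold the
    file's ending fragment, or None if the file is not suballocated."""
    fragment = file_size % block_size  # fragment data size
    # Assumption: no fragment, or a fragment within 511 bytes of a full
    # block, means the file is not suballocated.
    if fragment == 0 or fragment > block_size - SUBALLOC_UNIT:
        return None
    # Round the fragment up to the next multiple of 512 to find its range.
    index = (fragment + SUBALLOC_UNIT - 1) // SUBALLOC_UNIT
    return ((index - 1) * SUBALLOC_UNIT + 1, index * SUBALLOC_UNIT)

block = 64 * 1024
print(srb_range(65885, block))  # fragment of 349 bytes
print(srb_range(65886, block))  # fragment of 350 bytes
print(srb_range(65025, block))  # within 511 bytes of the block size
```

Both the 65,885-byte and 65,886-byte files map to the same 1-to-512-byte range, so their ending data would share an SRB chain.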
As files are created, expanded or contracted, data may be moved from one SRB to another. This may leave holes within the SRB. Since SRBs take up disk space, they are periodically cleaned up by compacting the entire chain and releasing unused blocks for primary storage. For instance, if a file is originally 4KB, and is then opened and closed so that the new size is 7KB, the updated file will be stored in an SRB designated for 7KB entries, leaving an empty space in the old 4KB-designated SRB.
Test Results with Suballocation
Let's return to our test server scenario. I performed the upgrade to NetWare 4 from the LAN; the process took about 20 minutes. I enabled suballocation for each volume, and then logged into the server and analyzed the volumes.
Just as I expected - no extra space was available. "What?" you say, "Even with suballocation turned on?" The reason is as follows:
For existing volumes, suballocation only affects newly-created files.
This is an important point. If you upgrade existing NetWare 3.x volumes to NetWare 4, the benefits of suballocation won't be realized until existing files are opened or new files are written to the disk. Untouched files won't benefit.
Most NetWare 3.x volumes use small block sizes. Since suballocation works best for block sizes greater than 4KB, to fully get the benefits of large block sizes and suballocation, you should create new volumes with 64KB block sizes and suballocation enabled. So to implement this new technology on your newly upgraded NetWare 4 server, you'll have to create new NetWare volumes. There are two ways to do this.
For a detailed discussion of NetWare migration strategies, see "NetWare Migration Utilities Part 1: The In-Place Upgrade NLM" in the June 1993 issue and "NetWare Migration Utilities Part 2: The Across-the-Wire Migration Utility" in the September 1993 NetWare Application Notes.
The first method is to use over-the-wire migration, where a NetWare 3.x server sends data to a new NetWare 4 server over a LAN connection. The new server should have compression and suballocation turned on before the migration process begins.
The second method is to back up the NetWare 3.x server (onto tape or external hard disk), upgrade the same server to NetWare 4, create new NetWare 4 volumes, and finally restore the data to the new volumes.
For my server, I simply deleted all files on the volumes and re-ran the test up to the maximums reached in the first test. The results, as shown below, demonstrate the significant storage savings using suballocation in a best-case scenario (one with no slack).
Volume | No. of Files | Dir. Ent. Used | Data Stored | Storage Allocated | Wasted Storage | Slack
SYS2 | 347 | 347 | 347.0KB | 347.0KB | 0KB | 0%
SYS3 | 347 | 347 | 11.3MB | 11.3MB | 0MB | 0%
SYS4 | 100 | 102 | 1.7MB | 1.7MB | 0MB | 0%
Of course, no real-life situation will give you 0 percent slack. But under normal use you can expect significant savings, especially on volumes that store many small files (even a 1-byte file is suballocated). The larger the disk block size of the volume, the more disk space suballocation saves. Also, suballocation works for files in other name spaces and on deleted files (deleted files continue to use disk space until purged).
To further test the effectiveness of suballocation, I copied a 19MB file to volume SYS2 and it fit! Remember that without suballocation, I couldn't copy any more files to SYS2.
Savings with Suballocation
How much more disk space can you expect with suballocation? I developed a spreadsheet to calculate how much suballocation saved us on our production server. The calculations are based on formulas developed by Novell engineers to determine server memory requirements (see Figure 9).
Figure 9: Spreadsheet for comparing cost of RAM and wasted disk space with and without suballocation.
Note: This spreadsheet and other server analysis utilities are available on NetWire, NOVLIB Library 11, in a self-extracting file named COMSUB.EXE. (See the Research Index for more information about downloading files from NetWire.)
Figure 10 charts the results for our 1GB disk, which had an average file size of 35KB. Not using suballocation cost us $736 in wasted storage and RAM. With suballocation enabled, the cost was only $119, a savings of roughly $615. That's enough to buy a new 600MB hard disk!
Figure 10: Estimated costs of RAM and unused disk space with and without suballocation on a Novell production server.
Enabling suballocation and increasing the disk block size lowers the overall file system memory requirements (especially for the File Allocation Tables) and increases throughput. Depending on the average file size and disk block size, suballocation can save you a lot of money on hardware. Put another way, suballocation gives you more storage for your buck (or franc or mark or pound).
Compression and Suballocation: Determining Your Savings
The most accurate way of predicting how much storage will be saved by implementing compression and suballocation is to analyze each file and compute (1) its compression size, and (2) its utilization within a disk block.
An easier method, which is just an estimate, involves the following steps:
1. Compute the amount of wasted space on the volume with NDIR, SDIR, or some other utility such as File Size from Norton Utilities (Symantec Corporation). To do this, subtract the actual space used from the total allocated space.
For example, using NDIR /C /UN on a sample NetWare 3.12 volume with 32KB disk blocks, the results were "157,857,716 bytes in 4,180 files; 181,796,864 bytes in 5,548 blocks." So the wasted space was 181,796,864 - 157,857,716 = 23,939,148 bytes.
2. Determine how much space will be saved by using suballocation and the existing block size. (If you will be increasing the block size, your savings will be even greater.) Using a worst-case scenario of 511 bytes of space wasted for each file, multiply the number of files by 511 and subtract the result from the wasted space (23,939,148 - 4,180 x 511 = 21,803,168 bytes).
3. Estimate the total space taken by executables (EXE, DLL) and multiply by 0.45 (assuming an average compression ratio of 45%). From the root of a volume, use the command NDIR *.EXE,*.HLP /SUB /C /UN and use the total.
4. Estimate the total space taken by all other files and multiply by 0.6 (assuming an average compression ratio of 60% for non-executable files).
5. Add the results from steps 2, 3, and 4 to estimate your total savings.
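The estimation steps above can be sketched in Python. The function name `estimate_savings` and its parameter names are hypothetical. The allocated, used, and file-count figures come from the NDIR example in the text, but the split between executable and other files (40,000,000 bytes of executables) is an invented placeholder purely for illustration; substitute the totals reported for your own volume.

```python
WORST_CASE_WASTE_PER_FILE = 511  # bytes still wasted per file with suballocation

def estimate_savings(allocated, used, n_files, exe_bytes, other_bytes,
                     exe_factor=0.45, other_factor=0.6):
    """Follow the estimation steps from the text and return each result."""
    wasted = allocated - used                                          # step 1
    suballocation = wasted - n_files * WORST_CASE_WASTE_PER_FILE       # step 2
    exe_compression = exe_bytes * exe_factor                           # step 3
    other_compression = other_bytes * other_factor                     # step 4
    total = suballocation + exe_compression + other_compression        # step 5
    return {
        "wasted": wasted,
        "suballocation": suballocation,
        "exe_compression": exe_compression,
        "other_compression": other_compression,
        "total": total,
    }

# NDIR figures from the text; the executable/other split is hypothetical.
used = 157_857_716
exe_bytes = 40_000_000  # assumed value, not from the source
estimate = estimate_savings(
    allocated=181_796_864,
    used=used,
    n_files=4_180,
    exe_bytes=exe_bytes,
    other_bytes=used - exe_bytes,
)
for name, value in estimate.items():
    print(f"{name}: {value:,.0f} bytes")
```

Because the compression factors are averages, the total should be read as a rough estimate rather than a prediction.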
Summary
NetWare 4 provides new ways to effectively get more storage from your disks. By enabling compression and suballocation, you make better use of your data storage. Infrequently used and recently deleted files no longer take up valuable disk real estate because they are compressed and suballocated. And those small batch files and documents no longer take up an entire disk block.
Perhaps the best feature of these technologies is that the user is shielded from the complexities of the system. And administrators can choose to accept the default server configuration and still make use of compression and suballocation.
* Originally published in Novell AppNotes