We are redesigning an application that currently stores...

Articles and Tips: qna

01 Oct 2002

We are redesigning an application that currently stores a large number of files. We can either group the files into a number of subdirectories or place them all in a single directory. For example, with 65536 files, you could have 256 subdirectories with 256 files each, or a single directory with 65536 files. I am interested in how the legacy file system and the NSS file system differ in this respect.

My questions are:

What is the difference in create, find, and delete times between the two methods? Does the legacy file system search a directory sequentially, or does it use a more efficient hashing scheme?
What is the difference in memory usage between the two methods?
What is the overhead of having an extra subdirectory level?

Novell Storage Services (NSS) uses a B+ tree for storing names. As long as the names of the files are well distributed, a lookup that has to go to disk should be faster with NSS, because NSS doesn't scan the whole directory for a name and the directory is implicitly sorted by name.
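To illustrate why a name-sorted directory avoids a full scan, here is a minimal sketch that uses a binary search over a sorted Python list as a stand-in for the B+ tree. This is not how NSS is implemented: a real B+ tree has a much higher fan-out per node, so it visits far fewer nodes than this probe count suggests. The file names and the sorted_lookup helper are hypothetical.

```python
def sorted_lookup(names, target):
    """Lower-bound binary search over a sorted list.

    Returns (found, probes), where probes is the number of names compared.
    """
    lo, hi = 0, len(names)
    probes = 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1
        if names[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = lo < len(names) and names[lo] == target
    return found, probes


# Hypothetical directory of 65536 names, kept in sorted order.
names = sorted(f"file{i:05d}.dat" for i in range(65536))
found, probes = sorted_lookup(names, "file12345.dat")
missing, _ = sorted_lookup(names, "nosuchfile.dat")
print(found, probes, missing)
```

For 65536 names the loop never needs more than 17 probes, compared with tens of thousands of comparisons for an unindexed scan of the same directory.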

In principle, the traditional file system requires a linear search which, for a uniform distribution of access, would result in a mean find time proportional to n/2, where n is the number of files in the directory. In reality, however, the traditional file system loads the whole file system state into memory and hashes all the directory entries, so the file lookup time is not actually proportional to n/2. The traditional file system will probably search slightly faster, except in specific cases (which may actually be your normal cases).
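The n/2 figure and the effect of hashing can be checked with a small sketch. It models a directory as a plain Python list for the linear scan and as a dict for the hashed lookup; both are illustrative assumptions, not the traditional file system's actual on-disk or in-memory structures, and the function names are hypothetical.

```python
def linear_find(entries, target):
    """Scan entries in order and return how many names were compared."""
    for count, name in enumerate(entries, start=1):
        if name == target:
            return count
    return len(entries)


def mean_linear_comparisons(n):
    """Average comparisons when each of n names is looked up exactly once."""
    entries = [f"file{i:05d}.dat" for i in range(n)]
    return sum(linear_find(entries, name) for name in entries) / n


# Linear scan: the mean is (n + 1) / 2, which is roughly n / 2 for large n.
mean_256 = mean_linear_comparisons(256)

# Hashed lookup: a dict finds an entry without scanning its neighbours,
# so the cost does not grow with the position of the name in the directory.
entries = [f"file{i:05d}.dat" for i in range(256)]
hashed = {name: position for position, name in enumerate(entries)}
position = hashed["file00200.dat"]
print(mean_256, position)
```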

Deleted entries occupy no space in NSS. In the traditional file system, the directory is represented as an array and deleted entries are flagged entries occupying space. This would imply that, for NSS, you can put all the files in the same directory (assuming the names are well distributed). For the traditional file system, it might be faster with subdirectories (assuming you could select the correct one quickly).

On the other hand, for the traditional file system, the search is more likely to happen entirely in memory, whereas in NSS it is more likely to happen on disk (depending on the working set size). Because of this, it's hard to make concrete comments on create and delete performance, but I think NSS will be ever so slightly slower.

For either file system, the name-to-object parsing costs for names containing multiple elements guarantee that create and delete times are shorter for a single directory than for multiple directories. Indeed, the open time for a\b should be approximately twice that of \b. You are unlikely to be able to measure this, because I am talking about the cost internal to the file system. This is the direct cost of having an extra subdirectory level.
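The "twice the cost" argument comes from resolving one directory lookup per path component. The sketch below models a volume as nested Python dicts and counts those lookups. The resolve helper and the volume layout are hypothetical and say nothing about how either file system parses names internally.

```python
def resolve(root, path):
    """Walk a nested dict one path component at a time.

    Returns (object, lookups), where lookups is the number of
    directory lookups needed to reach the object.
    """
    node = root
    lookups = 0
    for component in path.strip("\\").split("\\"):
        node = node[component]
        lookups += 1
    return node, lookups


# Hypothetical volume: one file in the root and one in subdirectory "a".
volume = {"a": {"b": "file in subdirectory"}, "b": "file in root"}

_, nested_lookups = resolve(volume, "a\\b")
_, root_lookups = resolve(volume, "\\b")
print(nested_lookups, root_lookups)
```

Each extra subdirectory level adds one more lookup of this kind, although each individual lookup then searches a smaller directory.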

Memory usage probably favors NSS, since the traditional file system keeps the entire file system structure in memory at all times. NSS keeps only a working set in memory, although NSS entries are likely to be larger. On balance, the total memory load of NSS will probably be lower for any given activity that touches significantly fewer files than the total number on the volume.

However, you should ensure that the ClosedFileCache size is larger than the working set for your application. For example, for a working set of 65536 files owned by the application, you should use something like NSS /ClosedFileCacheSize=100000. If I were doing this, I would test all four methods (single/multiple subdirectories on NSS and traditional) and pick the one that worked best.

It only requires two applications to test all four cases (although you could use the NSS-specific APIs, which would then require four test applications). Assuming I wasn't allowed this luxury, I would choose NSS. Although the open, find, and delete operations are likely to be slower, this is easily mitigated, and the cache read performance of NSS is better.
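The comparison suggested above could be timed with something like the following sketch. It covers the single-directory and multiple-subdirectory layouts in one script, so running it once against a directory on an NSS volume and once against a directory on a traditional volume would cover all four cases. It uses only portable Python calls and a temporary directory as the test root; the helper names, the small default file count, and the assumption that a Python runtime is available on the machine that mounts the volume are all mine, not the author's.

```python
import os
import tempfile
import time


def path_for(root, index, fanout):
    """Map a file index to a path; fanout == 1 means one flat directory."""
    name = f"f{index:05d}.dat"
    if fanout == 1:
        return os.path.join(root, name)
    return os.path.join(root, f"d{index % fanout:03d}", name)


def run_case(root, total, fanout):
    """Time create, find (stat) and delete for `total` files under `root`."""
    if fanout > 1:
        for d in range(fanout):
            os.makedirs(os.path.join(root, f"d{d:03d}"), exist_ok=True)
    paths = [path_for(root, i, fanout) for i in range(total)]
    timings = {}

    start = time.perf_counter()
    for p in paths:
        with open(p, "wb"):
            pass  # create an empty file
    timings["create"] = time.perf_counter() - start

    start = time.perf_counter()
    for p in paths:
        os.stat(p)  # name lookup only, no data read
    timings["find"] = time.perf_counter() - start

    start = time.perf_counter()
    for p in paths:
        os.remove(p)
    timings["delete"] = time.perf_counter() - start
    return timings


# Small run for illustration; raise total to 65536 and fanout to 256
# to match the layout described in the question.
results = {}
for label, fanout in (("single", 1), ("multiple", 16)):
    with tempfile.TemporaryDirectory() as root:
        results[label] = run_case(root, total=256, fanout=fanout)

for label, timings in results.items():
    print(label, {op: round(seconds, 4) for op, seconds in timings.items()})
```

To test a specific volume, replace the temporary directory with a directory on that volume. Timings from a single run are noisy, so it is worth repeating each case several times and comparing the spread as well as the averages.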

* Originally published in Novell AppNotes

Disclaimer

The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.