Choosing a LAN-based Imaging System for the Small Office Environment
Novell Systems Research
01 Aug 1996
Finding the right LAN-based imaging system may seem overwhelming for many Small Office/Home Office (SOHO) owners. To help in the decision-making process and extend the applicability of our imaging studies to small businesses, this AppNote condenses many of the considerations we have found helpful when implementing an imaging system, tailoring the discussion for the small business environment. It describes the types of imaging systems that are available, features to look for based on particular needs, capacity analysis, and other important issues that should be addressed.
Over the past few years, we have focused most of our imaging studies around the demands of large businesses. But ever-growing numbers of Small Office/Home Office (SOHO) users are now taking an interest in imaging. Imaging systems are no longer seen solely as a solution for large enterprises, but rather as an organizational tool for any size business. Part of the reason is that, unlike other business solutions, imaging systems are ideally suited for both the standalone environment and the networked environment. With the large installed base of NetWare that exists in the SOHO segment of the industry, it seems reasonable for small businesses to consider implementing a networked imaging system.
However, finding the right LAN-based imaging system may seem overwhelming for many SOHO owners, many of whom lack networking expertise. To help in the decision-making process and extend the applicability of our imaging studies to small businesses, this AppNote condenses many of the considerations we have found helpful when implementing an imaging system, tailoring the discussion for the small business environment. It describes the types of imaging systems that are available, features to look for based on particular needs, capacity analysis, and other important issues that should be addressed.
The primary focus of this AppNote is to point out different imaging systems and standards to help SOHO owners choose the best imaging system for their particular needs. Because every business has unique needs, we can only provide general guidelines concerning the choice of an imaging system. Hopefully this information will allow you to ascertain which imaging system best suits your needs by examining advantages, disadvantages and specific issues with the implementation of a new system.
Imaging Meets Small Business
Technological growth, along with small business demands for efficient document management tools, has created a great synergy in the imaging industry. Imaging systems are no longer built solely for large industrial firms; they are also being tailored to small businesses. This recent trend has opened several doors for both hardware- and software-based implementations in a networked environment. Whereas in the past most imaging systems were relatively expensive (due to high costs for hardware and software to execute basic image retrievals), today several commercial off-the-shelf imaging solutions are available that perform more efficiently and more effectively than ever.
Although imaging technology has come this far, nearly 80 percent of all business documents in a SOHO environment remain in paper format. One reason is that the potential benefits of an imaging system are directly proportional to the efficiency of the system deployed, so the choice carries real risk. An ill-conceived imaging system can be disastrous in terms of unmet expectations and user frustration, whereas a well-designed, efficient solution will more than pay for itself in streamlining business processes.
Factors that influence a product's efficiency include:
The implementer's understanding of current and future technology
Inertia (how much has already been invested in the existing hardware and software)
Acceptability of new technologies
Costs for training
Payback or return on investment
Technical support and management
We'll discuss many of these factors as we go along in this AppNote.
The Myth of the "Paperless" Office
A common misconception in document imaging is the initial assertion that implementing such a system will result in a "paperless" office. In reality, no computerized document management system can or should eliminate all hard-copy documents used in the office. The simple truth is that paper is the most user-friendly communication medium known to man. However, there is an expense involved with maintaining paper documents when you consider physical storage space, manual filing labor, security, and the cost of related supplies.
While it is unreasonable to expect imaging to provide a paperless office, a properly designed system can reduce a business's reliance on paper documents to a minimum. What you gain is a more cost-efficient, computerized, systematic approach to document management. Furthermore, an imaging system can provide workflow, organization, and automation that the traditional file cabinet cannot offer.
As an example of how imaging systems can automate a small business, consider this common procedure involving incoming fax documents. Say Bob receives a fax from the ABC Company. His assistant, Tina, hand delivers the fax to Bob, who reads it and takes the appropriate action. Tina then puts the fax in a box of papers to be filed. This procedure is repeated for other employees as they receive faxes. Once there is a sizeable number of documents in the box, Tina organizes the documents and files them into individual file folders according to her own personalized indexing system. If Bob needs to refer to his fax again, he must consult with Tina to find out what type of indexing scheme she used to file his document. This approach may seem adequate for most small offices, but it can become cumbersome as the number of documents increases. If Tina ever leaves the company, it really becomes problematic.
With an imaging system deployed, a consistent filing policy can be created and maintained. Small businesses could benefit greatly in that employees do not have to guess where a document is, or whether the document has been updated recently. It can be a real boon for such office concerns as document security, physical storage space, supply costs, secretary utilization, contracted help, and the time required to retrieve a document. When implementing an imaging system, then, you should not look toward creating the "paperless" office. You should focus instead on enhancing business procedures.
Potential Benefits of Imaging
If implemented correctly, an imaging system can provide several benefits, including:
Conservation of physical storage space
Reduction in manual labor for document handling
Effective information sharing and reuse
Document version control
Overall office efficiency
Automated data workflow
Inexpensive hardware can be used for document management, which can be easily cost-justified over the lifetime of your documents. For documents that are frequently accessed, electronic documents may be more durable than paper documents because they do not suffer wear and tear. Unlike photocopies, electronic documents can be "copied" (restored) many times without any degradation in their appearance.
Features to Expect
Some features you should expect from any imaging system are:
The ability to scan, print, fax, and copy (scan or fax in, then print out)
Basic editing functions (skew, rotate, and so on)
Advanced imaging systems may also offer options such as workflow routing, integration with e-mail systems, telephony, video/audio encapsulation, OCR/ICR systems, and compatibility with third-party software. The options you choose will greatly influence how your network performs once the imaging system is in place.
Choosing an Imaging System
To get the most out of your imaging system, you must assess the needs of your business and choose the best imaging system accordingly. The only way to judge whether an imaging system is "good" or "bad" is to have a solid idea of what you expect to gain from the system. That is the first step: to define what your expectations are.
For some, imaging is seen simply as a way to manage scanned documents. Others may want the ability to manipulate the data contained in scanned documents using some type of optical character recognition (OCR) or intelligent character recognition (ICR). Some people believe that imaging should include such features as workflow management, voice/video encapsulation, and distribution of dynamic files through extensive networking systems such as the Internet.
What you ultimately gain from implementing an imaging system will depend largely on what your initial expectations are and how you perceive your business needs. To help you with this initial needs assessment, the following sections describe the types of imaging systems that are commercially available and indicate which types are suited to various business needs.
Types of Imaging Systems
To get a full understanding of what your business may need, first we must address the different types of imaging systems available. When imaging was first introduced, there were two basic types: graphical and textual imaging. Today, however, there are almost as many different kinds of imaging systems available as there are types of businesses.
Generally, today's imaging systems can be classified into two categories: simple document imaging and compound document imaging. Simple document imaging encompasses much of the traditional graphical and textual document types. Compound imaging incorporates graphical and textual imaging with a multitude of other object types (such as voice and video) and applications (such as telephony and workflow).
Simple Document Imaging
Simple document imaging focuses on two different types of documents: graphical and textual. Graphical documents are "snapshots" of their paper counterparts. They are static in nature, in that it is nearly impossible to alter the document without special software, time, and skill. For the most part, graphical documents are much larger than textual documents because of the high resolution required to view such a document.
Graphical documents are generally scanned images or faxes. Editing them is very primitive without expensive third-party software and extensive skill. Graphical documents save the information exactly as it was scanned in: any handwritten annotations or pictures will appear on the electronic copy. The file size of such a document varies with the scanning resolution used, the use of color or grayscale, the level of image detail, and file compression. A typical one-page graphical document can range anywhere from 50 KB to 100 MB. This is far larger than a textual document, but it may be required in some organizations. These large files can contribute greatly to network traffic, depending on the file size, how many images are being retrieved at any one time, and so on. Disk space and file compression are therefore important requirements for storing graphical images.
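To see why scanning resolution and color depth dominate graphical file sizes, here is a minimal Python sketch of the uncompressed (raw bitmap) size of one scanned page. It assumes a US letter page (8.5 x 11 inches), tightly packed pixels, no file header, and no compression; the function name and the scan settings are illustrative only, not taken from any imaging product.

```python
def raw_page_bytes(width_in: float, height_in: float, dpi: int, bits_per_pixel: int) -> int:
    """Uncompressed bitmap size of one scanned page, in bytes.

    Assumes every pixel uses the same bit depth and that scan lines
    are packed with no padding and no file header.
    """
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes

# Hypothetical scan settings for a US letter page (8.5 x 11 inches).
settings = [
    ("200 dpi black-and-white", 200, 1),
    ("300 dpi 8-bit grayscale", 300, 8),
    ("600 dpi 24-bit color", 600, 24),
]

for label, dpi, bpp in settings:
    size = raw_page_bytes(8.5, 11, dpi, bpp)
    print(f"{label}: {size / 1_000_000:.1f} MB uncompressed")
```

The three settings come out at roughly 0.5 MB, 8.4 MB, and 101 MB per page before compression, which shows how quickly size grows with resolution (squared) and color depth (linear).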
Textual documents, on the other hand, are generally dynamic files that can be easily altered using a word processor, text editor, or spreadsheet program. Straight imaging uses its own package (and packaged components) for document management and manipulation. In other words, external products such as word processors and spreadsheets are seldom used. When they are, the images are imported directly by the application and then exported into the imaging database system. The imaging software has no interaction with external applications.
Textual documents are generally not scanned in; they are created electronically, and they are easily edited with a simple word processor, spreadsheet, or desktop publishing program. When paper documents are scanned in, OCR or ICR engines may be used to extract the text. The resulting textual document contains only the recognized text; any hand-drawn images or graphics on the page are disregarded. This is because OCR or ICR software converts hard-copy text to electronic text, and it has no way to convert a picture or drawing into text.
Textual documents are the most compact. On average, a one-page textual document takes 5 KB of disk space. With file compression, these small files are easily transported over large networks and use very little of the server's resources.
As far as hard disk and server resource utilization is concerned, simple document imaging is the best alternative because the images tend to be significantly smaller in size.
In the past, imaging systems were implemented to manage large graphical files. As OCR and ICR technology evolved, imaging systems shifted from basic graphical imagery to text-based imaging. In fact, although an "image" is often defined as an electronically scanned document (via fax or scanner), spreadsheets, word processing documents, and the like are included in the image category as well. (See the AppNote entitled "Issues and Implications of LAN-based Imaging" in the May 1992 issue.) More recently, imaging has shifted from scanning technology toward general document management.
Compound Document Imaging
Compound imaging is the most recent addition to imaging and is considered the wave of the future. It incorporates graphical and textual imaging with a multitude of different packages and objects (such as telephony and workflow). A compound image document may contain a high-resolution graphical image with a voice inlay and real-time video, to be transferred over an intranet and e-mailed to several users. An OLE object may be embedded in a document to launch an external word processor. Compound image systems may also integrate telephony systems and provide extensive links to external applications. This seems to be a popular option because specific hardware and software components do not have to be purchased: the software already found on most desktops can be used with the imaging system with minimal work and planning.
Naturally, this is the most resource-intensive option of all. Not only must image files be stored and retrieved, but so must additional components such as video and audio. In some cases, compound imaging has been known to slow a network to a crawl. This is mainly due to the huge file sizes, the larger number of image retrievals, and the number of people accessing the data.
If your organization will have a significant number of image retrievals per day, the best way to avoid high network traffic is to implement a simple imaging system. Compound imaging systems are best reserved for environments where few image retrievals take place per day and where the organization needs a "one size fits all" software package.
If resource utilization is an issue within your organization, but a compound imaging system is necessary, the best alternative is to use low-density images (images scanned at low resolutions or even textual documents if necessary). Avoid using extensive amounts of external components like video and audio. Video and audio options utilize a lot of disk space for storage, as well as bandwidth, LAN I/O, and CPU cycles within a server.
Critical Issues to Address
Among the critical issues that should be addressed are storage media, network-specific issues, cost justification, and return on investment.
Kinds of Images
Perhaps one of the first questions that should be addressed is, "What kinds of images are you going to manage, and how frequently will these images be retrieved?" The answers to these questions will help you choose which type of imaging (compound or simple) would best suit your business needs.
Obviously, if your business only has the need to pull textual documents, implementing a graphical imaging system would not provide for optimal performance. In a small business, rather than getting the top-of-the-line, "ultimate" configuration, one should customize the imaging system according to the business needs. Imaging software is far less expensive than buying hardware to support a bloated imaging system. For this reason, our first order will be to provide an example of a business community which includes both text and graphical issues.
Another question that should be asked before choosing the imaging system for deployment is, "Will documents be updated frequently, or will they remain in storage on a per-need basis?" If documents will be updated frequently, a compound imaging system will allow for the most flexibility. Compound documents are generally intended for frequent-revision environments, whereas simple document management is much like the old file-cabinet system.
Basic Imaging Paradigm
Imaging applications are usually designed with a particular paradigm in mind. For example, some are very user-oriented, focusing mainly on the end-user interface. Such user-friendly applications typically have graphical interfaces including thumbnails, index cards, and object linking. Other imaging paradigms revolve around such factors as the number of physical pages that compose a document (for instance, one image per document versus multiple images per document), and the number of document retrievals per hit. Other paradigms focus primarily on whether the imaging system is accessed primarily from a standalone station or on a network, including where the images are stored (on the server or on local hard drives). These are all valid considerations in choosing an imaging system.
Before evaluating storage needs, you should do a quick capacity analysis of your existing hardware versus what you will need for your new system. Existing hardware is usually sufficient for low-utilization environments, but that does not mean capacity analysis can be skipped. Predicting how much utilization will occur after the imaging system is in place may seem nearly impossible, but the following basic formula will give an approximation of what you can expect.
To decide how much capacity is necessary for the average (reasonable) low-resolution imaging database, consider how many pages will make up the average document (n). Then consider how large, in bytes, the average one-page document will be (s). Conservatively, (n*s) bytes will be transmitted every time a document is requested. Once you know how many bytes will be transferred per hit, determine how many hits per day (h) your business will generate. Then multiply (n*s) by h to get the number of bytes transmitted per day:
(h*n*s) = bytes transmitted per day
For example, if your business will have 1000 hits per day, with 5 MB of data being transferred at every hit, you will be transferring 5000 MB of data per day on your network.
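The daily-volume arithmetic is simple enough to script. A minimal Python sketch using the symbols defined above (n pages per document, s bytes per page, h hits per day); it assumes decimal megabytes (1 MB = 1,000,000 bytes) so that the numbers match the example, and the split of 5 MB into five 1 MB pages is hypothetical.

```python
MB = 1_000_000  # decimal megabyte, assumed to match the worked example

def bytes_per_day(n_pages: float, page_bytes: float, hits_per_day: int) -> float:
    """Bytes moved across the network per day: h * n * s."""
    bytes_per_hit = n_pages * page_bytes  # n * s, transferred on every request
    return hits_per_day * bytes_per_hit

# The example from the text: 1000 hits per day at 5 MB per hit,
# modelled here as a 5-page document of 1 MB per page (hypothetical split).
daily = bytes_per_day(n_pages=5, page_bytes=1 * MB, hits_per_day=1000)
print(f"{daily / MB:,.0f} MB per day")  # 5,000 MB per day
```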
All that remains is to factor in the number of hours per workday (W). Dividing the daily total by the number of seconds in a workday gives the average rate at which data will travel across the network during business hours:
(h*n*s)/(3600*W) = average bytes transmitted per second
With this average figure, you can estimate the percentage of bandwidth the imaging system will use. To measure current data transfer rates over your network, the easiest "home test" is to copy a 10 MB file from a PC to a server and time the transfer. Once you have both the capacity analysis and the approximate transfer rate for a 10 MB file, estimating the imaging system's share of network bandwidth is straightforward. If projected utilization rises only slightly above current levels, there is no need to panic. However, if the projected load would reach 115 percent of available bandwidth, the network cannot carry it, and it may be necessary to consider alternative solutions.
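One way to turn the "home test" into a utilization estimate is sketched below in Python. It assumes (our simplification) that the day's traffic h*n*s is spread evenly over a W-hour workday, so the average rate is h*n*s/(3600*W), and it treats the measured 10 MB copy rate as the network's usable throughput. Real traffic is burstier than an even spread, so read the result as a lower bound on peak load. All input figures are hypothetical.

```python
MB = 1_000_000  # decimal megabyte (assumption)

def average_rate(daily_bytes: float, workday_hours: float) -> float:
    """Average bytes per second if the daily volume is spread over the workday."""
    return daily_bytes / (3600 * workday_hours)

def measured_throughput(test_bytes: float, copy_seconds: float) -> float:
    """Usable throughput from the timed file-copy test, in bytes per second."""
    return test_bytes / copy_seconds

def projected_utilization(daily_bytes: float, workday_hours: float,
                          test_bytes: float, copy_seconds: float) -> float:
    """Imaging traffic as a percentage of the measured throughput."""
    rate = average_rate(daily_bytes, workday_hours)
    return 100 * rate / measured_throughput(test_bytes, copy_seconds)

# Hypothetical inputs: 5,000 MB per day, an 8-hour workday,
# and a 10 MB test copy that took 20 seconds (0.5 MB/s usable).
pct = projected_utilization(5000 * MB, 8, 10 * MB, 20)
print(f"Projected average utilization: {pct:.1f}%")  # 34.7%
```

A result approaching or exceeding 100 percent means the network cannot carry the projected load even on average, before any allowance for traffic already on the wire.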
After you have determined how large the average document will be, you can estimate how much hard disk space you will need by multiplying (n*s) by the number of documents you expect to store on your system (x). The formula relating disk size to the number of documents your current storage unit can hold is:
(x*n*s = m), or x = m/(n*s), where m is the size of the hard drive in bytes.
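The storage formula can be checked the same way. A minimal Python sketch, assuming decimal units and hypothetical figures (a 4 GB drive, 5-page documents, 50 KB per compressed page). Note that it ignores the space taken by the operating system, applications, and the imaging database's own index files, so real headroom will be smaller.

```python
KB = 1_000
GB = 1_000_000_000  # decimal units (assumption)

def documents_that_fit(drive_bytes: int, n_pages: int, page_bytes: int) -> int:
    """x = m / (n * s), rounded down to whole documents."""
    return drive_bytes // (n_pages * page_bytes)

def disk_needed(num_docs: int, n_pages: int, page_bytes: int) -> int:
    """m = x * n * s: bytes needed to hold num_docs documents."""
    return num_docs * n_pages * page_bytes

print(documents_that_fit(4 * GB, 5, 50 * KB))  # 16000
print(disk_needed(10_000, 5, 50 * KB) / GB)     # 2.5
```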
There are many options available for storing images. CD-R (CD-Recordable) disks have been a popular storage medium. Yet as several publications have noted, current CD-R disks have pitfalls, including limited storage space, the need to write data in a continuous stream, and so on. Some believe that CD-E (CD Writable/Erasable) disks will replace CD-R disks by adding read/write capability, more storage space per disk, and a lower cost per megabyte. Magneto-optical drives and tape units have been introduced to the imaging industry, but they are not usually recommended because of cost and potential wear.
In some of our in-house testing, we have used RAID (Redundant Arrays of Independent Disks). This alternative is too expensive for the average home system. Today, perhaps the most practical and dependable storage units for a SOHO, and the ones improving most rapidly, are the basic hard drives already in our systems.
The decreasing cost of hardware (desktop computers, hard drives, and RAM) has made imaging systems easier for SOHO owners to justify. Until recently, hard drive capacities were relatively small: five years ago, 115 MB hard drives were considered advanced technology. Today, with read/writable CDs and 4 GB hard drives in personal computers, storage is far less of a concern. While some believe that "bigger is better" in the imaging industry, the fact is that most existing hardware in a SOHO is sufficient for everyday imaging applications. 386 machines with 8 MB of RAM and a 10 Mbps Ethernet connection are sufficient for low-traffic imaging environments. When large graphic files are involved, you may want a 486 machine or better, with 16 MB of RAM and a relatively large hard drive.
Recent studies indicate that 486 machines with 16 MB of RAM, running DOS/Windows 3.1, are the norm in a SOHO environment. DOS/Windows has saturated the market, accounting for about 80 percent of all desktop operating systems. Approximately 85 percent of all desktop machines are based on Intel or Intel-clone processors. Forty percent of all machines in the SOHO market are 486DX2s or better, and only 10 percent of the SOHO market has Pentium machines. Eleven percent of the market still runs on 386 technology. The remainder is 286 or earlier, or processors that were not surveyed. Because most SOHO imaging products are flexible in their hardware requirements, this bodes well for imaging deployment.
Another consideration is what kind of searching capabilities your imaging system needs. Most imaging systems provide some type of search engine, and without a good one, an imaging system is almost useless. Locating a single document among several hundred is a challenge without some organizational method in place, especially if the documents share the same subject matter.
Any search engine can prove to be beneficial if implemented correctly. Some imaging systems only allow for a full-page browse option. Of course, this is the most resource-intensive method because every individual image must be pulled across the network, regardless of whether this is the document you want to edit.
Some imaging systems go by the old file cabinet metaphor, where documents are organized by a particular key word, indexed and stored in separate directories or "files." This can be an advantage over full-page browsers, unless each file has a significant number of documents that must be retrieved and browsed through.
Some systems allow for keyword searches. Each file is archived with a few keywords that are stored in a database for later searches. This is probably the least resource-intensive method, because instead of pulling individual documents across the network, the system opens a database file that points to the relevant document.
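To illustrate why a keyword search is light on the network, here is a minimal Python sketch of a keyword index: each document is filed under a few keywords when it is archived, and a search consults only the small index, returning image locations rather than pulling the images themselves. The class, document IDs, file paths, and keywords are hypothetical; real products keep such indexes in their own database formats.

```python
from collections import defaultdict

class KeywordIndex:
    """Maps each keyword to the IDs of the documents filed under it."""

    def __init__(self) -> None:
        self._index: dict[str, set[str]] = defaultdict(set)
        self._locations: dict[str, str] = {}  # document ID -> image file location

    def add(self, doc_id: str, location: str, keywords: list[str]) -> None:
        """File a document under a few keywords when it is archived."""
        self._locations[doc_id] = location
        for word in keywords:
            self._index[word.lower()].add(doc_id)

    def search(self, keyword: str) -> list[str]:
        """Return image locations for a keyword without opening any image."""
        ids = self._index.get(keyword.lower(), set())
        return sorted(self._locations[doc_id] for doc_id in ids)

idx = KeywordIndex()
idx.add("fax-0001", "IMAGES/FAX0001.TIF", ["ABC Company", "invoice"])
idx.add("fax-0002", "IMAGES/FAX0002.TIF", ["XYZ Corp", "invoice"])
print(idx.search("invoice"))      # ['IMAGES/FAX0001.TIF', 'IMAGES/FAX0002.TIF']
print(idx.search("abc company"))  # ['IMAGES/FAX0001.TIF']
```

Only the matching locations cross the network; the image itself is fetched once the user picks a document.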
Other imaging systems allow you to do a full-text search rather than relying on predetermined keywords. This is much more resource-intensive than a keyword search, because either the documents must be passed through an OCR engine and searched individually, or you must maintain a relatively large database file. Scanning resolution poses a further problem. If a document was scanned at low resolution, only a few words (those in large or boldface type) will be recognized. At high resolution, all the words can be recognized, but the files will be large. With a database, you can run queries or Boolean searches (AND, OR, NOT). Server traffic will depend on which kind of search engine is implemented.
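A minimal Python sketch of Boolean querying over full text follows. It assumes the page text has already been extracted by an OCR/ICR engine (the OCR step is not shown, and the sample text is invented). Every word becomes an index entry, which is why a full-text index grows much larger than a keyword index; AND, OR, and NOT map directly onto set intersection, union, and difference.

```python
import re

def build_word_index(pages: dict[str, str]) -> dict[str, set[str]]:
    """Index every word of every page's OCR text (a full-text index)."""
    index: dict[str, set[str]] = {}
    for doc_id, text in pages.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(word, set()).add(doc_id)
    return index

def docs_with(index: dict[str, set[str]], word: str) -> set[str]:
    # Documents containing the word; an empty set if it never appears.
    return index.get(word.lower(), set())

# Hypothetical OCR output for three scanned pages.
ocr_text = {
    "doc1": "Invoice from ABC Company, net 30 days",
    "doc2": "Purchase order for ABC Company",
    "doc3": "Invoice from XYZ Corp",
}
idx = build_word_index(ocr_text)

print(sorted(docs_with(idx, "invoice") & docs_with(idx, "abc")))    # AND: ['doc1']
print(sorted(docs_with(idx, "invoice") | docs_with(idx, "order")))  # OR: ['doc1', 'doc2', 'doc3']
print(sorted(docs_with(idx, "invoice") - docs_with(idx, "abc")))    # NOT: ['doc3']
```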
For storage and image transmission, file compression is an important issue. All imaging software must compress images; otherwise even a large hard drive would hold relatively few of them. Images contain a lot of redundant data, and file compression removes that redundancy, packing the file down to a reasonable size.
Some compression formats pack "tighter" than others, and some condense a file a great deal at the cost of document quality. The file sizes and compression rates you will see depend on which compression format is used. Some common compression formats include:
Run length encoding (RLE)
Joint Photographic Experts Group (JPEG) standard
Moving Picture Experts Group (MPEG)
CCITT Group III and IV compression (G3 and G4, respectively)
PackBits and RLE compression are usually used for bitonal (black-and-white) images. They are occasionally used for color, but they usually compress color images poorly. RLE compresses a file by replacing each run of identical values (for example, a long stretch of white pixels) with a single count and value. Scanned black-and-white pages consist largely of such long runs, which makes this kind of data easy for the computer to compress.
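The run-length idea is easiest to see on a single scan line of a black-and-white image. A minimal Python sketch, assuming pixels are given as 0 (white) and 1 (black); this is plain RLE for illustration, not the exact PackBits or CCITT encodings, which add their own byte layouts and code tables.

```python
from itertools import groupby

def rle_encode(pixels: list[int]) -> list[tuple[int, int]]:
    """Replace each run of identical pixels with a (value, run_length) pair."""
    return [(value, len(list(run))) for value, run in groupby(pixels)]

def rle_decode(runs: list[tuple[int, int]]) -> list[int]:
    """Expand (value, run_length) pairs back into the original pixels."""
    pixels: list[int] = []
    for value, length in runs:
        pixels.extend([value] * length)
    return pixels

# A mostly white 100-pixel scan line with two short black marks.
line = [0] * 40 + [1] * 3 + [0] * 50 + [1] * 2 + [0] * 5
runs = rle_encode(line)
print(runs)  # [(0, 40), (1, 3), (0, 50), (1, 2), (0, 5)]
assert rle_decode(runs) == line  # lossless: decoding restores every pixel
```

One hundred pixels shrink to five pairs here. A color photograph has few long runs of identical values, which is why RLE does poorly on color images.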
G3 and G4 compression are usually associated with faxed documents, but can also be used for scanned documents. G3 compression is the industry standard for most fax machines; G4 compression is found in some of the more sophisticated fax machines and imaging systems. G3 and G4 are among the most effective compression schemes for black-and-white images, and G4 compression can produce files nearly 50 percent smaller than G3 compression.
JPEG is the newest format to attract wide attention. It compresses well, but it is lossy: image quality drops as the compression level increases. Up to a point, however, you can compress a file without losing much image quality.
MPEG is a compression standard for video files. However, full-motion video and compound imaging systems will not be used in most SOHOs.
Note: Do not confuse compression schemes with image file formats such as TIFF, DCX, GIF, BMP, PCX, and so on. These format types indicate what image viewing technology can be used; they are not compression formats in themselves.
As noted in our earlier articles (those listed on the title page of this AppNote), image retrieval speeds depend on a multitude of factors. Bottlenecks can arise from the type of hardware deployed and from limitations of the imaging system itself. We have found that the traditional bottlenecks in imaging environments included RAM, CPU, storage technology, storage speed, and LAN topology.
As our imaging studies became more in-depth, we found additional issues that stem less from hardware and more from software and technological limitations. We have also found that the operating system deployed plays a role in image retrievals on a per-user basis. In environments such as Windows 3.1, for example, resource utilization increases proportionally as the number of workstations increases. In the March AppNote, we studied how an interpreted imaging system and a compiled-code imaging system differ in their effect on network performance. There we noted that compiled code not only uses fewer server resources than interpreted code, but also provides faster image retrieval rates.
Furthermore, as noted in the May 1996 NetNote, the type of driver you use in your imaging environment is a factor in retrieval speeds as well. When mixing different clients and drivers, we see a slight increase in CPU utilization, retrieval times, and initial lag times.
In recent tests, we have been examining the effects of a 32-bit operating system versus the traditional 16-bit Windows 3.1 environment. These results will be presented in later AppNotes. We have also found the need to examine some of the issues with compound imaging systems: with growing interest in the Internet and intranets, we see a significant paradigm shift in imaging systems.
Factors that will influence image retrieval rates and resource utilization include the search mechanism used, the size of the images, the type of image (compound or simple), initial network traffic, the number of hits on a particular image, file compression techniques, and other software specific issues such as document management standards (ODMA, OLE, DEN, OGM, and so on).
Storage Access Types
Different storage access types are available, depending on which type of imaging software you want to implement. "Online" storage usually implies that the storage media is on the network, with fast-access hard drives or optical drives readily accessible to users. "Near-line" storage usually indicates that several storage management systems are combined under one hierarchical storage management (HSM) system, which locates a file and redirects the information to online storage units. This usually means slower retrieval times, but it is well suited to data that is not accessed frequently.
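The online/near-line arrangement can be pictured as a two-tier lookup. The Python sketch below is a hypothetical toy model, not any HSM product's interface: a file is served from online storage when it is there; otherwise it is located on the slower near-line tier and copied back to online storage before being returned, so later requests for it are fast.

```python
class TieredStore:
    """Toy model of online storage backed by a near-line (HSM) tier."""

    def __init__(self, online: dict[str, bytes], nearline: dict[str, bytes]) -> None:
        self.online = online      # fast disks or optical drives on the network
        self.nearline = nearline  # slower media managed by the HSM
        self.recalls = 0          # files that had to be fetched from near-line

    def read(self, name: str) -> bytes:
        if name in self.online:
            return self.online[name]  # fast path
        if name not in self.nearline:
            raise FileNotFoundError(name)
        data = self.nearline[name]    # slow path: locate the file on near-line media
        self.online[name] = data      # redirect (recall) it to online storage
        self.recalls += 1
        return data

store = TieredStore(online={"recent.tif": b"..."}, nearline={"1994-ledger.tif": b"..."})
store.read("1994-ledger.tif")  # first request: recalled from near-line
store.read("1994-ledger.tif")  # second request: served from online storage
print(store.recalls)  # 1
```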
Commercial or Custom Software
Commercial off-the-shelf (COTS) software seems to be the recent trend for SOHO implementors. However, a SOHO owner will typically purchase imaging software through VARs or through mail order rather than going to a local software store. Customized imaging software (built by contract programmers) seems to be less popular than COTS because of the extended costs involved when engaging outside help. Lack of documentation, programming time, and lack of support are other reasons why SOHOs tend to shy away from customized imaging software solutions.
Imaging technologies for SOHO businesses are a manageable option and can be customized to suit any business environment when the issues have been analyzed in their entirety. The point of document imaging is not to create a "paperless" environment, but to increase productivity and accuracy in an organization through innovative technology. When implementing an imaging system, your return on investment will vary depending on how closely business factors, as opposed to user-defined issues such as physical resources, have been adhered to. Although the technical aspects have been covered in this and other AppNotes in the imaging series, other issues must also be assessed. These include the cost of contracted employees to install the software, maintenance, staff training, and possible hardware upgrades.
Storage media is commonly cited as the most expensive resource for document imaging. Although discussions of storage problems often raise cost, security, accessibility, and space, the fact that computerized documents occupy less physical space than paper documents shows that savings can be gained by implementing such a system. Other benefits are improved efficiency and decreased personnel costs (commonly the most expensive resource in any business).
We have explored many of the potential benefits and issues associated with implementing an imaging system in a SOHO. There are some pitfalls, such as increased bandwidth utilization and heavier demands on other server resources, including hard disk space and CPU. However, if the business needs and resources are analyzed in their entirety, potential network problems may be avoided. Most SOHO imaging products will not heavily tax existing hardware unless the package is "overkill" for the SOHO's applications. The biggest pitfall in implementing any imaging system is file compatibility. If a document was created with Package A, the recipient must either have Package A or a conversion utility to view the document.
Hopefully, this AppNote has provided detail on some of the different types of SOHO imaging products available. In the next AppNotes in the imaging series, we hope to further explore SOHO environments to bring a better understanding of available technologies and integration schemes.
* Originally published in Novell AppNotes
The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.