Centralized Multiserver Backup Over 100VG-AnyLAN Networks
Technical Marketing Engineer
Information Storage Europe
Hewlett Packard
01 Jun 1995
Due to tape drive performance and network bandwidth limitations, many network administrators have been forced to use the "distributed" backup strategy of having several tape drives on the network. Recent technological advances have resulted in tape drive performance and network bandwidth limitations being overcome. This makes it possible for network administrators to implement a centralized backup solution. This AppNote describes these new technologies and explores the benefits of a centralized backup strategy.
- Introduction
- The Enterprise Backup Challenge
- The Current Situation
- New Technologies
- Centralized Backup
- Performance Benchmark Results
- Conclusion
- Appendix: Test Equipment Configuration
- Appendix: Practical Details
Introduction
Until now, network administrators have been forced to adopt a distributed server backup strategy to effectively protect large amounts of data stored on the corporate network. Recent advances in tape drive, network and server technologies mean that a centralized backup strategy can now be implemented. A centralized backup strategy increases backup speed, decreases administration, and leads to easier restores.
The Enterprise Backup Challenge
In today's business environment, information and data are often the basis of a corporation's competitive advantage. The amount of data stored on networks is increasing all the time. Studies have shown that the average network has a storage capacity of 5 gigabytes. That number is predicted to double every few years. As a result of the growth in LANs, this data is usually distributed on servers around the workgroup and across the enterprise.
Recognizing the possibility of data loss by accidental erasure or catastrophic failure, the network administrator will want to protect the data by backing it up to tape. Although this may sound straightforward, with large amounts of data spread across many servers, finding an easy way to do this may be difficult.
Because of advances in network, tape drive and server technologies, simple and reliable protection of data across the enterprise is possible. This article reviews these technologies, describes a centralized enterprise backup strategy and gives some measured performance figures.
The Current Situation
The most cost effective way of protecting data against loss is to back it up onto tape. This involves copying data from a server's hard disk either to a local tape drive on the same server or to a remote tape drive located on another server.
The best backup solution would be to have one tape drive on the network which could back up the data from all servers. This would be simple, cost effective and easy to administer. However, many network administrators have been forced to move to a more "distributed" backup strategy by having several tape drives on the network. This is due to the limitations caused by the large amount of data stored on the network.
Backup Tape Drives
Most tape drives do not have the capacity to back up all of the data on the network. Although some large capacity, high performance tape drives exist, their use has been restricted because of their size, cost and proprietary recording formats.
Some administrators try to work around this limitation by "spanning" backups over multiple cartridges or by making incremental backups. Both of these approaches have their problems.
Since backups are typically carried out when network usage is low, spanning requires someone to be available at night to change cartridges. Incremental backups only put onto tape those files which have changed since the last backup. Although the capacity required for each backup is less, a restore will be more complicated since several cartridges will be required.
Faced with the lack of capacity in single tape drives, network administrators have chosen to specify multiple tape drives which, combined, have enough capacity to back up all of the data on the network.
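The restore complication with incremental backups mentioned above can be illustrated with a short Python sketch. It assumes the common scheme in which a restore needs the most recent full backup plus every incremental made after it; the function name and the cartridge labels are hypothetical and are not taken from any backup package.

```python
from typing import List, Tuple


def tapes_needed_for_restore(history: List[Tuple[str, str]]) -> List[str]:
    """Return the cartridge labels needed to restore the latest state.

    `history` is a chronological list of (kind, label) pairs, where kind is
    "full" or "incremental". The restore needs the most recent full backup
    and every incremental backup made after it.
    """
    last_full = None
    for index, (kind, _label) in enumerate(history):
        if kind == "full":
            last_full = index
    if last_full is None:
        raise ValueError("history contains no full backup")
    return [label for _kind, label in history[last_full:]]


# Hypothetical week: one full backup followed by four incrementals.
week = [
    ("full", "FRI-FULL"),
    ("incremental", "MON-INC"),
    ("incremental", "TUE-INC"),
    ("incremental", "WED-INC"),
    ("incremental", "THU-INC"),
]
print(tapes_needed_for_restore(week))
```

In this hypothetical week, a restore on Friday morning needs all five cartridges, which is the complication the text describes.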
Network Infrastructure
Most of the common networking technologies have a limited capacity or "bandwidth" which is further reduced by traffic. Even though most tape drives can reach their full transfer rate when backing up locally on the server, they are usually unable to achieve it when backing up over the network.
The amount of data requiring backup keeps increasing, while the night time "window" of low activity stays the same or even shrinks. As a result, the data transfer rate required to complete the backup now exceeds the transfer rate achievable over the bandwidth of the network.
Putting multiple tape drives on the network provides time savings in two ways:
Several servers are backed up at the same time, "in parallel".
Since more data is being backed up locally, a higher transfer rate is achieved compared to remote backup.
The overall result is the completion of the backup within the night time "window".
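The backup "window" arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration rather than part of the original tests: the function names, the 15,000 Mbyte data volume and the 8-hour window are hypothetical, while the 24 and 40 Mbytes/min rates are the Test 1 figures reported later in this AppNote for 10Base-T with traffic and for 100VG-AnyLAN.

```python
def required_rate_mb_per_min(data_mb: float, window_hours: float) -> float:
    """Average transfer rate needed to finish the backup inside the window."""
    if window_hours <= 0:
        raise ValueError("window_hours must be positive")
    return data_mb / (window_hours * 60)


def fits_window(data_mb: float, window_hours: float,
                achievable_mb_per_min: float) -> bool:
    """True when the achievable rate is at least the required rate."""
    return achievable_mb_per_min >= required_rate_mb_per_min(data_mb, window_hours)


# Hypothetical site: 15,000 Mbytes to protect in an 8-hour night window.
DATA_MB = 15_000
WINDOW_HOURS = 8

needed = required_rate_mb_per_min(DATA_MB, WINDOW_HOURS)
print(f"Required rate: {needed:.2f} Mbytes/min")

# Achievable rates taken from Test 1 (data SET1).
for label, rate in [("10Base-T with traffic", 24), ("100VG-AnyLAN", 40)]:
    verdict = "fits" if fits_window(DATA_MB, WINDOW_HOURS, rate) else "does not fit"
    print(f"{label}: {rate} Mbytes/min {verdict} the window")
```

With these assumed figures the single remote backup only completes in time on the faster network, which is the situation that pushes administrators toward multiple drives on a slow network.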
The administrator who has implemented a distributed backup strategy is able to meet the basic backup requirements, but is faced with several disadvantages:
Servers will not only have the expense of an installed tape drive, but also of backup software, extra server memory and extra disk space.
While the backup is running, significant resources are utilized on each of the servers and if a malfunction occurs they may be halted. In both cases, users will be disrupted.
The administrator will have to use each of the installed backup software packages to manage the backup. This can be very time consuming.
Tasks such as managing media rotation and tape drive cleaning become more challenging as the number of tape drives increases.
The possibilities of lost cartridges, security breaches and errors, all leading to an unprotected network, are much higher when there are many tape drives in many locations.
New Technologies
Technological advances have resulted in tape drive performance and network bandwidth limitations being overcome.
DAT
DAT (Digital Audio Tape) is a well established tape technology which is popular for backup because it delivers good performance and reliability at a competitive price. The introduction of the DDS-2 DAT format with its increased capacity and transfer rate has led to DAT becoming the leading technology for server backup.
A single DDS-2 cartridge has a capacity of 4 gigabytes, which can double to 8 gigabytes with data compression. The fastest DAT drive has a transfer rate of 510 kilobytes per second which can double to 1 megabyte per second with data compression.
For applications requiring more capacity, multiple cartridge DAT autoloaders are available. These have the mechanical handling hardware to enable random access to between 4 and 12 cartridges, depending on the manufacturer. The fastest "in form factor" DDS-2 DAT autoloader has a transfer rate of 510 kilobytes per second and holds six cartridges, for a typical capacity of 48 gigabytes with data compression.
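The capacity figures above lend themselves to a quick sizing check. The sketch below is only an illustration: the function names are hypothetical, the 4 gigabyte native capacity and six-slot autoloader come from the text, and the 2:1 default compression ratio is an assumption that real data may not reach.

```python
import math

DDS2_NATIVE_GB = 4  # native capacity of one DDS-2 cartridge (from the text)


def cartridges_needed(data_gb: float, compression_ratio: float = 2.0) -> int:
    """Number of DDS-2 cartridges needed for data_gb of uncompressed data."""
    effective_gb = DDS2_NATIVE_GB * compression_ratio
    return math.ceil(data_gb / effective_gb)


def fits_autoloader(data_gb: float, slots: int = 6,
                    compression_ratio: float = 2.0) -> bool:
    """True when the data fits in one full magazine of the autoloader."""
    return cartridges_needed(data_gb, compression_ratio) <= slots


# Six cartridges at 2:1 compression give the 48 gigabyte typical capacity.
print(cartridges_needed(48), fits_autoloader(48))
# With incompressible data (1:1) the same magazine holds only 24 gigabytes.
print(cartridges_needed(24, compression_ratio=1.0),
      fits_autoloader(25, compression_ratio=1.0))
```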
The advances in tape drive technology have been matched by advances in backup software functionality. Many leading software packages are now available to take full advantage of the performance and capacity of DAT tape drives and autoloaders.
Most of these software packages will allow concurrent backups to be performed with multiple tape drives. They also have automated tasks such as media management to simplify the administrator's work. Some have a Windows-based management interface with an enterprise view of all servers and files on the network.
100VG-AnyLAN
The limitation of insufficient network bandwidth has been addressed by the introduction of high speed networking technologies, such as 100VG-AnyLAN technology. Network bandwidth can be increased by installing 100VG hubs and LAN adapters. If the network was originally 10Base-T, then the existing investment in cabling can be maintained.
Demand Priority Protocol
100VG technology is based on the "Demand Priority Protocol" access method. This simple, deterministic request method maximizes network efficiency by eliminating collisions. For a single segment, up to 95% bandwidth efficiency can be achieved. This means that up to 95 megabits per second total bandwidth is available, compared to the peak of 6 megabits per second that 10Base-T provides. Deterministic access and large network capacity mean that users have more bandwidth available and that they are almost totally unaffected by other network traffic.
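To relate these bandwidth figures to tape drive speeds, it helps to convert megabits per second into the Mbytes per minute used for backup rates. The following Python sketch does only that unit conversion; the function name is hypothetical, and the comparison figure of roughly 61 Mbytes per minute is the theoretical maximum of a DDS-2 drive at 2:1 compression (510 kilobytes per second doubled).

```python
def mbit_per_s_to_mbytes_per_min(mbit_per_s: float) -> float:
    """Convert a link rate in megabits/s to Mbytes/min (8 bits per byte)."""
    return mbit_per_s / 8 * 60


TAPE_MAX_MBYTES_PER_MIN = 61  # DDS-2 drive at 2:1 compression, approximate

for name, mbit in [("10Base-T peak", 6), ("100VG-AnyLAN", 95)]:
    link = mbit_per_s_to_mbytes_per_min(mbit)
    relation = "below" if link < TAPE_MAX_MBYTES_PER_MIN else "above"
    print(f"{name}: {link:.1f} Mbytes/min, {relation} the tape drive maximum")
```

Under these figures the 10Base-T peak works out below what a single compressing drive can absorb, while the 100VG figure leaves ample headroom for several drives and for user traffic.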
In addition to these developments, server technology has continued to progress. Server prices have fallen dramatically while their performance, reliability and manageability have increased. Server families are scalable, which means that the right processor power and performance level can be economically selected for the chosen task. Most servers now come with integrated fast SCSI controllers and a choice in the number of EISA and PCI expansion slots.
All of these components can be combined to create a centralized backup solution.
Centralized Backup
The starting point of a centralized backup solution is a server dedicated to the backup task. The processing power and size of the server can be chosen to match this specific application. Having a server which performs no other tasks means that there is a high degree of task independence. The server can be allowed to work at full load and if there is a problem with the backup hardware or software, no other activities or users are affected.
The backup server can be networked to other servers with high speed 100VG-AnyLAN networking components. Data from all servers will travel across the network to the backup server. The bandwidth of the network is large enough to accommodate all of the backup traffic as well as any user traffic.
The backup software is installed only on the backup server. Since no users other than the backup supervisor will be logging into the backup server, a minimal user license (for example, a five user license) can be purchased for the backup server's network operating system. Finally, the tape drive or drives can be selected according to the amount of data on the network.
The centralized backup configuration can be optimized for capacity, performance, or both:
For capacity, a DDS-2 multiple cartridge autoloader can be specified.
For performance, two or more single DDS-2 tape drives and a software package which supports concurrent backup can be specified.
For both capacity and performance, two or more DDS-2 autoloaders and the appropriate software can be specified.
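The three options above can be turned into a rough sizing rule. The Python sketch below is a simplification under stated assumptions, not a vendor sizing method: the function name is hypothetical, data is assumed to split evenly across drives, each drive is assumed to sustain 40 Mbytes/min (the Test 1 figure for data SET1), each cartridge is assumed to hold the 8 gigabyte typical capacity, and the throughput limit of the backup server itself is ignored.

```python
import math
from typing import Tuple


def size_backup_server(data_gb: float, window_hours: float,
                       drive_mb_per_min: float = 40.0,
                       cartridge_gb: float = 8.0) -> Tuple[int, bool]:
    """Return (number_of_drives, needs_autoloader) for a centralized backup.

    More drives are added when one drive cannot finish inside the window
    (performance); autoloaders are chosen when each drive's share of the
    data exceeds one cartridge (capacity).
    """
    required_rate = data_gb * 1000 / (window_hours * 60)  # Mbytes/min
    drives = max(1, math.ceil(required_rate / drive_mb_per_min))
    needs_autoloader = data_gb / drives > cartridge_gb
    return drives, needs_autoloader


# Hypothetical sites illustrating each of the three options.
print(size_backup_server(30, 14))  # capacity: one drive, autoloader
print(size_backup_server(14, 4))   # performance: two single drives
print(size_backup_server(40, 8))   # both: several autoloaders
```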
This approach is flexible and scalable. As more file servers are added to the network, the backup server can be brought off-line and its configuration changed, without affecting any users.
Centralized Restore
A centralized backup strategy brings the ability to perform centralized restores. This decreases the time taken to rebuild a file server in case of a major failure. When restoring a failed file server the first step is to reinstall the operating system. Usually, the next steps are to reinstall the backup software, recover the database, and then restore the data.
However, in the case of a centralized backup strategy, the two intermediate steps are unnecessary since the backup software and its database should still be running on the backup server. The administrator will save time by going straight to the data restore step. Since the network has a high bandwidth, there is no difficulty in restoring files remotely over the network.
If the dedicated backup server fails, the network may be temporarily unprotected, but no users are impacted. The administrator can reinstall and restore the backup server at any time.
Centralized Administration
Having only one backup software application installed on the backup server reduces the network administrator's routine workload. Backup and restore across the entire enterprise is controlled from a single user interface. In addition, by having all tape drives and cartridges physically located in one place, tasks such as media rotation, tape drive cleaning, and data restores become much simpler. Choosing a safe location enhances security.
Performance Benchmark Results
In order to compare centralized (remote) backup to distributed (local) backup and quantify the benefits, we ran several tests using typical data on a test network. The tests, described in detail later, showed the following results:
The 100VG-AnyLAN network allows remote backups to run at the maximum speed the tape drive can reach, even with large amounts of traffic present. There is no longer a reduction in speed when backing up over the network.
Remote backup of a server is faster because the backup software database disk writes do not interfere with the backup process disk reads. In a typical file structure, you can achieve a speed gain of 70%.
The centralized backup design based on a 486/66 server is able to host two to three concurrent backup sessions, depending on the data, at speeds between 110 and 125 Mbytes per minute. These are at worst equivalent to, or at best 30% to 70% greater than, the sum of individual distributed backup speeds.
The server being backed up from a remote source experiences an acceptable loading. The 486/66 server performing remote backup experiences significant loading and should therefore be dedicated to the backup task.
Other Configurations
The centralized backup configuration can be varied, in order to meet any special needs which may exist. Some examples are shown below.
Automation of Media Rotation. By substituting a multiple cartridge autoloader for a single cartridge tape drive, the backup task can be further simplified by allowing the backup software and the autoloader to implement and automate a media rotation scheme.
High Speed Single Server Backup. With multiple tape drives installed, a single server can have its individual volumes backed up concurrently to individual tape drives. This further increases the backup speed of the file server.
Tape mirroring. With multiple tape drives installed, a single server can be backed up concurrently or "mirrored" to two individual tapes, one of which can immediately be taken offsite. This would be appropriate in a high security application.
Hierarchical Storage Management. The dedicated backup server has the potential for being upgraded to a dedicated Hierarchical Storage Management Server.
Volume copying. With adequate disk space, a remote volume can be copied to the local disk with NCOPY or with some backup software utilities. Once the volume is on the local disk it can then be locally backed up to tape.
Conclusion
When using high speed networking components, there is no longer a penalty for backing up servers across the network. This means that network backup, restore and administration can be centralized to a dedicated server which hosts one or more high performance DDS-2 DAT tape drives or autoloaders. This approach provides a flexible and scalable solution which increases the backup speed, decreases recovery time and simplifies administration.
Test Equipment Configuration
The following table shows the test equipment configurations used in the benchmark tests.
Component | Configuration
Backup server | Hewlett Packard Netserver LC 486/66, single SCSI disk on internal Adaptec AIC 7770 controller, 24 MB RAM, NetWare 3.12. Novell speed rating 1830.
File servers | Hewlett Packard Netserver LF 486/66, single SCSI disk on internal Adaptec AIC 7770 controller, 16 MB RAM, NetWare 3.12. Novell speed rating 1830.
Network | Hewlett Packard J2410A AdvanceStack 100VG Hub; J2577A 100VG EISA LAN adapter.
Tape drives | Hewlett Packard C1561A SureStore Tape 12000e, DDS-2 DAT autoloader: 6 cartridges, 48 Gbyte typical capacity, 510 Kbytes/s native transfer rate. Hewlett Packard C1541A SureStore Tape 6000e, DDS-2 DAT tape drive: 8 Gbyte typical capacity, 510 Kbytes/s native transfer rate. All connected to Netserver LC internal SCSI controller.
Backup software | Cheyenne ARCserve for NetWare Windows edition v5.01 with changer option. Windows-based enterprise-wide view.
Client | Hewlett Packard Vectra XM 486/66, Windows, VLMs.
Data on file server, SET1 | Mixed flat files with compressibility 1.33:1. Data requires moderate disk seeking for access.
Data on file server, SET2 | Typical mixed nested file set with compressibility 2:1. Data requires much disk seeking for access.
Data on file server, SET3 | Small number of large files with compressibility 2:1. Data requires little disk seeking for access.
The DDS-2 DAT tape drives have a maximum native transfer rate of 510 KBytes per second, which is increased by the data compression ratio achieved. For the tests, three sets of data were used: two with a compression of 2:1, the other with a more conservative compression of 1.3:1. These give maximum theoretical transfer rates of 61 MBytes per minute and 41 MBytes per minute respectively. The compression of data can vary between 1:1 and 8:1, the exact figure depending on the actual data.
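The theoretical maximum rates quoted above follow directly from the native transfer rate and the compression ratio. The short Python sketch below reproduces the arithmetic; the function name is hypothetical, decimal units (1 MByte = 1000 KBytes) are assumed because they match the rounded figures in the text, and the 1.33:1 ratio is the SET1 compressibility listed in the equipment table.

```python
NATIVE_KBYTES_PER_S = 510  # DDS-2 native transfer rate from the text


def max_rate_mbytes_per_min(compression_ratio: float,
                            native_kbytes_per_s: float = NATIVE_KBYTES_PER_S) -> float:
    """Theoretical maximum backup rate in MBytes/min for a compression ratio."""
    # Assumes decimal units: 1 MByte = 1000 KBytes.
    return native_kbytes_per_s * compression_ratio * 60 / 1000


for ratio in (2.0, 1.33):
    rate = max_rate_mbytes_per_min(ratio)
    print(f"{ratio}:1 compression gives {rate:.1f} MBytes/min")
```

Rounded to whole numbers, these are the 61 and 41 MBytes per minute ceilings against which the measured results can be compared.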
Test 1: The effects of network bandwidth and traffic on the speed of backup.
Backup operation | Local server backup on EISA bus | Remote server backup over 10Base-T network | Remote server backup over 100VG-AnyLAN network
Backup of server SCSI hard disk. Data SET1, comp. ratio 1.3:1, no traffic | 40 Mbytes/min* | 31 Mbytes/min | 40 Mbytes/min
Backup of server SCSI hard disk. Data SET1, comp. ratio 1.3:1, 2 Mb/s traffic | 40 Mbytes/min* | 24 Mbytes/min | 40 Mbytes/min
Backup of server SCSI hard disk. Data SET1, comp. ratio 1.3:1, 64 Mb/s traffic | 40 Mbytes/min* | Nominal bandwidth exceeded | 40 Mbytes/min
* Writing to file database disabled for comparison purposes.
When backing up the local hard disk, data flows over the EISA bus to the tape drive. The backup speed achieved is close to the tape drive's theoretical maximum. This shows that the tape drive is working to its full potential.
Backing up over the network using 10Base-T technology reduces backup speed. This shows that the network is restricting the speed and that it has become a bottleneck. As traffic is added, the backup speed is reduced further.
When backing up over the network using 100VG technology, the backup speed is the same as for local backup. This shows that the network is no longer slowing the backup down and that the tape drive is again working to its full potential. The backup speed is maintained even with high volumes of traffic.
Conclusion from Test 1. The 100VG-AnyLAN network allows the remote backup to run at the maximum speed the tape drive can reach, even with large amounts of traffic present.
Test 2: The performance of distributed (local) backup compared to remote backup.
Backup operation | Local server backup on EISA bus | Remote server backup over 10Base-T network | Remote server backup over 100VG-AnyLAN network
Backup of server SCSI hard disk. Data SET2, comp. ratio 2:1, no traffic | 27 Mbytes/min | 31 Mbytes/min | 46 Mbytes/min
The backup software used in these tests, in common with virtually all other backup software applications, features a database which keeps a record of all files backed up. This is essential for simple file restores. During the backup process, the database will write records to disk.
The local backup is slower than both of the remote backups because the database disk writes and the backup disk reads occur on the same disk. This competition for disk controller activity results in a reduced backup speed.
Remote backups are faster because the write activity is restricted to the backup server's disk and the read activity on the file server's disks can continue uninterrupted.
The server transfer rate is faster over the 100VG network, showing that the 10Base-T network was previously a bottleneck.
In this test with a typical nested file structure, the remote backup over the 100VG is 70% faster than the local backup speed.
If a file server hosts a disk array, a higher local backup speed than seen here will be achieved.
Conclusion from Test 2. Remote backup of the server results in a faster backup speed because the database disk writes do not interfere with the backup process disk reads. With a typical file structure, a backup speed gain of 70% is achieved.
Test 3: Maximum throughput of one or more remote backup operations.
Backup operation | Local server backup on EISA bus | Remote backup of one server over 100VG-AnyLAN network | Concurrent remote backup of two servers over 100VG-AnyLAN network | Concurrent remote backup of three servers over 100VG-AnyLAN network
Backup of server SCSI hard disk. Data SET2, comp. ratio 2:1, no traffic | 27 Mbytes/min | 46 Mbytes/min | 90 Mbytes/min | 110 Mbytes/min
Backup of server SCSI hard disk. Data SET3, comp. ratio 2:1, no traffic | 60 Mbytes/min | 60 Mbytes/min | 120 Mbytes/min | 125 Mbytes/min
This test was designed to find the throughput of the complete system; the results should be read in conjunction with Test 4. Each server was backed up concurrently to its own dedicated tape drive. The backup software, for all practical purposes, has an unlimited throughput.
For large files where database activity is insignificant (SET3), two backup sessions can be run concurrently with no degradation in performance compared to two single distributed operations. With greater than two operations, the 486/66 server reaches its limits.
For smaller files (SET2), two backup sessions can be run concurrently with virtually no relative degradation and an overall gain of nearly 70% compared to two distributed operations.
For smaller files (SET2), three backup sessions can be run with little relative degradation and with an overall backup speed gain of 35% compared to three distributed backup operations. At this point the 486/66 server reaches its limits.
Conclusion from Test 3. The centralized backup design based on a 486/66 server is able to host two to three concurrent backup sessions, depending on the data, at speeds of between 110 and 125 Mbytes per minute. These are at worst equivalent to, or at best 30% to 70% greater than, the sum of individual distributed backup speeds.
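The percentage gains quoted for Tests 2 and 3 can be checked against the measured figures. The Python sketch below compares each centralized total with the sum of the equivalent local (distributed) backups; the function name is hypothetical, the input speeds are the ones reported in the Test 2 and Test 3 results, and only the cases discussed in the text are evaluated.

```python
def gain_over_distributed(centralized_total: float, local_rate: float,
                          servers: int) -> float:
    """Percentage gain of a centralized backup over `servers` local backups.

    The distributed baseline is the local backup rate multiplied by the
    number of servers, since each server would run its own tape drive.
    """
    distributed_total = local_rate * servers
    return (centralized_total / distributed_total - 1) * 100


# (description, centralized Mbytes/min, local Mbytes/min, number of servers)
cases = [
    ("SET2, one server (Test 2)", 46, 27, 1),
    ("SET2, two servers", 90, 27, 2),
    ("SET2, three servers", 110, 27, 3),
    ("SET3, two servers", 120, 60, 2),
]

for description, total, local, servers in cases:
    gain = gain_over_distributed(total, local, servers)
    print(f"{description}: {gain:.0f}% gain")
```

The computed values line up with the text: about 70% for one and two SET2 servers, about 35% for three, and no gain or loss for two SET3 servers.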
Test 4: Measure of NetWare server loading during backup operations.
Measurement | 486/66 file server being backed up | 486/66 server hosting tape drive, remote backup of one server | 486/66 server hosting tape drive, remote backup of two servers | 486/66 server hosting tape drive, remote backup of three servers
Monitor Percentage Processor Utilization | 35% | 60% | 85% | 100%
The Novell "Percentage Processor Utilization" values are a basic measure of server loading. All servers had a Novell speed rating of 1830.
The 486/66 file server being backed up recorded a utilization of 35%. This is a typical file server load and is not high enough to cause a disturbance to users.
The 486/66 server performing the backup recorded utilization values ranging from 60% to 100%. These are substantial loads. A server with these loadings should not host any other significant tasks.
During the backup of three servers the "Utilization histogram" showed the following distribution: LAN Card 25%; IPX Router 25%; FS Tape1 15%; FS Tape2 15%; FS Tape3 15%; SCSI Channel 5%; Total 100%.
In this case, the server processor utilization could be reduced and backup performance increased by using a more powerful processor such as a 486/100 or Pentium and by using PCI interface cards instead of EISA cards.
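Grouping the histogram entries shows where the processor time goes. The Python sketch below simply sums the reported percentages by category; the grouping into "network" and "tape" categories and the function name are assumptions made for illustration, not part of the NetWare monitor output.

```python
# Utilization histogram reported during the backup of three servers (percent).
histogram = {
    "LAN Card": 25,
    "IPX Router": 25,
    "FS Tape1": 15,
    "FS Tape2": 15,
    "FS Tape3": 15,
    "SCSI Channel": 5,
}


def share(entries: dict, names: list) -> int:
    """Sum the utilization percentages of the named histogram entries."""
    return sum(entries[name] for name in names)


network = share(histogram, ["LAN Card", "IPX Router"])
tape = share(histogram, ["FS Tape1", "FS Tape2", "FS Tape3"])

print(f"Network handling: {network}%")
print(f"Tape sessions: {tape}%")
print(f"Total: {sum(histogram.values())}%")
```

On this grouping, network handling accounts for half of the processor load, which is consistent with the suggestion that faster interface cards would help.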
Conclusion from Test 4. The file server being backed up from a remote source experiences an acceptable loading. The 486/66 server performing remote backup experiences significant loading and should therefore be dedicated to the backup task.
Practical Details
The following bullet list explains the practical details of our tests, and outlines some suggestions for network administrators moving to a centralized backup strategy.
A distributed backup strategy may be restricted in a NetWare 4 environment. Only one backup software package can be resident in any OU (Organizational Unit).
When the Netserver LC's internal AIC 7770 controller, which hosts the hard disk, also hosted the tape drive, backup speed was 10% faster than when the tape drive was hosted on a separate AHA1510 controller.
If you're sharing the backup with other applications on the server, consider a separate controller to give device independence.
When backing up during the day, consider the software options available for dealing with open files. Typically they should be retried once immediately and then at the end of the backup.
Backup across bridges or routers will slow the transfer rates depending on the exact configuration. However, the higher bandwidth of the network should mean that fewer or even none of these are present within one site.
Although high speed networking hardware helps improve network backup speed, the centralized backup concept can still be implemented on 10Base-T networks, but without the corresponding increase in backup speed.
As an intermediate step, you could link only the servers together with a 100VG-AnyLAN backbone.
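The open-file handling suggested in the list above (retry once immediately, then once more at the end of the backup) can be sketched as follows. This is an illustrative Python outline only, not the behavior of any particular backup package: the function name and the `try_backup` callback are hypothetical, and the callback is assumed to return False when a file is open and cannot be copied.

```python
from typing import Callable, Iterable, List


def backup_with_open_file_retry(files: Iterable[str],
                                try_backup: Callable[[str], bool]) -> List[str]:
    """Back up files, retrying open files immediately and again at the end.

    Returns the list of files that could not be backed up at all.
    """
    deferred = []
    for path in files:
        if try_backup(path):
            continue
        # First retry: immediately after the failed attempt.
        if try_backup(path):
            continue
        deferred.append(path)

    skipped = []
    # Second retry: once the rest of the backup has finished.
    for path in deferred:
        if not try_backup(path):
            skipped.append(path)
    return skipped
```

A caller would pass its own copy routine as `try_backup` and report the returned list to the administrator as files left unprotected by that run.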
* Originally published in Novell AppNotes
Disclaimer
The origin of this information may be internal or external to Novell. While Novell makes all reasonable efforts to verify this information, Novell does not make explicit or implied claims to its validity.