All too frequently, business continuance and
disaster recovery are equated with backups.
Unfortunately, most people are under the unrealistic impression that if they have
a backup of their system, they can recover their system. Too
often, people who think they have good backups don't.
Consider the following issues:
- Keep documentation up to date. In a production environment, time is of the essence: configuring a UNIX system from
scratch may take days, and it may be impossible to restore it to its original specifications.
One way to ensure that you have updated documentation is to use an automated
documentation system to keep track of the configuration.
- Don't rely on recovery tools alone. Use recovery tools such as Hewlett-Packard's Ignite, but don't count
on them to work. They are limited in what they will do, and
certain criteria must be met (IP address range, etc.). While they
are a very worthwhile asset, they are not a panacea for system recovery.
- The operating system must be INSTALLED. A UNIX-based system cannot simply be copied back onto a machine; it must be installed.
Backup programs must also be reinstalled before any data can be recovered.
Do you have a copy of the media originally used to install the system?
Do you have all the patches? How about all the licenses?
- The configuration of a UNIX system changes frequently: it
changes whenever the system
administrator makes any change to the system. Notes are rarely kept in a neat, orderly
fashion, especially if the system administrator is busy. Often, the UNIX operating
system itself is modified to accommodate databases, distribution applications and the
like. This information must be carried over to the new system.
- UNIX servers are not the same. In a real disaster, most companies will try to restore at a remote site,
but the equipment there won't be exactly the same. Systems differ
even within the same brand or model line: their architectures vary and
require different operating system components or patches to
function. If nothing else, the drive addressing won't be the same, so
merely copying old configuration data (assuming it is complete) won't do the trick.
- Know the relationship between disks and volumes. Most enterprise-level UNIX servers use a disk management system such as a logical volume manager
(LVM). In many cases the volume manager is native to the
operating system (as with AIX and HP-UX); on other systems,
third-party products are used. The volume manager can spread UNIX filesystems across multiple disks, and to manage
this it maintains a map of which physical disks hold each logical volume and
filesystem. Disks will have different
"names" on different systems, so merely copying system data won't provide the
necessary information to properly set up disks and rebuild filesystems.
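To make the disk-to-volume relationship concrete, here is a minimal Python sketch (standard library only) of recording each logical volume together with the physical disks behind it, then translating the old disk names to the names used on the recovery machine. The `LogicalVolume` record, the `remap_layout` and `disks_per_volume_group` helpers, and the HP-UX-style device names in the example are hypothetical illustrations, not part of any LVM product; the old-to-new disk map still has to be worked out by hand at the recovery site.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LogicalVolume:
    """One logical volume as documented on the ORIGINAL system (hypothetical record)."""
    name: str                        # e.g. "lvol1"
    volume_group: str                # e.g. "vg01"
    size_mb: int
    mirror_copies: int               # 0 = unmirrored
    mount_point: str
    fs_type: str
    physical_disks: tuple[str, ...]  # device names on the original system


def disks_per_volume_group(layout: list[LogicalVolume]) -> dict[str, list[str]]:
    """List the distinct physical disks each volume group needs, so you know what to request."""
    groups: dict[str, set[str]] = {}
    for lv in layout:
        groups.setdefault(lv.volume_group, set()).update(lv.physical_disks)
    return {vg: sorted(disks) for vg, disks in groups.items()}


def remap_layout(layout: list[LogicalVolume], disk_map: dict[str, str]) -> list[LogicalVolume]:
    """Return a copy of the layout with old disk names replaced by recovery-site names.

    Raises KeyError naming every recorded disk that has no counterpart in disk_map.
    """
    missing = sorted({d for lv in layout for d in lv.physical_disks if d not in disk_map})
    if missing:
        raise KeyError("no replacement disk for: " + ", ".join(missing))
    return [replace(lv, physical_disks=tuple(disk_map[d] for d in lv.physical_disks))
            for lv in layout]


if __name__ == "__main__":
    # Hypothetical example: a mirrored /data volume spread over two disks.
    layout = [LogicalVolume("lvol1", "vg01", 2048, 1, "/data", "vxfs",
                            ("/dev/dsk/c0t6d0", "/dev/dsk/c1t6d0"))]
    print(disks_per_volume_group(layout))
    print(remap_layout(layout, {"/dev/dsk/c0t6d0": "/dev/dsk/c2t1d0",
                                "/dev/dsk/c1t6d0": "/dev/dsk/c3t1d0"}))
```

Failing loudly on an unmapped disk is a deliberate choice: a silently skipped disk is exactly the kind of detail that derails a rebuild.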
If you are responsible for the disaster recovery plan for your UNIX system, make
sure that you always have current information on:
- Logical volumes, volume groups and filesystems, including type, size, mirrors, etc.
- Kernel parameters and system input-output configurations.
- Configuration files (there are a lot of them) such as the services file.
- Startup and shutdown information - including scripts, order, etc.
- cron jobs.
- Software licenses and licensing information.
- The other systems, printers, etc. that the server communicates with. Identify them
beforehand and make re-establishing those communications part of
your plan. You will probably be surprised at the number and variety of
systems that the server communicates with.
- Testing procedures for every application running on your system.
- Procedures for storing and recalling tapes, recalling personnel, etc. Make sure these procedures
work: just because a procedure is written down doesn't mean it works.
- The real cost of having your system down for one day. This
cost should include lost sales, the cost of idle programmers and support personnel, etc. This
figure will help keep the cost of disaster recovery preparation and maintenance in
perspective.
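One way to keep configuration information current, in the spirit of an automated documentation system, is to take periodic snapshots of the key files and compare them. The Python sketch below (standard library only) records the size, modification time, and SHA-256 digest of each file and reports which files changed between two snapshots. The `CONFIG_FILES` list and the `snapshot` and `changed_files` helpers are hypothetical; the right set of files varies by UNIX flavor and by the applications installed. A digest only tells you that a file changed, so you would still keep copies of the files themselves somewhere off the server.

```python
import hashlib
import json
import sys
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical starting list; extend it for your platform and applications.
CONFIG_FILES = ["/etc/services", "/etc/hosts", "/etc/inittab", "/etc/passwd", "/etc/group"]


def snapshot(paths):
    """Map each path to its size, mtime (UTC, ISO 8601) and SHA-256, or None if not a regular file."""
    entries = {}
    for path in map(Path, paths):
        if not path.is_file():
            entries[str(path)] = None  # a missing file is itself worth recording
            continue
        data = path.read_bytes()
        entries[str(path)] = {
            "size": len(data),
            "mtime": datetime.fromtimestamp(path.stat().st_mtime, tz=timezone.utc).isoformat(),
            "sha256": hashlib.sha256(data).hexdigest(),
        }
    return entries


def changed_files(old, new):
    """Return sorted paths whose digest differs, or that appeared or disappeared, between snapshots."""
    def digest(snap, path):
        entry = snap.get(path)
        return entry["sha256"] if entry else None
    return sorted(p for p in set(old) | set(new) if digest(old, p) != digest(new, p))


if __name__ == "__main__":
    # Print a JSON snapshot; redirect it to a file and store it away from the server.
    json.dump(snapshot(CONFIG_FILES), sys.stdout, indent=2, sort_keys=True)
    print()
```

Run nightly from cron and compared with the previous snapshot, a script like this flags configuration changes that never made it into the administrator's notes.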
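The one-day downtime figure is simple arithmetic, but writing it down makes the assumptions visible. Here is a small Python sketch: the cost categories follow the list above (lost sales, idle programmers and support personnel), while the `other_per_day` bucket, the `outage_days_equivalent` comparison helper, and every number in the example are hypothetical placeholders for your own figures.

```python
from dataclasses import dataclass


@dataclass
class DowntimeEstimate:
    """Inputs for one day of outage; all figures are your own estimates."""
    lost_sales_per_day: float   # revenue that cannot be taken while the system is down
    idle_staff: int             # programmers, support personnel, etc. left without work
    cost_per_staff_day: float   # loaded cost (salary plus overhead) of one person-day
    other_per_day: float = 0.0  # hypothetical catch-all: penalties, catch-up overtime, etc.

    def one_day_cost(self) -> float:
        return self.lost_sales_per_day + self.idle_staff * self.cost_per_staff_day + self.other_per_day


def outage_days_equivalent(annual_dr_cost: float, daily_cost: float) -> float:
    """How many days of outage the yearly disaster recovery spending is equivalent to."""
    return annual_dr_cost / daily_cost


if __name__ == "__main__":
    # Hypothetical figures for illustration only.
    estimate = DowntimeEstimate(lost_sales_per_day=50_000, idle_staff=20,
                                cost_per_staff_day=400, other_per_day=2_000)
    daily = estimate.one_day_cost()
    print(f"one day down: {daily:,.0f}")  # 60,000
    print(f"annual DR spend of 120,000 = {outage_days_equivalent(120_000, daily):.1f} days down")  # 2.0
```

With these placeholder numbers, a day of downtime costs 60,000, so spending 120,000 a year on preparation costs the same as two days of outage.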
One final point: The devil
is in the details!
Trident Systems has recovered HP 9000
production systems several times, including test sites at Hewlett-Packard's Performance
Center in California and Sungard in Philadelphia, PA. Many companies claim they have some
mystical shortcut to disaster recovery. Usually these shortcuts come with numerous caveats
or unrealistic assumptions. These assumptions are shortcuts to disaster.
Don't wait until disaster strikes! Be prepared!