System Administration

If you are a manager who is responsible for UNIX servers, you'll want to read and heed these 'Top 10 Rules' of Successful System Administration:

  1. Nobody gets the root password.  That's right - nobody; not the DBA, not the developers, not even your Aunt Martha.  With the root password, anybody can do anything on your server, and traceability is, at best, minimal.  Of course, when something goes wrong, nobody is responsible - except you!

  2. Know your System Administrator.  Do a background check, including a credit check and a criminal background check.  Whoever you hire as a system administrator has to be trustworthy.  Would you authorize a known criminal or someone with a bad credit rating to withdraw funds from your bank accounts at their discretion?  The reality of the situation is that few managers have any idea what their system administrators are doing and why.  The system administrator has a great deal of discretion, has access to valuable data, and can assume the identity of anyone and do almost anything.

  3. Document your systems.  If your documentation is NOT in text format then your systems are NOT documented.  If the system administrator can't get to it in an emergency then it is no good.  The documentation that you put together last week (in terms of system information) is now out of date.  Insist that your system administrator maintains 'how-to' notes on the system in text format.  The 'how-to' notes should include the names and telephone numbers of contacts, how software was installed and why, etc.

  4. Insist on written plans.  Written plans should include step-by-step commands.  The preparation of the plan forces the system administrator to think about what is going to happen, the risks involved, etc.   Deviations from the plan should be expected.  Sometimes errors are discovered during a review.  By thinking ahead, the risk of something BAD happening is minimized.  Another benefit of the plan is that it provides documentation for future maintenance activity.

  5. Don't rely on tools.  If you believe the salesmen, the chimp from "Bedtime for Bonzo" could run your entire data center with just a click of a mouse.  There is no substitute for an experienced system administrator checking your systems.  Many of the tools are awkward, intrusive and expensive to maintain.  True, some tools are very helpful, but they are not a substitute for good systems administration.

  6. Implement a maintenance plan.  The plan needs to include scheduled reboots of the system, disk reorganization, and the collection, review and installation of system patches.  The maintenance plan needs the full support of upper management.  A small amount of scheduled downtime is a lot better than a system failure in the middle of a critical event such as closing accounting records for the end of the year.

  7. Develop a disaster recovery plan.  This means being able to rebuild your system at a remote site, not merely being able to restore data from tapes.  A UNIX operating system is dynamic and changes all the time.  See the section on "What you should know...Business Continuance".

  8. Back the system up.  Some companies think that some systems don't have to be backed up because they are a 'development system' or a 'test system', etc.  Sometimes it seems unimportant - until a critical file of source code is erased - then it becomes important.  Backups are cheap insurance and are easy to implement.  Back up EVERYTHING and put it on a retention cycle.  Do scheduled audits of your backups.  Review the logs, double-check the configuration, etc.  It's easy to forget a detail when you are busy doing 5 things at the same time!

  9. Assign a lead system administrator to EVERY server.   When everyone is responsible for a system then NO ONE is responsible for the system.  The lead administrator should check the system daily, should know what is running on the system, etc.  The lead system administrator coordinates all work done on that server.

  10. Calculate the cost of each server being down for 1 day.  Put a number on it!  This exercise puts into perspective how important good system administration is.  It gives a non-emotional, non-anecdotal basis for making decisions about extra tape drives, disks, training, etc.  For example, what is the cost associated with having a team of 10 developers (working on a high visibility project) sitting idle for a day or two?  Will the project leader or development manager sign off on a written document stating that 2 or 3 days of downtime is OK?  How much uptime do they want, and are they willing to pay for it?  Have the business line or manager sign off on how long a system can be down.  If it is important for them to have it up and running, then they will back up their concern with money.
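
As a rough illustration of Rule 10, the idle-developer example can be turned into simple arithmetic.  The sketch below is only one way to put a number on it.  The function name, the 8-hour day and the hourly rate of 75 are assumptions made up for the example, not figures from any real project, and it counts idle labor only (no lost revenue, penalties or recovery work).

```python
def idle_labor_cost(people: int, loaded_hourly_rate: float,
                    days: float, hours_per_day: float = 8.0) -> float:
    """Labor cost of staff sitting idle while a server is down.

    All inputs are estimates supplied by the business line; the
    8-hour working day is an assumed default.
    """
    if min(people, loaded_hourly_rate, days, hours_per_day) < 0:
        raise ValueError("inputs must be non-negative")
    return people * loaded_hourly_rate * hours_per_day * days


# Rule 10's example: a team of 10 developers idle for a day or two.
# The hourly rate of 75.0 is a placeholder, not a real figure.
one_day = idle_labor_cost(people=10, loaded_hourly_rate=75.0, days=1)
two_days = idle_labor_cost(people=10, loaded_hourly_rate=75.0, days=2)

print(f"1 day of downtime:  {one_day:,.0f}")   # 1 day of downtime:  6,000
print(f"2 days of downtime: {two_days:,.0f}")  # 2 days of downtime: 12,000
```

Substitute your own head count and rates, then compare the result with the price of the extra tape drive, disk or training under discussion.  That comparison is the number the business line should be asked to sign off on.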