
This was a case of needing a better schedule. I arrived at this mantra:

If it has to be done every day, do it early in the day.

When I did my morning planning using The Cycle, I listed "change tapes" as an A priority every day.

As a result, there was one less thing weighing heavily on my mind all day, and I could be more focused and less stressed. I arrived home happier and not as late. I started the day feeling like I had accomplished something right off the bat, and I had!

Routine #7: During Outages, Communicate to Management

Once upon a time there was a network outage. To make matters worse, there was miscommunication from the system administrators to management and the customers. Management felt they should have been told earlier about the problem. The system administrators felt they should be left alone to solve the problem. I'm sure this kind of thing has never happened to you...not.

After this event, we decided to develop a routine for the future. After all, this wouldn't be the last outage.

The routine was simple: if an outage lasted more than an hour, a designated manager (the boss of the chief system administrator) would be notified, even if it was late at night. The system administrators would then update this person every half hour until the problem was resolved. The manager would notify upper management and customers (if the outage didn't prevent communication with the customers) so the SAs could focus on solving the problem.

It was a simple routine and it worked well. Too bad we didn't have it in place before the first calamity.

If your company is particularly visible (hello Amazon, Google, and Yahoo!), such a routine should involve the Public Relations department. It's important to work this routine out before your first major outage, no matter how difficult it is to discuss. Some outages are so big that news reporters will want to know what's going on, and you can imagine how messy things can get. This was more common long ago, when anything with the words "Internet" or "computer security" in it was spiffy enough to draw the news media. (Now the media has become jaded, and "Microsoft security hole affects millions of businesses" is unfortunately no longer considered news.)

Nonetheless, if your business is high profile, it is important to work out a media strategy with the PR department ahead of time. Know whom to send reporters to if they start calling. If you don't have such a plan in place, the best answer you can give is "No comment." Then hang up the phone before you are tempted to say anything else. It's very tempting to talk to a reporter, but many system administrators have learned that the best thing to do during an outage is to work on the technical issues and let PR deal with the media.

Routine #8: Use Automatic Checks While Performing Certain Tasks

I've developed the following habit so that I don't lock my keys in the car: when I'm about to close my door, I hold the door with my right hand and squeeze my left hand to make sure I feel my keys in it. Only if I'm holding my keys do I then close the door. I have a similar ritual when leaving my house.

Not that I've locked myself out a lot, but the few times it happened always seemed to be at the worst possible times and took several hours to remedy.

How does this relate to system administration? There are many automatic checks we can introduce into our work:

When I leave a secured room, I make sure I feel my access card-key in my pocket. (Related rule: I never put my card-key down on a table, floor, whatever, even just for a second. It always goes in my pocket and my pocket is where it goes.)

When I'm near equipment, I always pause to check for air flow. In particular, I make sure fans are not blocked by cables or other devices.

Any time a new hire joins the company, I always stop by to introduce myself, welcome her, fix any immediate problems she has, and explain how to get computer help in the future. If I can fix her immediate problems, it can help her get started sooner, and the sooner I can train her to create tickets (rather than call me directly), the better I can manage my time.

When I see a person I don't recognize, I always smile, stop, introduce myself, and ask for the person's name. I then ask whether I can read it off his ID badge, telling him it will help me remember it because "I'm a visual learner." New people think I'm being friendly. I'm really checking for trespassers.

Before I disconnect a network cable, I set up a continuous "ping" (one per second) to a host on the far side of that cable. The ping should start failing when I disconnect the correct cable.

Every time I add a new rule to my firewall, I first set up a demonstration of what I want to block and show that it isn't blocked. Then I add the firewall rule. Then I repeat the demonstration and show that it now fails. (If I don't do the demo before I add the rule, I can't be sure the rule works for the reason I think it does.)
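The before-and-after firewall habit can be captured in a small shell function so the demonstration is never skipped. This is a minimal sketch under stated assumptions: the name demo_rule is hypothetical, the probe and the rule are passed in as command strings, and the probe is assumed to exit 0 when the traffic gets through and non-zero when it is blocked.

```shell
# demo_rule PROBE RULE  (hypothetical helper, not a standard tool)
# Shows that PROBE succeeds before RULE is added and fails afterward,
# so we know the rule blocks the traffic for the reason we think it does.
# Both arguments are command strings run with sh -c.
# Exit status: 0 = blocked as expected, 1 = rule did not block,
#              2 = probe already failed beforehand, 3 = adding the rule failed.
demo_rule() {
    probe=$1
    rule=$2

    # Step 1: demonstrate that the traffic is NOT blocked yet.
    if ! sh -c "$probe" >/dev/null 2>&1; then
        echo "demo_rule: probe fails even before the rule; it proves nothing" >&2
        return 2
    fi

    # Step 2: add the firewall rule.
    if ! sh -c "$rule"; then
        echo "demo_rule: adding the rule failed" >&2
        return 3
    fi

    # Step 3: repeat the demonstration; now it should fail.
    if sh -c "$probe" >/dev/null 2>&1; then
        echo "demo_rule: probe still succeeds; the rule does not block it" >&2
        return 1
    fi

    echo "demo_rule: probe blocked as expected"
    return 0
}
```

As a usage sketch, the probe might be a TCP connection attempt with nc and the rule an iptables command run on the firewall. The host name fw and the address 192.0.2.10 are placeholders, and the probe must run from a machine whose traffic actually passes through that firewall:

    $ demo_rule 'nc -z -w 3 192.0.2.10 23' 'ssh fw iptables -A FORWARD -p tcp -d 192.0.2.10 --dport 23 -j REJECT'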

A More Useful Ping

It can be useful to have ping produce a beep for every successful ping. That way you can be elsewhere in the room disconnecting cables and not have to keep running back to your screen to see whether the pings are working.

Linux ping has an -a (audible) switch, which produces a beep for each reply.

Solaris and other Unix systems without the -a option can use the following trick. The output of "ping" happens to include a colon only on lines that report success. You simply pass the output through the tr command to translate each colon into a Ctrl-G (the "bell" character):

$ ping -s 64.32.179.56 | tr : ^G

(Solaris requires the -s option to make it a continuous ping. Others do not.)

To get a Ctrl-G to appear on the command line, you may have to precede it with a Ctrl-V. That is, you type:

$ ping -s 64.32.179.56 | tr : CTRL-V CTRL-G
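If your tr follows POSIX, you can avoid typing a literal Ctrl-G altogether: POSIX tr accepts octal escapes, and the bell character is octal 007. Below is a minimal sketch that wraps the trick in a function. The name beep_on_success is hypothetical, and the sketch relies on the same assumption as above: colons appear in ping's output on lines reporting a reply.

```shell
# beep_on_success (hypothetical name): reads ping output on stdin and
# turns every colon into the ASCII bell character. '\007' is a POSIX tr
# octal escape, so no Ctrl-V/Ctrl-G typing is needed.
beep_on_success() {
    tr : '\007'
}

# Solaris needs -s for a continuous ping, as noted above:
#   ping -s 64.32.179.56 | beep_on_success
# Linux ping can beep on its own:
#   ping -a 64.32.179.56
```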

Routine #9: Always Back Up a File Before You Edit

When I'm about to edit a configuration file, I always make a backup. I don't waste time thinking, "Gosh, is this file important enough?" If I have to ask, the answer is "Yes." I make backups the same way every time so there is no time wasted figuring out the best way. My system is to copy the file to a new file whose name ends with today's date. For example, named.conf is copied to named.conf-20060120 (January 20, 2006). I used to use the file's "last modified" date, but I found that it was much better to use today's date, which leaves a trail of when I made changes. In Unix, I can check the file into an RCS repository, which gives me a complete history of the file's changes (more on that in Chapter 13).
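The dated-backup habit is easy to turn into a shell function, so making the backup takes no thought at all. This is a minimal sketch: the name bak is hypothetical, the FILE-YYYYMMDD naming follows the named.conf example above, and refusing to overwrite an earlier backup from the same day is a design choice made here, not part of the original routine.

```shell
# bak FILE...  (hypothetical helper)
# Copies each FILE to FILE-YYYYMMDD using today's date, e.g.
# named.conf -> named.conf-20060120.
bak() {
    for f in "$@"; do
        if [ ! -f "$f" ]; then
            echo "bak: $f: no such file" >&2
            return 1
        fi
        backup="$f-$(date +%Y%m%d)"
        if [ -e "$backup" ]; then
            # Keep the first backup made today: it holds the state from
            # before today's edits, which is usually what we want back.
            echo "bak: $backup already exists; leaving it alone" >&2
            continue
        fi
        # -p keeps the original's permission bits and timestamps.
        cp -p "$f" "$backup" || return 1
        echo "backed up $f to $backup"
    done
    return 0
}
```

Typical use is to chain it in front of the editor, so the edit only starts if the backup succeeded:

    $ bak named.conf && vi named.conf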

It's tempting to convince yourself, "I'm making a small change that I'll be able to undo manually" or "I'm an expert; I can't mess this up." Hindsight says a backup is better, especially three weeks from now when you can't figure out why that service has stopped working.