In the early 1980s, George C. was IT support on a team overseeing a large installation of workstations. At the time, this was a pretty novel concept. Several Unix site managers applied to help out but wanted "too much money," according to management. Instead, the IT manager rounded up a bunch of recent college graduates (who were much cheaper). Problem solved.

There were roughly 80 workstations being installed, each with two 70MB drives. One drive held the operating system files (which the users couldn't modify); the other was the user drive for work files. Each system was backed up and updated nightly with a three-step process:

  1. Back up all files that have changed on each client's user drive.
  2. Replace old files on each client's system drive.
  3. Delete files that are no longer needed from each client's system drive. This step simply removed any file on the client's system drive that didn't exist on the server, so everyone had a consistent system drive.
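For the curious, here is a minimal sketch of what those three steps amount to. It is not the team's actual tooling (the story doesn't say what they ran); the function name `nightly_job`, its parameters, and the dict-based "drives" are all hypothetical stand-ins chosen to keep the example self-contained.

```python
def nightly_job(user_drive, system_drive, server_image, backup, last_run):
    """Run the three nightly steps against one client (all names hypothetical).

    Each drive is a dict mapping a path to a (modified_time, contents) tuple.
    """
    # Step 1: back up user files that changed since the previous run.
    for path, (mtime, data) in user_drive.items():
        if mtime > last_run:
            backup[path] = (mtime, data)

    # Step 2: replace old system files with the server's copies.
    for path, entry in server_image.items():
        system_drive[path] = entry

    # Step 3: delete anything on the system drive that the server doesn't have.
    for path in list(system_drive):
        if path not in server_image:
            del system_drive[path]


# Made-up sample data for illustration only.
user_drive = {"plan.txt": (20, "draft 2"), "old.txt": (5, "unchanged")}
system_drive = {"/bin/sh": (1, "v1"), "/tmp/junk": (3, "stray")}
server_image = {"/bin/sh": (10, "v2")}
backup = {}

nightly_job(user_drive, system_drive, server_image, backup, last_run=10)
print(backup)
print(system_drive)
```

Note that step 3 treats the server as the single source of truth: whatever the server lacks gets removed from the client.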

The tech writers on the staff were working overtime on the 5-year funding plan, a huge document that spanned several hundred pages. The file was growing in size quickly, and the machine the document was stored on was running out of space. The ops team didn't want to see any delays in getting the document submitted, so they bought a gigantic 150MB hard drive, installed it on the key machine, and proceeded to load the file from backup.

Except the file wasn't in the backup. In fact, none of the user files had been backed up. Ever.

The user drive, in order to be seamlessly integrated into what appeared to be a uniform file system, was a "softlink" from the root partition. And I'll let you guess whether the backup system was configured to follow softlinks. (Hint: no.)

As a result, the backup was only running against the system partition, and all of the system partitions in the organization were the same. User files were untouched by the backup.
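The same trap is easy to reproduce today. The sketch below is only an illustration, not the backup software from the story: it uses Python's `os.walk`, which by default does not descend into symbolic links that point at directories. The helper names `list_files` and `build_demo_tree` and the directory layout are assumptions made up for the demo.

```python
import os
import tempfile


def list_files(root, follow_links):
    """Return every file a tree walk from `root` would see, relative to `root`."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_links):
        for name in filenames:
            found.append(os.path.relpath(os.path.join(dirpath, name), root))
    return sorted(found)


def build_demo_tree(base):
    """Create a root tree whose `users` entry is a softlink to a separate drive."""
    root = os.path.join(base, "root")
    user_drive = os.path.join(base, "user_drive")
    os.makedirs(os.path.join(root, "bin"))
    os.makedirs(user_drive)
    with open(os.path.join(root, "bin", "tool"), "w") as f:
        f.write("system file")
    with open(os.path.join(user_drive, "plan.txt"), "w") as f:
        f.write("months of work")
    # The user drive only appears under root through this softlink.
    os.symlink(user_drive, os.path.join(root, "users"), target_is_directory=True)
    return root


with tempfile.TemporaryDirectory() as base:
    root = build_demo_tree(base)
    print(list_files(root, follow_links=False))  # the user file is missing
    print(list_files(root, follow_links=True))   # the user file shows up
```

With `follow_links=False`, the walk reports only the system file; the user's document never makes the list, and nothing complains about it.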

But wait, you're probably thinking, they still had the 70MB drive that the file was on, right? I'll let you guess again. (Hint: no.) No one knew where the drive was. Including, strangely, the guy who pulled the drive in the first place. They had just lost their only computer-readable copy of the document they'd been working on for months.

Thank god they'd printed out a copy a week before and had some early OCR technology at their disposal. They started scanning the document like crazy, and by "like crazy," I mean pretty slowly. The OCR technology wasn't terribly accurate or fast. And they only had a few days before they had to deliver the report. They needed another solution.

Desperate times called for desperate measures. Every employee in the building was ordered to put whatever they were working on on hold. Instead of their normal work, they'd each be given a stack of pages from the last printout of the budget plan to retype. The documentation team was tasked with collecting everyone's files, reassembling, styling, and formatting the document.

Meanwhile, the backup administrators took action to correct their embarrassing and costly mistake. They updated the configuration of the backup plan to follow softlinks.

The staff was stressed out and overextended; most people were working late and doing tasks they hated. One of the programmers was nodding off at his desk around 2:00 AM one night, and when he went to open a file he'd been working on, it was gone. In fact, all his files were gone. His user drive was empty. Remembering that the backups were kicked off at 2:00, he sprinted to the backup server to kill the process.

Remember the final step of the backup plan? It synchronized each client's system drive with the server's so everyone had a consistent image. Now that the backup followed softlinks, it extended this to user drives. Since the server's version of the user drive didn't have any files, it deleted all of the files on the client's user drive. I guess the good news is that the synchronization was working.
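To see why an empty server copy is so dangerous, consider a hedged sketch of the "delete what the server doesn't have" rule. The function `stale_paths` and the file names are hypothetical; the real system's logic isn't documented in the story, but any mirror-style cleanup boils down to a set difference like this one.

```python
def stale_paths(client_paths, server_paths):
    """Paths a mirror-style cleanup would delete from the client (hypothetical helper)."""
    return sorted(set(client_paths) - set(server_paths))


# System drive: the server has a matching image, so only strays get removed.
system_doomed = stale_paths(["/bin/sh", "/tmp/junk"], ["/bin/sh"])

# User drive: the server has never stored a single user file,
# so every file on the client counts as "no longer needed".
user_doomed = stale_paths(["notes.txt", "report.txt"], [])

print(system_doomed)
print(user_doomed)
```

The rule itself never changed; it was just pointed at a drive where the server's side of the comparison was empty.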

The programmer killed the backup process in time to save half of the machines, but the other half had their user data wiped clean.

The irony of this issue is that George was on vacation and came back better off than any of the employees who'd been working overtime for the last week. The other employees couldn't do their regular work because they were tasked with retyping pages of the budget plan, only to have them lost by the backup system's destructive rampage. George was on a DOS system that he backed up on his own, so all he lost in the shuffle were a few emails.

In the end, the IT manager was able to keep his job and his team managed to finally get the backup system running correctly, or at least to the point that it wasn't deleting everything.
