Handbook of Information Security Management: Risk Management and Business Continuity Planning

Network Recovery Strategies

The information technology specialist’s prime objective with respect to systems contingency planning is system survivability. In other words, provisions must be in place that will support the company’s priority processing needs, albeit in a limited capacity, through the first few hours immediately following a disaster.

Fault Tolerance vs. Redundancy

To a degree, information technology specialists are striving for what is called fault tolerance of the company’s critical systems. Fault tolerance means that no single point of failure will stop the system. Fault tolerance is often built in as part of the operational component design of a system. Redundancy, or duplication of key components, is the basis of fault tolerance. When fault tolerance cannot be built in, a quick replacement or repair program should be devised. Moving to an alternate site (e.g., a hot site) is one quick replacement strategy.
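
The idea that redundancy removes a single point of failure can be illustrated with a small failover routine. This is a minimal sketch, assuming each redundant component can be modeled as a callable that either returns a result or raises an exception; the names `call_with_failover`, `primary_server`, and `secondary_server` are hypothetical and stand in for real servers, links, or services.

```python
from typing import Callable, List, Sequence


class AllComponentsFailed(Exception):
    """Raised when every redundant component has failed."""


def call_with_failover(components: Sequence[Callable[[], str]]) -> str:
    """Try each redundant component in order and return the first success.

    With two or more working components, no single failure stops the service.
    """
    errors: List[Exception] = []
    for component in components:
        try:
            return component()
        except Exception as exc:  # a real system would catch narrower errors
            errors.append(exc)
    raise AllComponentsFailed(f"{len(errors)} component(s) failed: {errors}")


# Hypothetical components: the primary is down, the duplicate takes over.
def primary_server() -> str:
    raise ConnectionError("primary server down")


def secondary_server() -> str:
    return "served by secondary"


print(call_with_failover([primary_server, secondary_server]))
```

If only one component is listed, its failure raises `AllComponentsFailed`, which is exactly the single point of failure that the quick replacement or repair program has to cover.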

Alternate Sites and System Sizing

Once the recovery planner fully understands the company’s priorities, the planner can size the amount of system capacity required to support those priorities in the first few hours, days, and weeks following a disaster. When planning for a recovery site or establishing a contract with a hot-site service provider, the information technology specialist must size the immediate recovery capacity. This is extremely important, because most hot-site service providers will not allow a company to modify its requirements once it has declared a disaster.
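
Sizing can be treated as simple arithmetic: list each priority application with how soon it must be running and how much capacity it needs, then total the capacity required by the end of each recovery phase. The sketch below assumes capacity is counted in whole servers; the application names, deadlines, and server counts are invented for illustration only.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Application:
    name: str
    needed_within_hours: int  # how soon after the disaster it must be running
    servers: int              # capacity it needs, in hypothetical server units


def capacity_by_phase(apps: List[Application], phases_hours: List[int]) -> Dict[int, int]:
    """Return the cumulative capacity required by the end of each phase."""
    return {
        phase: sum(app.servers for app in apps if app.needed_within_hours <= phase)
        for phase in phases_hours
    }


# Hypothetical priorities; phases cover the first hours, days, and two weeks.
apps = [
    Application("order entry", 4, 2),
    Application("email", 24, 1),
    Application("payroll", 72, 1),
    Application("reporting", 336, 2),
]
print(capacity_by_phase(apps, [8, 72, 336]))  # {8: 2, 72: 4, 336: 6}
```

Because most providers will not let the company change its requirements after a disaster is declared, figures like these would be worked out in advance and written into the hot-site contract.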

The good news with respect to distributed systems is that hot-site service providers offer options for recovery. These options often include offering the use of their recovery center, bringing self-contained vans to the company’s facility (equipped with the company’s own required server configuration), or shipping replacement equipment for anything that has been lost.

Adequate Backups with Secure Off-Site Storage

Backup and off-site storage must be based on established company policies that identify vital information and detail how its integrity will be managed. The company’s work flow and the volatility of its information base dictate the frequency of backups. At a minimum, servers should be backed up daily, and key files on individual workstations weekly or monthly.
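
A backup policy like this can be expressed as a table of maximum intervals and checked automatically. This is a minimal sketch, assuming only two system types and using the daily minimum for servers and the weekly end of the range for workstation key files; `MAX_BACKUP_INTERVAL` and `backup_due` are hypothetical names, and a company with more volatile data would shorten the intervals.

```python
from datetime import datetime, timedelta

# Hypothetical policy table: the longest a system may go without a backup.
MAX_BACKUP_INTERVAL = {
    "server": timedelta(days=1),
    "workstation": timedelta(weeks=1),  # key files only
}


def backup_due(system_type: str, last_backup: datetime, now: datetime) -> bool:
    """Return True if the last backup is at least as old as the policy allows."""
    return now - last_backup >= MAX_BACKUP_INTERVAL[system_type]


now = datetime(2024, 3, 8, 18, 0)
print(backup_due("server", datetime(2024, 3, 7, 17, 0), now))      # True (25 hours old)
print(backup_due("workstation", datetime(2024, 3, 4, 9, 0), now))  # False (about 4 days old)
```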

Planners must decide when and how often to take backups off-site. Depending on a company’s budget, off-site could be the building next door, a bank safety deposit box, the network administrator’s house, the branch office across town, or a secure media vault at a storage facility maintained by an off-site media storage company. Once the company meets the objective of separating the backup copy of vital data from its source, it must address the accessibility of the off-site copy.
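
Whether the objective of separating backups from their source has been met can be checked against a simple record of where each copy is kept. The sketch below assumes the company tracks, for each vital dataset, the site holding the live data and the location of every backup copy; the dataset and site names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class BackupCopy:
    dataset: str
    location: str


def datasets_without_offsite_copy(source_sites: Dict[str, str],
                                  copies: List[BackupCopy]) -> List[str]:
    """Return vital datasets whose backups all sit at the same site as the source."""
    offsite = {
        c.dataset for c in copies
        if c.location != source_sites.get(c.dataset)
    }
    return sorted(set(source_sites) - offsite)


# Hypothetical records: where each dataset lives, and where its copies are kept.
source_sites = {"general ledger": "HQ", "customer db": "HQ", "payroll": "Branch"}
copies = [
    BackupCopy("general ledger", "media vault"),
    BackupCopy("customer db", "HQ"),
    BackupCopy("payroll", "HQ"),
]
print(datasets_without_offsite_copy(source_sites, copies))  # ['customer db']
```

Note that the payroll copy counts as off-site even though it is at headquarters, because the live payroll data is at the branch office; what matters is separation from the source.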

The security of the company’s information is of vital concern. The planner must know where the information will be kept and what exposure risks it faces in transit. Some off-site storage companies intentionally use unmarked, nondescript vehicles to transport a company’s backup tapes to and from storage. These companies know that this information is valuable and that its transport and storage place should not be advertised.

Adequate LAN Administration

Keeping track of everything the company owns (its hardware, software, and information bases) is fundamental to a company’s recovery effort. The best aid in this area is a solid audit application that is run periodically on all workstations. This procedure assists the information technology specialist in maintaining an accurate inventory across the enterprise and provides a tool for monitoring software acquisitions and hardware configuration modifications. The inventory is extremely valuable when filing insurance loss claims. It also provides the technology specialist with accurate records for license compliance and application revision maintenance.
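
Monitoring acquisitions and configuration changes amounts to comparing successive audit snapshots. This sketch assumes the audit application can export, for each workstation, a mapping from item (software title or hardware component) to version or configuration; the export format and the item names are hypothetical, not those of any particular audit product.

```python
from typing import Dict


def diff_inventory(baseline: Dict[str, str], current: Dict[str, str]) -> Dict[str, dict]:
    """Compare two audit snapshots of one workstation.

    Returns items that were added, removed, or changed between the two runs.
    """
    added = {k: current[k] for k in current.keys() - baseline.keys()}
    removed = {k: baseline[k] for k in baseline.keys() - current.keys()}
    changed = {
        k: (baseline[k], current[k])
        for k in baseline.keys() & current.keys()
        if baseline[k] != current[k]
    }
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical snapshots from two periodic audit runs.
baseline = {"Office suite": "2019", "RAM": "8 GB", "Antivirus": "5.1"}
current = {"Office suite": "2019", "RAM": "16 GB", "Drawing tool": "3.0"}
report = diff_inventory(baseline, current)
print(report["added"])    # {'Drawing tool': '3.0'}
print(report["removed"])  # {'Antivirus': '5.1'}
print(report["changed"])  # {'RAM': ('8 GB', '16 GB')}
```

The "added" entries feed license compliance checks, while the accumulated current snapshots serve as the asset list for insurance purposes.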

Personnel

Systems personnel are too often overlooked in systems recovery planning. Are there adequate systems personnel to handle the complexities of response and recovery? What if a key individual is affected by the same catastrophic event that destroys the systems? Relying on any one individual creates a single point of failure.
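
The same single-point-of-failure test applied to hardware can be applied to staff: for each critical system, count how many people are qualified to recover it. This is a minimal sketch, assuming the planner maintains a hypothetical mapping from systems to qualified staff; the system and staff names are invented for illustration.

```python
from typing import Dict, List


def single_points_of_failure(qualified_staff: Dict[str, List[str]]) -> List[str]:
    """Return systems that fewer than two distinct people can recover."""
    return sorted(
        system for system, staff in qualified_staff.items() if len(set(staff)) < 2
    )


# Hypothetical skills matrix maintained by the recovery planner.
qualified_staff = {
    "file servers": ["Ana", "Raj"],
    "email": ["Raj"],
    "backup software": [],
}
print(single_points_of_failure(qualified_staff))  # ['backup software', 'email']
```

Systems flagged this way are natural candidates for cross-training or for the kind of outside support arrangement described next.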

An option available to the planner is to propose an emergency outsourcing contract. Hiring a qualified systems engineer to assist on a key project that never seems to get completed (e.g., the network system documentation) may be a cost-effective security measure. Once that project is completed to the company’s satisfaction, the company can consider a contractual arrangement that, for example, retains the engineer for one to three days a month to continue working on documentation and other special projects and to cover staff vacations and sick days. The arrangement would also guarantee that the engineer is available on an as-needed basis should the company experience an emergency. The advantage of this approach is that, if the company must rely on outside help during an emergency, it has outsourced personnel who are already well versed in its systems.