In days of yore, disaster recovery (DR) meant offsite backup tapes. But when it came time to restore that data, some companies found it could take days, weeks or even an eternity to recover their systems, particularly in the all-too-common event that the tape backups proved incomplete or faulty.
Enter a wide range of snapshot, replication, mirroring and disk-based backup technologies that shorten recovery, as measured against the recovery time objective (RTO): the target for how long it should take to get systems back up and running.
"I've noticed that SAN mirroring has become more common as a part of the DR plans we are hired to implement," says Chip Nickolett, principal of systems integration firm Comprehensive Consulting Solutions Inc. of Brookfield, WI. "Few firms can afford to be down one to three days while they execute a DR plan."
He tells the story of a severe failure of a customer's system. Repairing the damage required a part that had to be shipped from the manufacturer. In this case, however, the customer got lucky: its data resided on the SAN and could be accessed without significant service interruption.
"We took a small development system, executed their DR plan (making changes on the fly to accommodate the smaller system), and had them up and running in approximately four hours," says Nickolett. "The DR plan allowed us to walk through everything that was needed, without forgetting anything, and restore production systems in a relatively short period of time."
This example highlights two things: the importance of sound planning, and the velocity demanded by modern businesses.
"The best prepared organizations recognize that speed is essential in recovering from whatever disaster may come to pass," says Fred Moore, an analyst at Horison Information Strategies in Boulder, CO.
Storage Virtualization
A developing trend in the DR field is the adoption of storage virtualization technology. It plays an important role in reducing the time required for recovery.
South Florida-based All Medical Personnel, for instance, uses SANmelody software by DataCore Software of Fort Lauderdale, FL, to speed up restoration of failed servers or disks and to increase fault tolerance.
The driver? Florida was slammed by hurricanes during the 2004 and 2005 seasons, and IT staff at All Medical realized that the company's existing architecture had to change. All of the company's data was managed at corporate headquarters using a Citrix environment, and branch office applications depended on HQ for access to much of their information. The company also made heavy use of VMware for server virtualization.
"If the corporate office went down, while the rest of the offices throughout the country were functional, they were crippled in some applications," says Karen Swanson, IT director at All Medical. "So we chose to move everything to a secure data center that would not be affected by natural disaster or a scenario such as a power outage."
All Medical decided to virtualize its storage environment to improve storage utilization across all systems and extend its DR capabilities, with SANmelody as the backbone storage infrastructure. Weather-related disruptions in South Florida no longer have the same impact on the rest of the company. For Swanson and her team, the ultimate business objective of continuity has been met.
"DataCore supports a range of disaster recovery options and it opened up our ability to use iSCSI and/or Fibre Channel," says Swanson.
Another Florida-based company that has experienced disaster firsthand is Berenfeld, Spritzer, Shecter & Sheer. The accounting and business advisory firm suffered power interruptions in 2005 that affected systems and data at its offices in Coral Gables, Sunrise, and Fort Lauderdale.
"We lost 14 days of operations and revenue," says Benjamin Thaw, network operations manager at Berenfeld Spritzer. "The severity of the hurricane season prompted us to examine our preparations for continuity of business operations in case more severe disruptions should occur."
Although it is a relatively small enterprise, with 15 partners and a professional staff of more than 150, the firm has built a sophisticated IT environment. In response to the events of 2005, it upgraded its DR systems and processes over the past year, taking advantage of a move to a new headquarters to make several improvements to its infrastructure. That infrastructure now consists of the following: a data center at the new Coral Gables headquarters with six HP blade servers, consolidated from about 15 older servers; a network backbone based on Cisco Catalyst 4500 and 3750 switches, with 15 wireless access points firm-wide; a SAN based on HP's Modular Smart Array 1000; a wide area network (WAN) running over dual 3-megabit T1 lines with redundant connections to BellSouth; and layered network protection that includes a RADIUS server appliance for network authentication, Web filtering, anti-virus protection, a spam filter, e-mail scanning and a Citrix remote access application.
The infrastructure emphasizes mobility because most staff use laptops and often work remotely: at client facilities, from their homes, or at alternative locations in the firm's offices. The firm implemented the system in conjunction with DR specialty teams from CDW Corp. of Vernon Hills, IL.
"Our enhancements to the comprehensive business continuity plan included a company-wide document management system and a second SAN, programmed to perform replication across the WAN," says Thaw. "For power backup, we chose an APC Symmetra PX 30-kilowatt, running at 30 kilowatts and scalable to 40 kilowatts."
He notes that future DR plans include expanding power backup to 40 kilowatts, along with generator backup and distributed server capacity. Once all power, communications and infrastructure technologies are covered, the organization will test its systems to find and fix any gaps or issues, especially in the management processes for activating the DR measures on demand.
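Whether replication to a second SAN over a WAN can keep up depends largely on how much data changes and how fast the link is. The sketch below is a back-of-envelope estimate of transfer time for a given volume of changed data, not a model of any SAN product's replication. The 3 Mbit/s link speed is the figure the article gives for the T1 links; the function name, the 20 GB daily change volume and the 80 percent usable-bandwidth factor are hypothetical assumptions.

```python
def replication_hours(changed_gb: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to copy `changed_gb` gigabytes across a WAN link.

    changed_gb -- data to replicate, in decimal gigabytes (1 GB = 1,000 MB)
    link_mbps  -- nominal link speed in megabits per second
    efficiency -- assumed fraction of nominal bandwidth actually usable after
                  protocol overhead and competing traffic (hypothetical default)
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link speed must be positive and efficiency in (0, 1]")
    megabits = changed_gb * 1000 * 8             # GB -> MB -> megabits
    seconds = megabits / (link_mbps * efficiency)
    return seconds / 3600


# Hypothetical example: 20 GB of changed data per day over a 3 Mbit/s link.
print(f"{replication_hours(20, 3):.1f} hours")
```

Under these assumptions a 20 GB day takes roughly 18.5 hours to replicate, so a firm with a higher daily change rate would need more bandwidth or would have to accept a replica that lags behind production.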
DR in a Box
A new trend in data center operations and recovery is the complete, prepackaged data center in a single container. The day could come when companies lease one of these containers before a weather event in order to run their systems in parallel, or deploy one at a temporary site in the aftermath.
Sun Microsystems Inc. of Santa Clara, CA, for example, has introduced Project Blackbox, a 20' by 8' shipping container that can be used as a temporary data center or deployed as a complete DR solution wherever the company wants it.
"Project Blackbox is a high density, low cost datacenter configured in an enhanced container for ease and speed of deployment," says Maurice Cloutier, senior product manager of Project Blackbox for Sun. "It is aimed at customers who are running out of space, need to minimize their investment, ease the pain of building new data centers, add a DR site quickly or lower power consumption."
Fully configured, the unit contains eight standard 19-inch racks, providing almost 300 rack units for servers, storage and networking gear. Each rack can support up to 25 kW of load. Fully loaded, the container weighs about 20,000 pounds.
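The rack figures imply a ceiling on the container's total IT load: eight racks at up to 25 kW each is 8 × 25 = 200 kW. The sketch below makes that arithmetic explicit. The function name and the partial-load example are hypothetical, and treating 25 kW as a hard per-rack cap is an assumption about how the figure is meant.

```python
RACKS = 8               # racks per container, as stated in the article
MAX_KW_PER_RACK = 25.0  # per-rack ceiling, as stated in the article


def container_load_kw(rack_loads_kw):
    """Total IT load for a container, rejecting configurations that exceed
    the rack count or the per-rack ceiling (treated here as a hard cap)."""
    if len(rack_loads_kw) > RACKS:
        raise ValueError(f"container holds at most {RACKS} racks")
    for i, load in enumerate(rack_loads_kw, start=1):
        if not 0 <= load <= MAX_KW_PER_RACK:
            raise ValueError(f"rack {i}: {load} kW is outside 0-{MAX_KW_PER_RACK:g} kW")
    return sum(rack_loads_kw)


print(container_load_kw([MAX_KW_PER_RACK] * RACKS))  # 200.0, the theoretical maximum
print(container_load_kw([12.5, 18, 25, 9]))         # 64.5, a hypothetical half-filled box
```

Whatever site hosts the container, whether a parking lot at headquarters or a temporary recovery location, must be able to deliver power on that scale.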
Net prices will range from around $1.5M for a Project Blackbox filled with 250 Sun Fire X2100 servers (each using one dual-core AMD Opteron Model 180 processor) to $5M for one loaded with 14 Sun Fire E2900 servers (each with 12 dual-core 1.8GHz UltraSPARC IV+ CPUs).
Cabling is neatly arranged overhead. When a rack is moved into the aisle for maintenance, the cabling cradle folds over so wires are easy to connect and disconnect. The first rack is known as the control rack. The display model contained networking gear, a power distribution unit and a dehumidifier. The default networking connections are Ethernet (four ports), though Fibre Channel, iSCSI and InfiniBand are also supported.
Probably the most striking facet of Blackbox is its cooling design: air flow within the container forms a closed loop. There are two sets of doors. The first opens the box itself; the second encloses the aisle between the two rows of racks (See Figure 1, Sun's Closed Loop Air Flow Design). The racks don't face each other. Instead, they are turned sideways, with a chiller between each one. Cold air enters the front of one rack, and hot air flows out the back, through a chiller and into the front of the next rack, and so on down a row of four racks. Air leaving the last rack is then directed sideways and flows down the other side of the container through four more racks and four more chillers.
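The closed loop can be illustrated with a toy model: list the components one parcel of air passes in a circuit, then track its temperature. This is not Sun's thermal design. The rack labels, the 20 °C supply temperature and the 15 °C rise per rack are hypothetical, and the model assumes each chiller removes all the heat added by the rack before it.

```python
def airflow_loop(racks_per_side=4):
    """Ordered components one parcel of air passes in a single circuit.

    Hypothetical model of the layout described above: each rack is followed by
    a chiller; after the last rack on one side the air turns and flows down the
    other side, and after the last chiller it returns to the first rack.
    """
    path = []
    for side in ("A", "B"):
        for n in range(1, racks_per_side + 1):
            path += [f"rack {side}{n}", f"chiller {side}{n}"]
    return path


def rack_inlet_temps(path, supply_c, rise_c):
    """Toy thermal model: each rack heats the air by rise_c degrees C; each
    chiller is assumed to bring it back to supply_c before the next rack."""
    inlet = {}
    temp = supply_c
    for component in path:
        if component.startswith("rack"):
            inlet[component] = temp   # air temperature entering this rack
            temp += rise_c            # exhaust leaving the back of the rack
        else:
            temp = supply_c           # chiller removes the rack's heat
    return inlet


loop = airflow_loop()
print(len(loop))                                         # 16: 8 racks + 8 chillers
print(set(rack_inlet_temps(loop, 20.0, 15.0).values()))  # {20.0}
```

Dropping the chillers from the path shows why they sit between the racks: each rack would then take in the previous rack's exhaust, and inlet temperatures would climb down the row.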
"Air flows in a circular path with fans and heat exchangers between each rack," says Cloutier.
Jonathan Eunice, an analyst at Illuminata Inc. of Nashua, NH, believes modular computing on this scale is the future of IT. He sees a wealth of possible uses. A container, for example, could be stored in a warehouse and quickly transported to the disaster site. Military organizations, too, might airlift them in to support remote operations.
"Blackbox truly introduces a new kind of module for datacenter construction," says Eunice.
Another vendor entering the data-center-in-a-box market is Rackable Systems, Inc. of Milpitas, CA. Its Concentro is a modular, containerized data center that combines high server and storage density with highly efficient cooling and easy serviceability, packing up to 9,600 processing cores into a 40' by 8' container.
"The need to reduce energy consumption and rethink space requirements is changing the look and feel of IT environments," says Giovanni Coglitore, founder and chief technology officer of Rackable Systems. "We applied Rackable Systems' experience in cabinet, server and power infrastructure to address the needs of space-constrained data centers. The result: an entirely new data center model that maximizes density, efficiency and performance while reducing costs of installation and management."
The mobile module can house up to 1,200 of Rackable's DC-powered rack-mount servers, based on Intel Xeon quad-core processors, or up to 3.5 petabytes of storage. Offsite systems administrators, for example, can use it for redundancy or emergency backup in the event of a disaster.
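The Concentro figures can be cross-checked: 9,600 cores across 1,200 servers is 9,600 / 1,200 = 8 cores per server, or two quad-core Xeon processors apiece. The article does not state the socket count, so the two-socket reading is an inference that assumes every server in a maximum configuration is identical. The hypothetical helper below performs the same check.

```python
def sockets_per_server(total_cores: int, servers: int, cores_per_socket: int) -> int:
    """Processor sockets per server implied by a container's headline figures.

    Raises ValueError if the figures do not divide evenly, meaning they cannot
    describe a uniform population of identical servers.
    """
    cores_each, leftover = divmod(total_cores, servers)
    if leftover:
        raise ValueError(f"{total_cores} cores do not split evenly across {servers} servers")
    sockets, leftover = divmod(cores_each, cores_per_socket)
    if leftover:
        raise ValueError(f"{cores_each} cores per server is not a multiple of {cores_per_socket}")
    return sockets


# Article figures: 9,600 cores, 1,200 servers, quad-core Xeon processors.
print(sockets_per_server(9_600, 1_200, 4))  # 2
```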
Figure 1: Sun's Closed Loop Air Flow Design
More Than IT Gear
While it is vital to protect servers, storage and networking gear, other aspects of the infrastructure may also need attention. The Children's Medical Center in Dayton, OH, for instance, had to back up its cooling system.
"We originally had two AC units with only one running at a time and the other on standby," says Chuck Rust, senior network analyst at Children's Medical Center.
"When the data center became overcrowded, we added an additional AC unit, running two at a time, with one on standby."
But the pace of expansion caught up with the facility once again, and it opened a second data center in office space next door. Both data centers run UNIX, Windows and Novell servers, as well as an EMC SAN and a Cisco network. To keep up with soaring cooling demands, all three AC units had to run simultaneously, leaving no unit on standby.
At that point, Children's Medical Center realized it had to change its operating basis. It purchased an InfraStruXure system from APC-MGE of West Kingston, RI, which uses chilled water to deliver cooling directly at the racks.
InfraStruXure is a complete system of integrated cooling and power components.
InfraStruXure InRow RP units, for example, place cooling in the row of racks, next to the heat source. They can support power densities of up to 70 kW per rack when used with a hot-aisle containment system, in which the hot aisle is enclosed so that exhaust air cannot mix with the cool supply air, increasing cooling efficiency. The units can also speed up or slow down their fans and adjust water pressure as the heat load rises or falls.
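Ramping fans and water up or down with the heat load is a form of feedback control. The sketch below shows the general idea with a plain proportional controller driven by the temperature of air returning from the racks. It is not APC's control algorithm: the function names, the 24 °C setpoint, the gains and the minimum output levels are all hypothetical, and modeling the water side as a valve-opening percentage is an assumption.

```python
def proportional_output(measured_c, setpoint_c, gain_per_c, min_pct, max_pct):
    """Proportional control law: output rises linearly with how far the measured
    temperature is above the setpoint, never below min_pct or above max_pct."""
    error_c = max(measured_c - setpoint_c, 0.0)  # below setpoint: idle at min_pct
    return min(max_pct, min_pct + gain_per_c * error_c)


def cooling_outputs(return_air_c, setpoint_c=24.0):
    """Hypothetical settings for one in-row unit: returns (fan %, water valve %)."""
    fan = proportional_output(return_air_c, setpoint_c, gain_per_c=10.0, min_pct=30.0, max_pct=100.0)
    valve = proportional_output(return_air_c, setpoint_c, gain_per_c=15.0, min_pct=10.0, max_pct=100.0)
    return fan, valve


for temp_c in (22.0, 26.0, 35.0):
    print(temp_c, cooling_outputs(temp_c))
```

With these settings the unit idles at 30 percent fan and 10 percent valve at 22 °C, moves to 50 and 40 percent at 26 °C, and saturates at 100 percent for both at 35 °C. Production controllers may use more elaborate schemes, such as PID control, but the proportional version captures the idea: cooling effort follows the heat load instead of running flat out all the time.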
"We have gone from one or two heat related failures every few months to no failures in five months," says Rust. "Whereas the room used to feel hot beside certain servers, we don't have that problem anymore."
IT Forced to Look Outward
Hurricane researchers expect an active season in 2007. Whether or not that comes to pass, there will be other years, and plenty of other emergencies could strike an unprepared data center: fire, flood, earthquake and power outages, to name just a few.
According to Forrester Research Inc. of Cambridge, MA, the average cost per hour of downtime is staggering (See Figure 2, Downtime is Expensive). Clearly, such costs mean that IT can't treat DR as a bubble contained entirely within the data center.
Figure 2: Downtime is Expensive
Application segment affected    Avg. cost of downtime per hour
Shipping                        $28,000
Teleticket sales                $69,000
Airline reservations            $89,000
Home shopping                   $113,000
Pay per view                    $150,000
Credit card sales               $2.65 million
Financial markets               $6.45 million
Source: Forrester Research
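The hourly figures make it easy to estimate what a given recovery time costs. The sketch below multiplies a segment's average hourly cost by outage length, comparing a four-hour recovery like the one Nickolett describes with the three-day outage he says few firms can afford. It assumes losses accrue at a constant hourly rate, which is a simplification; the dictionary and function names are hypothetical.

```python
# Average cost of downtime per hour by application segment, in US dollars,
# taken from Figure 2 (Forrester Research).
HOURLY_COST_USD = {
    "Shipping": 28_000,
    "Teleticket sales": 69_000,
    "Airline reservations": 89_000,
    "Home shopping": 113_000,
    "Pay per view": 150_000,
    "Credit card sales": 2_650_000,
    "Financial markets": 6_450_000,
}


def outage_cost(segment: str, hours: float) -> float:
    """Estimated outage cost: average hourly cost times duration.

    Assumes losses accrue at a constant hourly rate, a simplification.
    """
    if hours < 0:
        raise ValueError("outage duration cannot be negative")
    return HOURLY_COST_USD[segment] * hours


# A four-hour recovery versus a three-day (72-hour) outage for one segment.
for hours in (4, 72):
    print(f"{hours:>3} hours: ${outage_cost('Shipping', hours):,.0f}")
```

At the shipping rate, the four-hour recovery costs $112,000 and the three-day outage $2,016,000. Because the model is linear in hours, the three-day outage costs 18 times as much (72 / 4 = 18) in every segment.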
For IT to do its job in the current environment, therefore, it has to extend its DR responsibilities well beyond its traditional zone. That means coordinating with facilities, HR, operations and top management before finalizing any DR plan.
"Planning for data recovery resulting from problems with hardware, software, people, intrusion, theft and natural disasters has never existed before at this level of intensity," says Moore. "We've added concerns ranging from widespread loss of electrical power to the growing intrusion threats from hackers to employee sabotage and terrorist attacks."
Drew Robb is a freelance writer specializing in IT.