DATA RECOVERY
A SEMINAR REPORT
Submitted by
KHATEEB AHMAD
in partial fulfillment for the award of the degree
of
Bachelor of Computer Application
DEPARTMENT OF COMPUTER APPLICATION
INTEGRAL UNIVERSITY LUCKNOW
MAY 2017
INTEGRAL UNIVERSITY
LUCKNOW
CANDIDATE’S DECLARATION
I hereby certify that the work presented in the Seminar Report on “Data Recovery”, submitted in partial fulfillment of the requirements for the award of the Degree of Bachelor of Computer Application to the Department of Computer Application, Integral University, Lucknow, is an authentic record of my own work carried out during the period from January 2017 to May 2017, under the guidance of Dr. Tasneem Ahmed, Assistant Professor, and Mrs. Nashra Javed, Assistant Professor, Department of Computer Application, Integral University, Lucknow.
The matter presented in this seminar report has not been submitted by me for the award of any other degree of this or any other University.
(Khateeb Ahmad)
This is to certify that the above statement made by the candidate is correct to the best of our
knowledge.
Dated:
(Dr. Tasneem Ahmed) (Mrs. Nashra Javed)
Assistant Professor Assistant Professor
This is to certify that the above statement made by the candidate is correct to the best of my knowledge.
Head of the Department
ACKNOWLEDGEMENT
It gives me great pleasure to present this seminar report, prepared under the guidance of Dr. Tasneem Ahmed, Assistant Professor, and Mrs. Nashra Javed, Assistant Professor, Department of Computer Application.
They have made sincere efforts to make the report more meaningful, complete, compact, and comprehensive, and it has been a great pleasure to put their guidance into practice.
Finally, I give my special thanks to my batch mates for all their valuable suggestions, without which this report could not have been completed.
-Khateeb Ahmad
TABLE OF CONTENTS
1. INTRODUCTION
2. DATA BACKUP MEDIA AND HARDWARE
   HARD DRIVES
   MAGNETIC TAPES
   OTHER MEDIA
3. STACKERS, AUTOLOADERS, AND TAPE LIBRARIES
4. TOOLS FOR BACKUPS AND BACKUP SERVICES
   FREE TOOLS
   COMMERCIAL TOOLS
   BACKUP SERVICES
5. SCHEDULING BACKUPS AND GOOD BACKUP PRACTICES
   CREATING A BACKUP SCHEDULE
   GOOD BACKUP PRACTICES
6. DATA RECOVERY
7. DISASTER RECOVERY PLAN
8. THE ESSENCE OF DATA RECOVERY
9. THE SCOPE OF DATA RECOVERY
   SYSTEM PROBLEM
   BAD TRACK OF HARD DISK
   PARTITION PROBLEM
   FILES LOSS
   PASSWORD LOSS
   FILES REPAIR
10. THE PRINCIPLE OF DATA RECOVERY
11. DATA PROTECTING TECHNOLOGIES
   SMART TECHNOLOGY
   SPS
   DFT
   FLOPPY DISK AND ARRAY TECHNOLOGY
   SAN
12. TECHNICAL SPECIFICATION
13. CONCLUSION
14. REFERENCES
Introduction
The collapse of the World Trade Center on September 11, 2001 reinforced the importance of backing up critical data, protecting the backups, and planning for disastrous data losses. It is estimated that the cost to replace technology and systems lost in the World Trade Center (WTC) disaster could be $3.2 billion.
However, some companies that occupied the WTC, such as Morgan Stanley, were able to quickly
recover. The financial giant was able to start running again because in addition to having the usual data
backups that most companies keep on site, it also maintained real-time copies of its data in a second
location miles away. All transactions occurring on the company WTC servers and mainframes were
continuously transferred through high-speed telecommunications lines to computers in Teaneck, New
Jersey.
An event as unimaginable as the WTC collapse is not the only way to have a data loss disaster. There are
countless ways to lose critical data. Human error is among the leading causes of data loss. For example,
mistakes as simple as typing “rm *” at the UNIX command prompt can have disastrous results. In
addition, software failure, hardware failure, computer viruses, malicious or disgruntled employees, and
natural disasters such as fire and flood can cause a catastrophic system failure. The Disaster Recovery
Institute International estimates that 90 percent of companies that experience a significant data loss are
out of business within three years.
Making backups of all data seems like the obvious solution. However, many small companies have
inadequate backup practices or technology. The problem could be insufficient storage capacity, an
inability to use backup solutions, a lack of backup testing, no offsite data storage, or inconsistent backup
procedures. Unfortunately, backing up data is simply given a very low priority at a number of small
firms.
According to a survey conducted by storage media vendor Imation, 30% of small businesses lack formal
data backup and storage procedures or do not implement those practices consistently. In fact, thirty-nine
percent of the small firms surveyed admitted that they review their storage procedures only after a
problem occurs. In addition, more than one third of the respondents said they do a fair or poor job of
storing backup data offsite, and over half rate their disaster recovery plan as fair or poor.
It is very difficult, and in fact sometimes impossible, to function normally during a crisis. It is for this
reason that it is important to think about data backups before a disaster strikes. This paper provides some
guidance for developing a data backup plan by summarizing data backup media and hardware
technologies, data backup procedures and services, and data recovery services. It also provides an outline
for disaster recovery planning.
Data Backup Media and Hardware
One of the first decisions to make when preparing a data backup plan is to decide what physical medium
will be used to store the data backups. The speed, capacity, cost, and life expectancy of the medium are
all considerations that should be taken into account when making this decision.
In terms of backup media life expectancy, a number of factors should be considered. For long-term storage, the media may become obsolete due to changing technologies before it physically degrades, leaving the information on it unusable. Conversely, the physical medium may last much longer than the information recorded on it remains readable. Due to these considerations, care must be taken when choosing media based on its life expectancy.
Hard Drives
Data storage onto hard disk media is becoming more and more prevalent in corporate data centers,
according to a survey of more than 1,000 information technology managers conducted by Peripheral
Concepts, Inc.
Hard drives (magnetic disks) have a very high data capacity, currently holding over 100GB of data (5). A
typical hard drive consists of platters that are coated with magnetic film. These platters spin while data is
accessed by small heads situated on drive arms. This geometry enables data to be accessed randomly, and
thus very quickly. In addition to a high storage capacity and speedy access, magnetic disks are estimated to have an expected life span of 5-10 years. Although hard disks used to be the most expensive backup media, prices have dropped dramatically in the last few years. Removable hard disks are becoming even more affordable, and have capacities of over 2GB. The Orb Drive, by Castlewood Corporation, is an example of such a product.
Hard drives can be used for data backups by mirroring. Disk mirroring is a technique in which data is
written to two duplicate disks simultaneously. If one of the disks fails, the system can quickly switch to
the other disk without any loss of data or service. Mirroring is commonly used for systems such as
Internet databases, where it is critical that data be accessible at all times. However, there is a problem
with this technique: if both disks are a part of the same machine and the disk controller (or the whole
machine) fails, neither disk would be accessible. One possible solution to this problem is to implement a
mirroring scheme that involves multiple machines. The backup machine duplicates all of the processes of
the primary machine. This is effective because if the primary machine fails, the backup machine can
immediately take its place without any down time. Although this is a good solution for the problem of a
failed machine, the possibility still exists for loss of both machines, for example due to fire. To prevent
this situation, some companies have network mirroring. These companies mirror their main disks with
disks in remote locations via a network connection. However, this type of mirroring is expensive. Each
machine must be mirrored by an identical machine whose only purpose is to be there in the event of a
failure.
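As an illustration of how such a mirror can be built in software, the sketch below creates a two-disk mirrored (RAID 1) array on a Linux host with the mdadm utility. This is only an illustrative sketch, not a procedure from the sources cited in this report; the device names /dev/sda1 and /dev/sdb1 are placeholders.

    # Create a two-disk mirrored (RAID 1) array from two partitions
    mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
    # Watch the initial synchronization of the mirror
    cat /proc/mdstat

Once the array is built, a filesystem is created on /dev/md0 and used like any single disk; the kernel then writes every block to both member disks.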
Of course, mirroring does not provide total protection against data loss. If a computer virus destroys files
or files are accidentally deleted, the mirrored files will also be destroyed or deleted. Having a previously stored copy of the data is therefore important, so traditional data backup media will still be required. The
Peripheral Concepts survey also shows that a large majority of data is still backed up and archived the
traditional way: on tape.
Magnetic Tapes
Magnetic tape is the most traditional and widely used medium for creating backups. The tape is actually a
Mylar film strip on which information is magnetically stored. Because magnetic tapes
are a sequential storage device (tape drives cannot randomly access data like other storage devices, such
as disk drives), they are much slower. However, high storage capacity and low cost make magnetic tapes
the storage medium of choice for archiving large amounts of data. Helical scan devices are also magnetic
tapes, but the data heads spin at an angle to the strip of tape, thus creating denser data storage and higher
capacity.
Life expectancy and the number of times a tape can be reused depend not only on the quality of the tape itself, but also on the environment in which it is stored and the quality and maintenance of the tape drive
heads. An estimate of magnetic tape life expectancy is 1 year (6).
QIC (quarter inch cartridge, pronounced “quick”) is a technology standard for magnetic tapes developed
by a consortium of manufacturers called the Quarter-Inch Cartridge Drive Standards, Inc. (8). Travan
tapes, developed by 3M Corporation, are a high density form of QIC standard tapes. Travan tapes were
widely used by companies several years ago, but are now often used for personal computer backup. Also
called floppy tape because they can use a PC’s floppy disk controller instead of requiring their own
special controller, the drives are inexpensive and reliable. The current maximum storage capacity of
Travan tapes is up to 10GB, but they are relatively slow.
DAT (digital audio tape) cartridges come in two standard sizes, 8mm and 4mm. 4mm DATs are helical scan devices and therefore can support storage capacities of up to 20GB. 8mm DATs have storage capacities of only about 7GB. The 4mm tapes have a great advantage over other tape media: they are physically the smallest and therefore take up less storage room. A disadvantage of these tapes is that they are very sensitive to heat damage, thus complicating the selection of a storage location. DAT tapes come in two formats: one for recording video or audio, the other for binary data. The video/audio tapes work for making data backups, but they are less reliable than the binary format in terms of retaining data. The 4mm DAT is currently the most widely used tape type, but it is being replaced by digital linear tape (DLT).
DLT tapes have a storage capacity of up to 40GB. The drives are quite fast and represent the newest standard backup media technology. Quantum Corporation ([Link]), the manufacturer of the Super DLTtape II, claims a shelf life of 30 years for its product. Besides this unbeatable life expectancy, the DLT tape has another advantage: like 4mm DATs, DLTs are small. The DLT dimensions are
approximately 1" x 4" x 4" and they weigh only about 8 ounces (9). DLTs are currently a very popular medium, even though they are still relatively expensive.
Other Media
Optical disks, such as recordable CD-Rs and CD-RWs, have a much longer lifespan than tapes (except for DLTs). The estimated lifespan of optical media is greater than 30 years (6). However, optical disks have
a smaller data capacity than tapes, and they are more expensive per GB. Floppy disks are the least
expensive media, but because they have such a small capacity, a huge number of them are needed to back
up even moderate amounts of data, thus making the price per GB very high. Table 1 summarizes the
media types discussed above as well as several others.
Table 1. Comparison of backup media (10)

Medium          Capacity(a)  Speed(a)      Drive cost  Media cost  Cost/GB  Reuse?  Random access?
Floppy disk     2.8MB        < 100 KB/s    $15         25¢         $91.43   Yes     Yes
SuperDisk       120MB        1.1 MB/s(b)   $200        $8          $68.27   Yes     Yes
Zip 250         250MB        900 KB/s      $200        $15         $61.44   Yes     Yes
CD-R            650MB        2.4 MB/s      $200        75¢         $1.18    No      Yes
CD-RW           650MB        2.4 MB/s      $200        $2          $3.15    Yes     Yes
Jaz             2GB          7.4 MB/s      $350        $100        $50.00   Yes     Yes
Orb             2.2GB        12.2 MB/s(b)  $200        $40         $18.18   Yes     Yes
Exabyte (8mm)   7GB          1 MB/s        $1,200      $8          $1.14    Yes     No
Travan          10GB         1 MB/s        $200        $34         $3.40    Yes     No
DDS-4 (4mm)     20GB         2.5 MB/s      $1,000      $30         $1.50    Yes     No
ADR             25GB         2 MB/s        $700        $40         $1.60    Yes     No
DLT (1/2 in.)   40GB         6 MB/s        $4,000      $60         $1.50    Yes     No
AIT-2 (8mm)     50GB         6 MB/s        $3,500      $95         $1.90    Yes     No
Mammoth-2       60GB         12 MB/s       $3,500      $80         $1.33    Yes     No
a. Uncompressed capacity and speed
b. Maximum burst transfer rate; the manufacturer does not disclose the true average throughput.
Table 1 illustrates the problem with using inexpensive yet low capacity media such as floppy disks.
Compare the cost per GB of floppy disks to DLT. Even though floppies and floppy drives are much
less expensive than DLT cartridges and drives, they have such a small capacity that many more are
needed to store data, thus resulting in a high cost per GB. The table also illustrates the differences in
access speeds of the various storage media. For example, compare the speed of the Orb disk with the
DLT speed. Orb is much faster because data is accessed randomly on disks, while tapes such as DLT
require sequential access.
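As a back-of-the-envelope check on the cost-per-GB column (a worked illustration, not a figure quoted from the source), the floppy disk entry follows directly from its media cost and capacity:

    $0.25 per disk / (2.8 MB / 1024 MB per GB) ≈ $91.43 per GB

while a $60 DLT cartridge holding 40GB works out to only $1.50 per GB, which is why the high-capacity tape formats dominate the low end of the cost column.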
Stackers, Autoloaders, and Tape Libraries
As technology progresses and more work is becoming automated on a global scale, more and more
data is being generated. Unfortunately, a decrease in human resources is happening concurrently.
The result is that fewer people are available to handle data backups. What’s more, due to this
increase in data, stand-alone tape drives often lack sufficient capacity to back up even mid-sized
networks. A good solution to these problems is to automate backups. In addition to reducing the
need for the manual handling of backup media, automated backups involve multi-volume media
devices, thus greatly increasing storage capacity. Automation also makes backups reliable and
consistent.
Backup automation combines robotics with backup media and software to produce a device that can
load, unload, and swap media without operator intervention. Stackers are tape changers that allow
the operator to load a hopper with tapes. Tapes are inserted and removed in sequential order by the
stacker’s robotic mechanism. For a stacker to back up a filesystem, it would begin with the first tape
and continue automatically inserting and removing tapes until the backup was complete or until it
ran out of available cartridges. Autoloaders have the added functionality of being able to provide
any of their tapes upon request. Libraries are similar to autoloaders, but have the added ability to
support larger scale backups, user initiated file recovery and simultaneous support of multiple users
and hosts. Libraries are larger and more complex than stackers or autoloaders. As a result they are
more expensive and tend to be used by larger scale companies.
Tools for Backups and Backup Services
Dump is the native UNIX utility that archives files to tape. It is the most common way to create backups. Dump builds a list of files that have been modified since the last dump, archives these files into a single file, and stores this file on an external device. Filesystems must be dumped individually. Dump works only on the local machine, not over a network, so the dump command must be issued on each machine that is to be backed up.
Note that the Solaris operating system’s version of dump is not quite the same as that of other UNIX systems. In Solaris, the command ufsdump is equivalent to dump. Dump takes as an argument an
integer value that represents a dump level. This is related to scheduling backups and is described in
the next section, Scheduling Backups and Good Backup Practices.
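For illustration only, a full (level 0) dump of a single filesystem to a locally attached tape drive might be invoked as in the following sketch; the device and filesystem names are placeholders rather than values taken from this report.

    # Full (level 0) dump of /home to a tape drive, recording the date in /etc/dumpdates (-u)
    dump -0u -f /dev/st0 /home

    # Solaris equivalent using ufsdump
    ufsdump 0uf /dev/rmt/0 /export/home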
Free Tools
AMANDA (The Advanced Maryland Automatic Network Disk Archiver) is a public domain utility
developed by the University of Maryland. It was designed to back up many computers in a network onto a single server’s high-capacity tape drive. It also works with multiple stackers, allowing for a great increase in backup data capacity. AMANDA uses the native UNIX dump utility and does its own dump level scheduling, given general information by the user about how much redundancy is
desired in the data backups. AMANDA is one of the most popular free backup systems, and has a
large user community. Based on the membership of AMANDA-related mailing lists, there are
probably well over 1,500 sites using it (11). The UNIX System Administration Handbook (10)
provides an abbreviated yet comprehensive walk-through of the AMANDA utility.
BURT is a backup and recovery tool designed to perform backups to, and recoveries from, tapes.
BURT is based on Tcl/Tk 8.0 scripts, and because of this it is very portable. It can back up multiple
system platforms (12).
The native UNIX dump utility can be automated with a shell script called [Link].
[Link] enables the user to ensure that dump performed properly by checking return codes. It
also provides an intelligent way to choose which filesystems to back up and creates a table of
contents for each backup.
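The return-code checking described above can be illustrated with a minimal shell sketch; this is not the script referred to above (whose contents are not reproduced in this report), only a hedged example of the idea, with placeholder device and filesystem names.

    #!/bin/sh
    # Run a level 0 dump of /home and report whether it succeeded
    dump -0u -f /dev/st0 /home
    status=$?
    if [ $status -ne 0 ]; then
        echo "dump of /home FAILED with exit status $status" >&2
        exit $status
    fi
    echo "dump of /home completed successfully"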
Star is an implementation of the UNIX tar utility. It is the fastest known implementation of tar, with
speeds exceeding 14MB/s (13). This is more than twice as fast as a simple dump. Another nice
feature of Star is that it does not clobber files. More recent copies of files already on disk will not be
overwritten by files from the backup medium during a restore. Star is available via anonymous ftp at
[Link]
Afbackup is a utility that was written and is maintained by Albert Flugel. It is a client/server backup
system that allows many workstations to back up to a central server, either simultaneously or
sequentially. The advantage of afbackup over a simple dump is that backups can be started remotely
from the server or by cron scheduling on each of the clients.
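Client-side cron scheduling of this kind is not specific to any one backup tool; a hedged illustration of a crontab entry (the script path is a placeholder) is:

    # Run the site backup script at 02:30 every weekday (Monday through Friday)
    30 2 * * 1-5 /usr/local/sbin/nightly-backup.sh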
Bacula is a network based client/server backup program. It is a set of computer programs that
manage backup, recovery, and verification of computer data across a network of different types of
computer systems. Bacula is efficient and relatively easy to use, and offers many advanced storage
management features that make it easy to find and recover lost or damaged files.
Commercial Tools
Makers of backup and recovery software booked $2.7 billion in revenues in 2001, and that figure is
expected to grow to $4.7 billion in 2005, according to research firm IDC (14). This reflects the
popularity of commercially available software for data backups. Legato and Veritas are currently
two of the most popular backup software companies. NetWorker from Legato is a commercial
package which allows storage devices to be placed throughout the network as NetWorker nodes.
These nodes can then be managed during backups as if they are locally attached devices. The
following is an excerpt from the Legato NetWorker Administrator’s Guide, published on the website
of Sun Microsystems (15):
With NetWorker, you can:
• Perform automated “lights out” backups during non peak hours
• Administer, configure, monitor, and control NetWorker functions from any system on a
network
• Centralize and automate data management tasks
• Increase backup performance by simultaneously sending more than one savestream to
the same device
• Optimize performance using parallel savestreams to a single device, or to multiple
devices or storage nodes
NetWorker client/server technology uses the network protocol Remote
Procedure Call (RPC) to back up data. The NetWorker server software consists
of several server-side services and programs that oversee backup and recover
processes. The NetWorker client software consists of client-side services and
user interface programs.
The server-side services and programs perform the following functions:
• Oversee backup and restore processes
• Maintain client configuration files
• Maintain an online client index
• Maintain an online media database
NetBackup, available from Veritas Software Corporation, focuses on allowing users to back up data
to disk and tape and to stage backups to disk for a period of time before moving them to
tape. This allows for faster data restores. NetBackup also features snapshot functionality which
enables non-disruptive upgrades. In addition, NetBackup users can perform full system (bare-metal)
restorations of data to drives that do not contain an operating system. NetBackup also synchronizes
laptop and desktop backup and restore operations with their server backups.
Backup Services
For most businesses, the standard recovery and backup solution hasn't changed for decades: tape.
Every day, businesses back up every server to tape and then physically move those tapes offsite to a
secure location for disaster protection. This increases the risk of human error, such as tape mishandling. Another risk of this popular backup method is that the backups themselves may fail. Often it
is not known that backups have failed until the data is needed for a recovery. To overcome these
risks, businesses are beginning to use a new type of service, online server backup and recovery.
There is currently a large selection of such service providers, but one good example is a popular
company called LiveVault. LiveVault provides its customers with continuous online backup,
recovery, and electronic vaulting (offsite storage). Companies that invest in such a service greatly
decrease the risk of failed or neglected backups or data loss due to an onsite disaster. Another
advantage of online backup and recovery is that because stored data does not reside directly on any
of a network's servers, server power is utilized for business applications, and network capacity is
released to the end user.
Scheduling Backups and Good Backup Practices
Creating a Backup Schedule
There are two main categories of data backups: full backup and incremental backup. A full backup is
a backup of every single file and folder within the source directory. The backup is therefore an exact
copy of the source directory. This backup takes up as much disk space as the original (maybe a little
less if compression is utilized).
An incremental backup is a backup of only the changed files - files that have been added or modified
since the last backup. Files that have been deleted since the last backup are also tracked. Incremental
backups are defined by dump levels. As mentioned in an earlier section of this report, the UNIX
dump command takes a dump level argument. The dump level is an integer in the range of 0 to 9. A
level 0 dump backs up the entire file system, while all other levels back up only those files that have
been modified since the last dump of a level less than that level.
The more frequently backups are done, the smaller the amount of data that can potentially be lost.
Although it would seem that the simplest and safest solution to data backups would be to simply do a
full backup every night, some additional factors must first be taken into consideration. Backups take
time and personnel resources, and sometimes involve system disruption. Therefore, a company’s
backup schedule depends upon the need to minimize the number of tapes and the time available for
doing backups. Additionally, the time available to do a full restore of a damaged file system and the
time available for retrieving individual files that are accidentally deleted need to be considered.
If a company does not need to minimize the time and media spent on backups, it would be feasible to
do full backups every day. However, this is not realistic for most sites, so incremental backups are
used most often. An example of a moderate incremental backup schedule would be to back up
enough data to restore any files from any day or week from the last month. This requires at least four
sets of backup media – one set for each week. These volumes could then be reused each month. In
addition, each monthly backup would be archived for at least a year, with yearly backups being
maintained for some number of years. This would enable the restoration of files from some month
prior to the last month, at the expense of needing to restore from a tape which holds an entire
month’s worth of data. Similarly, data from some previous year could also be restored from one of
the yearly tapes. Table 2 shows this incremental backup schedule.
The numbers in Table 2 indicate the dump level used for that particular backup. All files that have
changed since the lower level backup at the end of the previous week are saved each day. For each
weekday level 9 backup, the previous Friday’s backup is the closest backup at a lower level.
Therefore, each weekday tape contains all the files changed since the end of the previous week (or
since the initial level 0 if it is the first week). For each Friday backup, the nearest lower-level
backup is the previous Friday’s backup (or the initial level 0 if it is the first Friday of the month).
Therefore, each Friday's tape contains all the files changed during the week prior to that point. Please
note that the choice of dump levels is arbitrary. For example, dump levels of all 7 or all 8 could have
been used for the weekday backups. The choice of dump level relative to previous or subsequent dump levels is what is important. A detailed explanation of backup scheduling is provided in Unix
Backup and Recovery (13).
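As a concrete sketch of one schedule consistent with the description above (the specific levels are illustrative, chosen only to satisfy the rules just stated; the device and filesystem names are placeholders):

    # Start of the month: full backup
    dump -0u -f /dev/nst0 /home
    # Monday through Thursday of every week: level 9 incrementals
    dump -9u -f /dev/nst0 /home
    # Fridays: a lower level that increases week by week, so that each
    # Friday's nearest lower-level dump is the previous Friday's (or the
    # initial level 0 for the first Friday)
    dump -3u -f /dev/nst0 /home    # Friday, week 1
    dump -4u -f /dev/nst0 /home    # Friday, week 2
    dump -5u -f /dev/nst0 /home    # Friday, week 3
    dump -6u -f /dev/nst0 /home    # Friday, week 4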
Good Backup Practices
It is important to store data backups offsite, away from their source. Larry Ayoub, a senior executive
at Bank of America, said, “I think you have to accept that any data critical to the survival of a firm,
or which the loss of would result in considerable financial or legal exposure, must be sent offsite in
some manner, either physically or electronically” (2). Forty-five percent of companies leave their
server backup tapes onsite, vulnerable to natural calamities and security breaches, according to a
recent survey from Massachusetts-based business continuity company, AmeriVault Corporation (2).
Consider, for example, what would happen if data backups were stored in the back room of an office space. If the whole building were destroyed by a fire, all of the data would be unrecoverable.
Care should be given to the choice of backup location as well. For example, what might have
happened if a company at the WTC had stored backups “offsite” by storing them on another floor in
the WTC building? Even though the collapse of the WTC was an unimaginable event, businesses
must prepare for the possibility of such events.
Finding a good way to store backups is almost as important as setting up a schedule to create them.
Backups should be stored in a place where only authorized people have access to them. A simple
solution is to create copies on disk drives or tapes daily and then move them to an offsite location
that is maintained by a data storage company. However, it can be difficult and expensive to move the
media offsite. The best solution for offsite data storage is to instantaneously transfer data over
network lines to a remote site. High security offsite backup services even mirror their data in offsite
locations.
Some additional considerations:
• Tapes should be labeled in a clear and consistent manner. In order to make restorations as
painless as possible, backups need to be easy to get to and well labeled. Labeling includes
clearly marking the tape itself as well as including a table of contents file so that individual
files on the tape can be found easily. In sites where several people share responsibility for
making backups or a number of different commands are used to create backups, the label
should also include the command used to create the backup. The label is also an ideal place
to keep a running tally of how many times the media has been used and how old it is.
• Backups must be tested regularly. Often, businesses have a good backup regimen with
automated backup software and reliable media, yet they seldom test restores of their data.
Backups can fail, and without testing, the failure would not be detected until after a
crisis occurs.
• Design data for backups – keep filesystems smaller than the capacity of the backup media. This
will greatly simplify backups and thus reduce the risk of error.
Data Recovery
The reason that so much planning and diligence must be devoted to data backups is to facilitate data
recovery. Properly executed data backups will make the actual recovery of lost data the simplest task
of all. After determining which volumes contain the data that needs to be recovered, data is simply
recovered by using the native UNIX restore utility. The restore command is used to copy data from
the volume to a selected directory on a selected filesystem. Note that the Solaris version of restore is
actually ufsrestore. Details on the use of the restore command are provided in the UNIX System
Administration Handbook (10).
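For illustration, an interactive restore from tape might look like the following sketch; the device and directory names are placeholders rather than values from this report.

    # Change to the directory into which the files should be recovered
    cd /var/tmp/restore
    # Interactively browse the dump on tape and extract selected files
    restore -i -f /dev/st0

    # Solaris equivalent
    ufsrestore if /dev/rmt/0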
In some cases, a simple restore will not suffice. The data storage media may have physical damage.
There are only two major companies in the United States that specialize in recovery of data from physical storage media. DriveSavers ([Link]) specializes in salvaging data damaged by fire, floods, or hard-disk crashes. This company also maintains a museum of bizarre disk disasters on its website, which is worth reading. Ontrack Data International ([Link]) also offers a
remote data recovery service for cases where the physical media is not destroyed.
Disaster Recovery Plan
For many companies, the most critical asset is their data. Implementing an effective data backup and
recovery system ensures the protection of this data in most circumstances. However, catastrophic
losses of entire systems (or worse, entire work sites) can and do happen. It is for this reason that
companies must prepare for the worst by developing a disaster recovery plan. Although data backups
and recovery are essential, they should not be thought of as disaster prevention; instead, they should be considered a critical component of the disaster recovery plan. When preparing the plan, some
considerations must be taken into account.
A risk assessment must first be conducted. The risk assessment will help to determine how much
data loss is acceptable. If it is not disastrous for a company to lose one day’s worth of data, then it
is not necessary to take data backups offsite every day. It is not desirable to spend too many
resources getting backups offsite unnecessarily. However, if the daily data is critical, plans must be
made for getting data offsite daily, or in some cases, more often.
Documentation of the disaster recovery plan must be created. In addition to outlining the steps for
recovering from a disaster, the documentation should provide contact information for software and
hardware vendors as well as primary and secondary personnel who are familiar with the disaster
recovery plan. Also, the location of data backups should be identified. Because this document will guide
the reader through the recovery process, it is essential that this document, like the company’s data,
be backed up and stored safely offsite.
The final step in creating a disaster recovery plan is to test the plan. After the plan is in place, it
should be tested with regular audits that are done by third party companies. For example, a
consultant could be hired – someone who is competent and knowledgeable but unfamiliar with the
system - to test the recovery system. This is necessary because those who are most familiar with the
plan may not be available to implement it after a disaster. It is important that other personnel be able
to understand and implement the plan.
The essence of data recovery
Data recovery means retrieving lost, deleted, unusable, or inaccessible data that was lost for various reasons.
Data recovery not only restores lost files but also recovers corrupted data.
Depending on the reason the data was lost, different data recovery methods can be adopted. Data loss has both software and hardware causes, and correspondingly data can be recovered by software or hardware means.
Unlike prevention and backup, data recovery is a remedial measure. The best way to ensure the security of your data is prevention and regular backup; by operating on and using your data according to proper procedures, you can reduce the danger of data loss to a minimum.
The scope of data recovery
Data problems take many forms, so the objects, or scope, of data recovery can be divided according to the different symptoms.
System problem
The main symptom is that you cannot enter the system, the system behaves abnormally, or the computer shuts down. The reasons for this are complex, so different processing methods must be adopted. Possible causes include a lost or corrupted key system file, bad tracks on the hard disk, a damaged hard disk, a lost MBR or DBR, an incorrect CMOS setting, and so on.
Bad track of hard disk
Bad tracks can be logical or physical. Logical bad tracks are mainly caused by incorrect operation and can be repaired by software. Physical bad tracks are caused by physical damage, which is real damage; they can only be worked around by remapping the affected partition or sectors. When physical bad tracks appear, you should back up your data immediately, before it becomes unusable because of the bad tracks.
Partition problem
If a partition cannot be identified and accessed, or a partition is identified as unformatted, partition recovery tools such as Partition Table Doctor can be used to recover the data.
Files loss
If files are lost because of deletion, formatting, or a Ghost clone error, file restoration tools such as Data Recovery Wizard can be used to recover the data.
Password Loss
If a file, system, database, or account password is lost, special decryption tools that correspond to particular data formats, such as Word or WinZip, can be used.
Files repair
For various reasons, some files cannot be accessed or used, their contents are full of garbled characters, or the contents have been changed so that they cannot be read. In such cases, special file repair tools can be tried to restore the files.
The principle of data recovery
Data recovery is a process of finding and recovering data, and it carries some risk, because not all situations can be anticipated or prearranged; unexpected things may happen. You therefore need to reduce the danger in data recovery to a minimum:
1. Back up all the data on your hard disk.
2. Prevent the equipment from being damaged again.
3. Do not write anything to the device from which you want to recover data.
4. Try to get detailed information on how the data was lost and the process of the loss.
5. Back up the recovered data promptly.
Data Protecting Technologies
Data security and the fault tolerance of storage are receiving more and more attention, and people are attaching more and more importance to developing new technologies to protect data.
SMART Technology
SMART (Self-Monitoring, Analysis and Reporting Technology) mainly protects a hard disk from losing data when problems develop on the drive. A SMART drive can reduce the risk of data loss by issuing predictive alarms and reminders, thus enhancing data security.
SPS
SPS (Shake Protecting System) prevents the head from shaking, thus enhancing the shock resistance of the hard disk and avoiding damage caused by vibration.
DFT
DFT, an IBM data protection technology, can check a hard disk by using the DFT program to access the DFT microcode on the disk. With DFT, users can conveniently check the operation of the hard disk.
Floppy disk and array technology
RAID originally stood for ‘Redundant Arrays of Inexpensive Disks’, a project at the computer science department of the University of California at Berkeley, under the direction of Professor Katz, in conjunction with Professor John Ousterhout and Professor David Patterson.
The project is reaching its culmination with the implementation of a prototype disk array file server
with a capacity of 40 GBytes and a sustained bandwidth of 80 MBytes/second. The server is being
interfaced to a 1 Gb/s local area network. A new initiative, which is part of the Sequoia 2000 Project,
seeks to construct a geographically distributed storage system spanning disk arrays and automated
libraries of optical disks and tapes. The project will extend the interleaved storage techniques so
successfully applied to disks to tertiary storage devices. A key element of the research will be to
develop techniques for managing latency in the I/O and network paths.
The original (‘Inexpensive’) term referred to the 3.5 and 5.25 inch disks used for the first RAID
system but no longer applies.
The following standard RAID specifications exist:
RAID 0 Non-redundant striped array
RAID 1 Mirrored arrays
RAID 2 Parallel array with ECC
RAID 3 Parallel array with parity
RAID 4 Striped array with parity
RAID 5 Striped array with rotating parity
The basic idea of RAID (Redundant Array of Independent Disks) is to combine multiple inexpensive
disk drives into an array of disk drives to obtain performance, capacity and reliability that exceeds
that of a single large drive. The array of drives appears to the host computer as a single logical drive.
The Mean Time Between Failure (MTBF) of the array is equal to the MTBF of an individual drive,
divided by the number of drives in the array. Because of this, the MTBF of a non-redundant array
(RAID 0) is too low for mission-critical systems. However, disk arrays can be made fault-tolerant by
redundantly storing information in various ways.
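As a worked illustration of the MTBF relationship (the per-drive figure is assumed for the example, not taken from this report): with drives rated at 500,000 hours each, an eight-drive RAID 0 array has

    MTBF(array) = MTBF(drive) / N = 500,000 hours / 8 = 62,500 hours

which is roughly seven years instead of the single drive's fifty-seven, showing why non-redundant arrays are unsuitable for mission-critical systems.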
SAN
A SAN (Storage Area Network), also called the network behind the servers, is a specialized, high-speed network attaching servers and storage devices. A SAN allows "any to any" connection across the network, using interconnect elements such as routers, gateways, hubs, and switches. It eliminates the traditional
dedicated connection between a server and storage, and the concept that the server effectively "owns and manages" the storage devices. It also eliminates any restriction on the amount of data that a server can access, which is currently limited by the number of storage devices that can be attached to the individual
server. Instead, a SAN introduces the flexibility of networking to enable one server or many
heterogeneous servers to share a common storage "utility", which may comprise many storage
devices, including disk, tape, and optical storage. And, the storage utility may be located far from the
servers which use it.
NAS
NAS (Network Attached Storage) can store rapidly growing volumes of information.
Backup means to prepare a spare copy of a file, file system, or other resource for use in the event of
failure or loss of the original. This essential precaution is neglected by most new computer users
until the first time they experience a disk crash or accidentally delete the only copy of the file they
have been working on for the last six months. Ideally the backup copies should be kept at a different
site or in a fire safe since, though your hardware may be insured against fire, the data on it is almost
certainly neither insured nor easily replaced.
Backup
Timely backup can reduce danger and disaster to a minimum, so that data security is best ensured. Different situations call for different methods: backing up important system data with hardware, and backing up key information by cloning or mirroring data to a different storage device, can both work well.
Main technical specifications and parameters of hard disks
Capacity
Capacity can be viewed in two ways: the total capacity of the drive and the capacity of each individual disk (platter). The total capacity is the sum of the individual disk capacities.
Increasing the capacity of each disk not only improves the total capacity, but also improves the transmission speed and cuts the cost down.
Rotation Speed
Rotation speed is the speed at which the disk platters rotate, measured in RPM (revolutions per minute). Typical rotation speeds of IDE hard disks are 5400 RPM, 7200 RPM, and so on.
Average Seek Time
The average seek time gives a good measure of the speed of the drive in a multi-user environment
where successive read/write requests are largely uncorrelated.
Ten ms is common for a hard disk and 200 ms for an eight-speed CD-ROM.
Average Latency
The hard disk platters are spinning around at high speed, and the spin speed is not synchronized to
the process that moves the read/write heads to the correct cylinder on a random access on the hard
disk. Therefore, at the time that the heads arrive at the correct cylinder, the actual sector that is
needed may be anywhere. After the actuator assembly has completed its seek to the correct track, the
drive must wait for the correct sector to come around to where the read/write heads are located. This
time is called latency. Latency is directly related to the spindle speed of the drive and, as such, is influenced solely by the drive's spindle characteristics.
Conceptually, latency is rather simple to understand; it is also easy to calculate. The faster the disk is
spinning, the quicker the correct sector will rotate under the heads, and the lower the latency will be.
Sometimes the sector will be at just the right spot when the seek is completed, and the latency for
that access will be close to zero. Sometimes the needed sector will have just passed the head and in
this "worst case", a full rotation will be needed before the sector can be read. On average, latency
will be half the time it takes for a full rotation of the disk.
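For example, for an assumed 7200 RPM drive (an illustrative figure, not one quoted in this report):

    full rotation = 60 s / 7200 ≈ 8.33 ms, so average latency ≈ 8.33 / 2 ≈ 4.17 ms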
Average Access Time
Access time is the metric that represents the composite of all the other specifications reflecting
random positioning performance of the hard disk. As such, it is the best figure for assessing overall
positioning performance, and you'd expect it to be the specification most used by hard disk
manufacturers and enthusiasts alike. Depending on your level of cynicism then, you will either be
very surprised or not surprised much at all, to learn that it is rarely even discussed. Ironically, in the
world of CD-ROMs and other optical storage it is the figure that is universally used for comparing
positioning speed. I am really not sure why this discrepancy exists.
Perhaps the problem is that access time is really a derived figure, comprised of the other positioning
performance specifications. The most common definition is:
Access Time = Command Overhead Time + Seek Time + Settle Time + Latency
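Plugging in illustrative values (assumed for the example, not drawn from this report), a drive with 0.5 ms command overhead, a 9 ms average seek, a 0.1 ms settle time, and the 4.17 ms average latency computed above would have

    Access Time ≈ 0.5 + 9 + 0.1 + 4.17 ≈ 13.8 ms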
Data Transfer Rate
The data transfer rate is the speed with which data can be transmitted from one device to another. Data rates are often measured in megabits (million bits) or megabytes (million bytes) per second. These are usually abbreviated as Mbps and MBps, respectively.
Buffer Size (Cache)
A small fast memory holding recently accessed data, designed to speed up subsequent access to the
same data. Most often applied to processor-memory access but also used for a local copy of data
accessible over a network etc.
When data is read from, or written to, main memory a copy is also saved in the cache, along with the
associated main memory address. The cache monitors addresses of subsequent reads to see if the
required data is already in the cache. If it is (a cache hit) then it is returned immediately and the main
memory read is aborted (or not started). If the data is not cached (a cache miss) then it is fetched
from main memory and also saved in the cache.
The cache is built from faster memory chips than main memory so a cache hit takes much less time
to complete than a normal memory access. The cache may be located on the same integrated circuit
as the CPU, in order to further reduce the access time. In this case it is often known as primary cache
since there may be a larger, slower secondary cache outside the CPU chip.
The most important characteristic of a cache is its hit rate - the fraction of all memory accesses
which are satisfied from the cache. This in turn depends on the cache design but mostly on its size
relative to the main memory. The size is limited by the cost of fast memory chips.
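The value of a high hit rate can be seen from a simple effective-access-time estimate (the timings are assumed for illustration): with a 10 ns cache, 100 ns main memory, and a hit rate of 0.95,

    average access time = 0.95 × 10 ns + 0.05 × 100 ns = 14.5 ns

which is far closer to the cache speed than to the main memory speed.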
Conclusion
Protection of a company’s critical data is not only essential to its continued daily operations, it is
also necessary for the survival of the company. Developing a strong data backup and recovery
system is therefore essential. However, a data backup and recovery system is not the only element
needed to protect a company’s data. Disaster could strike a company’s main systems at any time, and
the company needs to be prepared to deal with it. That is why a comprehensive disaster recovery plan should
be developed for any company that has data to protect.
References
1. Surette T., Genn V., “Special Report - Disaster Recovery & Systems Backup.” Australian
Banking & Finance, Feb 28, 2002 v11 i3 p15.
2. Murray, Louise. “Offsite storage: Sending your data away.” SC Online Magazine, September
2003, [Link]
3. Kovar, Joseph F. “Precarious position” CRN. Jericho, Sep 22, 2003, Iss. 1063; pg. 4A. Accessed
online through ABI/INFORM Global Database, [Link]
4. Mearian, Lucas. “Disk arrays gain in use for secondary storage: but tapes continue to handle most
data for backups and archiving, survey finds.” Computerworld, April 28, 2003 v37 i17 p12(1).
Accessed online through ABI/INFORM Global Database, [Link]
5. IBM Corporation, Hard Disk Drives (HDD) Web Page,
[Link] accessed November 3, 2003.
6. Indiana University, Storing Backups and Media Life Expectancy Web Page,
[Link] accessed November 3, 2003.
7. Castlewood Corporation, Home Page, [Link] accessed November 7, 2003.
8. Webopedia Online Dictionary for Computer and Internet Terms, [Link]
copyright 2003.
9. Quantum Corporation, DLTtape Home Page, [Link] copyright 2003.
10. E. Nemeth, G. Snyder, S. Seebass, T.R. Hein. UNIX System Administration Handbook, Prentice
Hall, 2001.
11. University of Maryland, AMANDA Web Page, [Link] updated July 30, 2003.
12. Eric Melski, B.U.R.T. Backup and Recovery Tool Web Page,
[Link] updated October 15, 1998.