
DATA RECOVERY

A SEMINAR REPORT

Submitted by

KHATEEB AHMAD

in partial fulfillment for the award of the degree

of

Bachelor of Computer Application

DEPARTMENT OF COMPUTER APPLICATION


INTEGRAL UNIVERSITY LUCKNOW

MAY 2017
INTEGRAL UNIVERSITY
LUCKNOW

CANDIDATE’S DECLARATION
I hereby certify that the work being presented in the Seminar Report on “Data Recovery”, in partial
fulfilment of the requirements for the award of the Degree of Bachelor of Computer Application
and submitted in the Department of Computer Application, Integral University, Lucknow, is an authentic
record of my own work carried out during the period from January 2017 to May 2017, under the guidance
of Dr. Tasneem Ahmed, Assistant Professor, and Mrs. Nashra Javed, Assistant Professor, Department of
Computer Application, Integral University, Lucknow.
The matter presented in this report has not been submitted by me for the award of any other
degree of this or any other University.

(Khateeb Ahmad)

This is to certify that the above statement made by the candidate is correct to the best of our
knowledge.
Dated:

(Dr. Tasneem Ahmed) (Mrs. Nashra Javed)


Assistant Professor Assistant Professor

This is to certify that the above statement made by the candidate is correct to the best of my
knowledge.

Head of the Department

ACKNOWLEDGEMENT
It gives me great pleasure to present this Seminar Lab Report, prepared strictly under the guidance of Dr.
Tasneem Ahmed, Assistant Professor, and Mrs. Nashra Javed, Assistant Professor, Department of
Computer Application.
They have made sincere efforts to make the report more meaningful, complete, compact and
comprehensive. It is a great pleasure to let you know that I have put my feelings into practice.
Finally, I give my special thanks to my batch mates for all the valuable suggestions, without which this lab
report could not have been completed.

-Khateeb Ahmad

TABLE OF CONTENTS

TITLE PAGE NO.

1. INTRODUCTION 4
2. DATA BACKUP MEDIA AND HARDWARE 6
 HARD DRIVE 7
 MAGNETIC TAPES 9
 OTHER MEDIA 11
3. STACKERS, AUTOLOADERS, AND TAPE LIBRARIES 15
4. TOOLS FOR BACKUPS AND BACKUP SERVICE 16
 FREE TOOLS 17
 COMMERCIAL TOOLS 19
 BACKUP SERVICES 23
5. SCHEDULING BACKUPS AND GOOD BACKUP PRACTICES 24
 CREATING A BACKUP SCHEDULE 24
 GOOD BACKUP PRACTICES 26
6. DATA RECOVERY 28
7. DISASTER RECOVERY PLAN 30
8. THE ESSENCE OF DATA RECOVERY 32
9. THE SCOPE OF DATA RECOVERY 33
 SYSTEM PROBLEM 33
 BAD TRACK OF HARD DISK 34
 PARTITION PROBLEM 34
 FILES LOSS 34
 PASSWORD LOSS 35
 FILES REPAIR 35
10. THE PRINCIPLE OF DATA RECOVERY 36
11. DATA PROTECTING TECHNOLOGIES 37
 SMART TECHNOLOGY 37
 SPS 37
 DFT 37
 FLOPPY DISK AND ARRAY TECHNOLOGY 38
 SAN 40
12. TECHNICAL SPECIFICATION 42
13. CONCLUSION 46
14. REFERENCE 48

Introduction

The collapse of the World Trade Center on September 11, 2001 reinforces the importance of backing up

critical data, protecting the backups, and planning for disastrous data losses. It is estimated that the cost to

replace technology and systems lost in the World Trade Center (WTC) disaster could be $3.2 billion.

However, some companies that occupied the WTC, such as Morgan Stanley, were able to quickly

recover. The financial giant was able to start running again because in addition to having the usual data

backups that most companies keep on site, it also maintained real-time copies of its data in a second

location miles away. All transactions occurring on the company WTC servers and mainframes were

continuously transferred through high-speed telecommunications lines to computers in Teaneck, New

Jersey.

An event as unimaginable as the WTC collapse is not the only way to have a data loss disaster. There are

countless ways to lose critical data. Human error is among the leading causes of data loss. For example,

mistakes as simple as typing “rm *” at the UNIX command prompt can have disastrous results. In

addition, software failure, hardware failure, computer viruses, malicious or disgruntled employees, and

natural disasters such as fire and flood can cause a catastrophic system failure. The Disaster Recovery

Institute International estimates that 90 percent of companies that experience a significant data loss are

out of business within three years.

Making backups of all data seems like the obvious solution. However, many small companies have

inadequate backup practices or technology. The problem could be insufficient storage capacity, an

inability to use backup solutions, a lack of backup testing, no offsite data storage, or inconsistent backup

procedures. Unfortunately, backing up data is simply given a very low priority at a number of small

firms.

According to a survey conducted by storage media vendor Imation, 30% of small businesses lack formal

data backup and storage procedures or do not implement those practices consistently. In fact, thirty-nine

percent of the small firms surveyed admitted that they review their storage procedures only after a

problem occurs. In addition, more than one third of the respondents said they do a fair or poor job of

storing backup data offsite, and over half rate their disaster recovery plan as fair or poor.

It is very difficult, and in fact sometimes impossible, to function normally during a crisis. It is for this

reason that it is important to think about data backups before a disaster strikes. This paper provides some

guidance for developing a data backup plan by summarizing data backup media and hardware

technologies, data backup procedures and services, and data recovery services. It also provides an outline

for disaster recovery planning.

Data Backup Media and Hardware

One of the first decisions to make when preparing a data backup plan is to decide what physical medium

will be used to store the data backups. The speed, capacity, cost, and life expectancy of the medium are

all considerations that should be taken into account when making this decision.

In terms of backup media life expectancy, a number of factors should be considered. For long term

storage situations, the media may become obsolete due to changing technologies before it physically

degrades. The information on this media would therefore become unusable. Similarly, the life expectancy

of the media may be much longer than the amount of time it takes for the information on the media to

degrade. Due to these considerations, care must be taken when choosing media based on its life

expectancy.

Hard Drives

Data storage onto hard disk media is becoming more and more prevalent in corporate data centers,

according to a survey of more than 1,000 information technology managers conducted by Peripheral

Concepts, Inc.

Hard drives (magnetic disks) have a very high data capacity, currently holding over 100GB of data (5). A

typical hard drive consists of platters that are coated with magnetic film. These platters spin while data is

accessed by small heads situated on drive arms. This geometry enables data to be accessed randomly, and

thus very quickly. In addition to a high storage capacity and speedy access, magnetic disks are estimated

to have an expected life span of 5-10 years. Although hard disks used to be the most expensive backup

media, prices have dropped exponentially in the last few years. Removable hard disks are becoming even

more affordable, and have capacities of over 2GB. The Orb Drive, by Castlewood Corporation, is an

example of such a product.

Hard drives can be used for data backups by mirroring. Disk mirroring is a technique in which data is

written to two duplicate disks simultaneously. If one of the disks fails, the system can quickly switch to

the other disk without any loss of data or service. Mirroring is commonly used for systems such as

Internet databases, where it is critical that data be accessible at all times. However, there is a problem

with this technique: if both disks are a part of the same machine and the disk controller (or the whole

machine) fails, neither disk would be accessible. One possible solution to this problem is to implement a

mirroring scheme that involves multiple machines. The backup machine duplicates all of the processes of

the primary machine. This is effective because if the primary machine fails, the backup machine can

immediately take its place without any down time. Although this is a good solution for the problem of a

failed machine, the possibility still exists for loss of both machines, for example due to fire. To prevent

this situation, some companies have network mirroring. These companies mirror their main disks with

disks in remote locations via a network connection. However, this type of mirroring is expensive. Each

machine must be mirrored by an identical machine whose only purpose is to be there in the event of a

failure.
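
To make the idea concrete, the following is a minimal Python sketch of file-level mirroring. The paths and function names are hypothetical assumptions for illustration only; real systems mirror at the block-device or machine level rather than per file.

```python
from pathlib import Path

# Hypothetical mirror locations; in practice these would be separate
# disks, controllers, or machines at a remote site.
PRIMARY = Path("/data/primary")
MIRROR = Path("/data/mirror")

def mirrored_write(relative_path: str, payload: bytes) -> None:
    """Write the same payload to both copies so that losing either
    copy does not lose the data."""
    for root in (PRIMARY, MIRROR):
        target = root / relative_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(payload)

def read_with_failover(relative_path: str) -> bytes:
    """Read from the primary copy, falling back to the mirror if the
    primary is unavailable."""
    try:
        return (PRIMARY / relative_path).read_bytes()
    except OSError:
        return (MIRROR / relative_path).read_bytes()
```

Note that, as discussed next, both copies receive the same deletions and virus damage, which is why mirroring complements rather than replaces backups.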

Of course, mirroring does not provide total protection against data loss. If a computer virus destroys files

or files are accidentally deleted, the mirrored files will also be destroyed or deleted. Having a previously

stored copy of the data is important, therefore traditional data backup media will still be required. The

Peripheral Concepts survey also shows that a large majority of data is still backed up and archived the

traditional way: on tape.

Magnetic tapes

Magnetic tape is the most realistic and traditional medium for creating backups. The tape is actually a

Mylar film strip on which information is magnetically stored. Because magnetic tapes

are sequential storage devices (tape drives cannot randomly access data like other storage devices, such

as disk drives), they are much slower. However, high storage capacity and low cost make magnetic tapes

the storage medium of choice for archiving large amounts of data. Helical scan devices are also magnetic

tapes, but the data heads spin at an angle to the strip of tape, thus creating denser data storage and higher

capacity.

Life expectancy and the number of times a tape can be reused depend not only on the quality of the tape

itself, but also the environment in which it is stored and the quality and maintenance of the tape drive

heads. An estimate of magnetic tape life expectancy is 1 year (6).

QIC (quarter inch cartridge, pronounced “quick”) is a technology standard for magnetic tapes developed

by a consortium of manufacturers called the Quarter-Inch Cartridge Drive Standards, Inc. (8). Travan

tapes, developed by 3M Corporation, are a high density form of QIC standard tapes. Travan tapes were

widely used by companies several years ago, but are now often used for personal computer backup. Also

called floppy tape because they can use a PC’s floppy disk controller instead of requiring their own

special controller, the drives are inexpensive and reliable. The current maximum storage capacity of

Travan tapes is up to 10GB, but they are relatively slow.

DAT (digital audio tape) comes in two standard sizes, 8mm and 4mm. 4mm DATs are helical scan

devices and therefore can support storage capacities up to 20GB. 8mm DATs have storage capacities of

only about 7GB. The 4mm tapes have a great advantage over other tape media; they are physically the

smallest and therefore take up less storage room. A disadvantage of these tapes is that they are very

sensitive to heat damage, thus complicating the selection of a storage location. DAT tapes come in two

formats. One format is for recording video or audio, the other is for binary data. The video/audio tapes

work for making data backups, but they are less reliable than the binary format in terms of retaining data.

The 4mm DAT is currently the most widely used tape type, but it is being replaced by digital linear tapes

(DLT).

DLT tapes have a storage capacity of up to 40GB. The drives are quite fast and are the newest standard

backup media technology. Quantum Corporation ([Link]), the manufacturer of the Super

DLTtape II, claims a shelf life of 30 years on their product. Besides this unbeatable life expectancy, the

DLT tape has another advantage. Like 4mm DATs, DLTs are small. The DLT dimensions are

approximately 1" X 4" X 4" and they weigh only about 8 ounces (9). DLTs are currently a very popular

medium, even though they are still relatively expensive.

Other Media

Optical disks, such as recordable CD-Rs and rewritable CD-RWs, have a much longer lifespan than tapes (except for

DLT’s). The estimated lifespan of optical media is greater than 30 years (6). However, optical disks have

a smaller data capacity than tapes, and they are more expensive per GB. Floppy disks are the least

expensive media, but because they have such a small capacity, a huge number of them are needed to back

up even moderate amounts of data, thus making the price per GB very high. Table 1 summarizes the

media types discussed above as well as several others.

Table 1. Comparison of backup media (10)

Medium          Capacity(a)  Speed(a)       Drive cost  Media cost  Cost/GB  Reuse?  Random access?
Floppy disk     2.8MB        < 100 KB/s     $15         25¢         $91.43   Yes     Yes
SuperDisk       120MB        1.1 MB/s(b)    $200        $8          $68.27   Yes     Yes
Zip 250         250MB        900 KB/s       $200        $15         $61.44   Yes     Yes
CD-R            650MB        2.4 MB/s       $200        75¢         $1.18    No      Yes
CD-RW           650MB        2.4 MB/s       $200        $2          $3.15    Yes     Yes
Jaz             2GB          7.4 MB/s       $350        $100        $50.00   Yes     Yes
Orb             2.2GB        12.2 MB/s(b)   $200        $40         $18.18   Yes     Yes
Exabyte (8mm)   7GB          1 MB/s         $1,200      $8          $1.14    Yes     No
Travan          10GB         1 MB/s         $200        $34         $3.40    Yes     No
DDS-4 (4mm)     20GB         2.5 MB/s       $1,000      $30         $1.50    Yes     No
ADR             25GB         2 MB/s         $700        $40         $1.60    Yes     No
DLT (1/2 in.)   40GB         6 MB/s         $4,000      $60         $1.50    Yes     No
AIT-2 (8mm)     50GB         6 MB/s         $3,500      $95         $1.90    Yes     No
Mammoth-2       60GB         12 MB/s        $3,500      $80         $1.33    Yes     No

a. Uncompressed capacity and speed

b. Maximum burst transfer rate; the manufacturer does not disclose the true average throughput.

Table 1 illustrates the problem with using inexpensive yet low capacity media such as floppy disks.

Compare the cost per GB of floppy disks to DLT. Even though floppies and floppy drives are much

less expensive than DLT cartridges and drives, they have such a small capacity that many more are

needed to store data, thus resulting in a high cost per GB. The table also illustrates the differences in

access speeds of the various storage media. For example, compare the speed of the Orb disk with the

DLT speed. Orb is much faster because data is accessed randomly on disks, while tapes such as DLT

require sequential access.
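
As a quick check of the figures above, the following Python snippet reproduces the floppy and DLT cost-per-GB entries of Table 1 by dividing media cost by uncompressed capacity:

```python
# Cost per GB from Table 1: media cost divided by uncompressed capacity.
def cost_per_gb(media_cost_dollars: float, capacity_gb: float) -> float:
    return media_cost_dollars / capacity_gb

floppy = cost_per_gb(0.25, 2.8 / 1024)     # $0.25 per 2.8MB disk
dlt = cost_per_gb(60.00, 40.0)             # $60 per 40GB cartridge

print(f"Floppy disk: ${floppy:.2f}/GB")    # ~ $91.43/GB
print(f"DLT:         ${dlt:.2f}/GB")       # $1.50/GB
```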

Stackers, Autoloaders, and Tape Libraries

As technology progresses and more work is becoming automated on a global scale, more and more

data is being generated. Unfortunately, a decrease in human resources is happening concurrently.

The result is that fewer people are available to handle data backups. What’s more, due to this

increase in data, stand-alone tape drives often do not have sufficient capacity to back up even mid-sized

networks. A good solution to these problems is to automate backups. In addition to reducing the

need for the manual handling of backup media, automated backups involve multi-volume media

devices, thus greatly increasing storage capacity. Automation also makes backups reliable and

consistent.

Backup automation combines robotics with backup media and software to produce a device that can

load, unload, and swap media without operator intervention. Stackers are tape changers that allow

the operator to load a hopper with tapes. Tapes are inserted and removed in sequential order by the

stacker’s robotic mechanism. For a stacker to backup a filesystem, it would begin with the first tape

and continue automatically inserting and removing tapes until the backup was complete or until it

ran out of available cartridges. Autoloaders have the added functionality of being able to provide

any of their tapes upon request. Libraries are similar to autoloaders, but have the added ability to

support larger-scale backups, user-initiated file recovery, and simultaneous support of multiple users

and hosts. Libraries are larger and more complex than stackers or autoloaders. As a result they are

more expensive and tend to be used by larger scale companies.

Tools for Backups and Backup Services

Dump is the native UNIX utility that archives files to tapes. It is the most common way to create

backups. Dump builds a list of files that have been modified since the last dump, archives these files into

a single file, and stores this file on an external device. Filesystems must be dumped individually.

Dump works only on the local machine, not over a network, so the dump command must be issued on

each machine that is to be backed up.

Note that the Solaris operating system’s version of dump is not quite the same as on other UNIX

systems. In Solaris, the command ufsdump is equivalent to dump. Dump takes as an argument an

integer value that represents a dump level. This is related to scheduling backups and is described in

the next section, Scheduling Backups and Good Backup Practices.
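
As a hedged sketch of driving per-filesystem dumps at a chosen level, the following Python wrapper assumes hypothetical filesystem paths and a hypothetical tape device; dump (or ufsdump on Solaris) must be installed, and the exact option syntax varies slightly between platforms.

```python
import platform
import subprocess

FILESYSTEMS = ["/", "/usr", "/home"]   # hypothetical filesystems to back up
TAPE_DEVICE = "/dev/rmt/0"             # hypothetical tape device
DUMP_LEVEL = 0                         # 0 = full dump, 1-9 = incremental

# Solaris ships the equivalent utility as ufsdump.
dump_cmd = "ufsdump" if platform.system() == "SunOS" else "dump"

# Dump works on one filesystem at a time, so issue one command per filesystem.
for fs in FILESYSTEMS:
    # Traditional options: dump level, 'u' to record the dump date,
    # 'f' to select the output device.
    args = [dump_cmd, f"{DUMP_LEVEL}uf", TAPE_DEVICE, fs]
    print("Running:", " ".join(args))
    subprocess.run(args, check=True)
```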

Free Tools

AMANDA (The Advanced Maryland Automatic Network Disk Archiver) is a public domain utility

developed by the University of Maryland. It was designed to backup many computers in a network

onto a single server’s high capacity tape drive. It also works with multiple stackers, allowing for a

great increase in backup data capacity. AMANDA uses the native UNIX dump utility and does it's

own dump level scheduling, given general information by the user about how much redundancy is

desired in the data backups. AMANDA is one of the most popular free backup systems, and has a

large user community. Based on the membership of AMANDA-related mailing lists, there are

probably well over 1,500 sites using it (11). The UNIX System Administration Handbook (10)

provides an abbreviated yet comprehensive walk-through of the AMANDA utility.

BURT is a backup and recovery tool designed to perform backups to, and recoveries from, tapes.

BURT is based on Tcl/Tk 8.0 scripts, and because of this it is very portable. It can backup multiple

system platforms (12).

The native UNIX dump utility can be automated with a shell script called [Link].

[Link] enables the user to ensure that dump performed properly by checking return codes. It

also provides an intelligent way to choose which filesystems to backup and creates a table of

contents for each backup.

Star is an implementation of the UNIX tar utility. It is the fastest known implementation of tar, with

speeds exceeding 14MB/s (13). This is more than twice as fast as a simple dump. Another nice

feature of Star is that it does not clobber files. More recent copies of files already on disk will not be

overwritten by files from the backup medium during a restore. Star is available via anonymous ftp at

[Link]

Afbackup is a utility that was written and is maintained by Albert Flugel. It is a client/server backup

system that allows many workstations to backup to a central server, either simultaneously or

sequentially. The advantage of afbackup over a simple dump is that backups can be started remotely

from the server or by cron scheduling on each of the clients.

Bacula is a network based client/server backup program. It is a set of computer programs that

manage backup, recovery, and verification of computer data across a network of different types of

computer systems. Bacula is efficient and relatively easy to use, and offers many advanced storage

management features that make it easy to find and recover lost or damaged files.

Commercial Tools

Makers of backup and recovery software booked $2.7 billion in revenues in 2001, and that figure is

expected to grow to $4.7 billion in 2005, according to research firm IDC (14). This reflects the

popularity of commercially available software for data backups. Legato and Veritas are currently

two of the most popular backup software companies. NetWorker from Legato is a commercial

package which allows storage devices to be placed throughout the network as NetWorker nodes.

These nodes can then be managed during backups as if they are locally attached devices. The

following is an excerpt from the Legato NetWorker Administrator’s Guide, published on the website

of Sun Microsystems (15):

With NetWorker, you can:

• Perform automated “lights out” backups during non-peak hours

• Administer, configure, monitor, and control NetWorker functions from any system on a

network

• Centralize and automate data management tasks

• Increase backup performance by simultaneously sending more than one savestream to

the same device

• Optimize performance using parallel savestreams to a single device, or to multiple

devices or storage nodes

NetWorker client/server technology uses the network protocol Remote

Procedure Call (RPC) to back up data. The NetWorker server software consists

of several server-side services and programs that oversee backup and recover

processes. The NetWorker client software consists of client-side services and

user interface programs.

The server-side services and programs perform the following functions:

• Oversee backup and restore processes

• Maintain client configuration files

• Maintain an online client index

• Maintain an online media database

NetBackup, available from Veritas Software Corporation, focuses on allowing users to back up data

to disk and tape and to stage backups to disk for a period of time before moving them to

tape. This allows for faster data restores. NetBackup also features snapshot functionality which

enables non-disruptive upgrades. In addition, NetBackup users can perform full system (bare-metal)

restorations of data to drives that do not contain an operating system. NetBackup also synchronizes

laptop and desktop backup and restore operations with their server backups.

Backup Services

For most businesses, the standard recovery and backup solution hasn't changed for decades: tape.

Every day, businesses back up every server to tape and then physically move those tapes offsite to a

secure location for disaster protection. This increases the risk of human (tape mishandling) error.

Another risk involved with this popular backup method is for the backups themselves to fail. Often it

is not known that backups have failed until the data is needed for a recovery. To overcome these

risks, businesses are beginning to use a new type of service, online server backup and recovery.

There is currently a large selection of such service providers, but one good example is a popular

company called LiveVault. LiveVault provides its customers with continuous online backup,

recovery, and electronic vaulting (offsite storage). Companies that invest in such a service greatly

decrease the risk of failed or neglected backups or data loss due to an onsite disaster. Another

advantage of online backup and recovery is that because stored data does not reside directly on any

of a network's servers, server power is utilized for business applications, and network capacity is

released to the end user.

Scheduling Backups and Good Backup Practices

Creating a Backup Schedule

There are two main categories of data backups: full backup and incremental backup. A full backup is

a backup of every single file and folder within the source directory. The backup is therefore an exact

copy of the source directory. This backup takes up as much disk space as the original (maybe a little

less if compression is utilized).

An incremental backup is a backup of only the changed files - files that have been added or modified

since the last backup. Files that have been deleted since the last backup are also tracked. Incremental

backups are defined by dump levels. As mentioned in an earlier section of this report, the UNIX

dump command takes a dump level argument. The dump level is an integer in the range of 0 to 9. A

level 0 dump backs up the entire file system, while all other levels back up only those files that have

been modified since the last dump of a level less than that level.
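
The rule can be sketched in a few lines of Python; the dump history below is a hypothetical example used only to show which earlier dump a new dump level refers back to:

```python
from datetime import date

# Each entry is (day, dump_level), in chronological order.
# A hypothetical history, not a recommended schedule.
history = [
    (date(2017, 5, 1), 0),   # full dump
    (date(2017, 5, 2), 9),
    (date(2017, 5, 3), 9),
    (date(2017, 5, 5), 5),   # weekly lower-level dump
    (date(2017, 5, 8), 9),
]

def reference_dump(history, new_level):
    """Return the most recent prior dump whose level is lower than
    new_level; files modified since that dump will be included."""
    for day, level in reversed(history):
        if level < new_level:
            return day, level
    return None  # no lower level found: behaves like a full dump

print(reference_dump(history, 9))  # (date(2017, 5, 5), 5): since the last level-5 dump
print(reference_dump(history, 5))  # (date(2017, 5, 1), 0): since the initial full dump
```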

The more frequently backups are done, the smaller the amount of data that can potentially be lost.

Although it would seem that the simplest and safest solution to data backups would be to simply do a

full backup every night, some additional factors must first be taken into consideration. Backups take

time and personnel resources, and sometimes involve system disruption. Therefore, a company’s

backup schedule depends upon the need to minimize the number of tapes and the time available for

doing backups. Additionally, the time available to do a full restore of a damaged file system and the

time available for retrieving individual files that are accidentally deleted need to be considered.

If a company does not need to minimize the time and media spent on backups, it would be feasible to

do full backups every day. However, this is not realistic for most sites, so incremental backups are

used most often. An example of a moderate incremental backup schedule would be to back up

enough data to restore any files from any day or week from the last month. This requires at least four

sets of backup media – one set for each week. These volumes could then be reused each month. In

addition, each monthly backup would be archived for at least a year, with yearly backups being

maintained for some number of years. This would enable the restoration of files from some month

prior to the last month, at the expense of needing to restore from a tape which holds an entire

month’s worth of data. Similarly, data from some previous year could also be restored from one of

the yearly tapes. Table 2 shows this incremental backup schedule.

The numbers in Table 2 indicate the dump level used for that particular backup. All files that have

changed since the lower level backup at the end of the previous week are saved each day. For each

weekday level 9 backup, the previous Friday’s backup is the closest backup at a lower level.

Therefore, each weekday tape contains all the files changed since the end of the previous week (or

since the initial level 0 if it is the first week). For each Friday backup, the nearest lower-level

backup is the previous Friday’s backup (or the initial level 0 if it is the first Friday of the month).

Therefore, each Friday's tape contains all the files changed during the week prior to that point. Please

note that the choice of dump levels is arbitrary. For example, dump levels of all 7 or all 8 could have

been used for the weekday backups. The choice of dump level relative to previous or subsequent

dump levels is what is important. A detailed explanation of backup scheduling is provided in Unix

Backup and Recovery (13).
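
To see how such a schedule is used at restore time, here is a small Python sketch that walks a hypothetical dump history backwards and lists the tapes needed for a full restore; the labels and levels are illustrative assumptions, not the contents of Table 2.

```python
def tapes_for_restore(history):
    """Given (label, level) pairs in chronological order, return the
    tapes needed for a full restore: the last level-0 dump, then each
    later dump whose level is lower than the previously chosen tape."""
    needed = []
    for label, level in reversed(history):
        if not needed or level < needed[-1][1]:
            needed.append((label, level))
        if level == 0:
            break
    return list(reversed(needed))

history = [
    ("monthly-full", 0),
    ("week1-fri", 5),
    ("week2-mon", 9), ("week2-tue", 9),
    ("week2-fri", 5),
    ("week3-wed", 9),
]
print(tapes_for_restore(history))
# [('monthly-full', 0), ('week2-fri', 5), ('week3-wed', 9)]
```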

Good Backup Practices

It is important to store data backups offsite, away from their source. Larry Ayoub, a senior executive

at Bank of America, said, “I think you have to accept that any data critical to the survival of a firm,

or which the loss of would result in considerable financial or legal exposure, must be sent offsite in

some manner, either physically or electronically” (2). Forty-five percent of companies leave their

server backup tapes onsite, vulnerable to natural calamities and security breaches, according to a

recent survey from Massachusetts-based business continuity company, AmeriVault Corporation (2).

Consider, for example, what would happen if data backups are stored in the back room of an office

space. If the whole building were destroyed by a fire, all data would be unrecoverable.

Care should be given to the choice of backup location as well. For example, what might have

happened if a company at the WTC had stored backups “offsite” by storing them on another floor in

the WTC building? Even though the collapse of the WTC was an unimaginable event, businesses

must prepare for the possibility of such events.

Finding a good way to store backups is almost as important as setting up a schedule to create them.

Backups should be stored in a place where only authorized people have access to them. A simple

solution is to create copies on disk drives or tapes daily and then move them to an offsite location

that is maintained by a data storage company. However, it can be difficult and expensive to move the

media offsite. The best solution for offsite data storage is to instantaneously transfer data over

network lines to a remote site. High security offsite backup services even mirror their data in offsite

locations.

Some additional considerations:

• Tapes should be labeled in a clear and consistent manner. In order to make restorations as

painless as possible, backups need to be easy to get to and well labeled. Labeling includes

clearly marking the tape itself as well as including a table of contents file so that individual

files on the tape can be found easily. In sites where several people share responsibility for

making backups or a number of different commands are used to create backups, the label

should also include the command used to create the backup. The label is also an ideal place

to keep a running tally of how many times the media has been used and how old it is.

• Backups must be tested regularly. Often, businesses have a good backup regimen with

automated backup software and reliable media, yet they seldom test restores of their data.

Backups can fail, and without testing the backups, failure would not be detected until after a

crisis occurs.

• Design data for backups – keep each filesystem smaller than the capacity of the backup media. This

will greatly simplify backups and thus reduce the risk of error.

Data Recovery

The reason that so much planning and diligence must be devoted to data backups is to facilitate data

recovery. Properly executed data backups will make the actual recovery of lost data the simplest task

of all. After determining which volumes contain the data that needs to be recovered, data is simply

recovered by using the native UNIX restore utility. The restore command is used to copy data from

the volume to a selected directory on a selected filesystem. Note that the Solaris version of restore is

actually ufsrestore. Details on the use of the restore command are provided in the UNIX System

Administration Handbook (10).

In some cases, a simple restore will not suffice. The data storage media may have physical damage.

There are only two major companies in the United States that specialize in recovering data from

physical storage media. DriveSavers ([Link]) specializes in salvaging data damaged

by fire, floods or hard-disk crashes. This company also maintains a museum of bizarre disk disasters

on their website which is worth reading. Ontrack Data International ([Link]) also offers a

remote data recovery service for cases where the physical media is not destroyed.

Disaster Recovery Plan

For many companies, the most critical asset is their data. Implementing an effective data backup and

recovery system ensures the protection of this data in most circumstances. However, catastrophic

losses of entire systems (or worse, entire work sites) can and do happen. It is for this reason that

companies must prepare for the worst by developing a disaster recovery plan. Although data backups
and recovery are essential, they should not be thought of as disaster prevention; rather, they should

be considered a critical component of the disaster recovery plan. When preparing the plan, some

considerations must be taken into account.

A risk assessment must first be conducted. The risk assessment will help to determine how much

data loss is acceptable. If it is not disastrous for a company to lose one day’s worth of data, then it

is not necessary to take data backups offsite every day. It is not desirable to spend too many

resources getting backups offsite unnecessarily. However, if the daily data is critical, plans must be

made for getting data offsite daily, or in some cases, more often.

Documentation of the disaster recovery plan must be created. In addition to outlining the steps for

recovering from a disaster, the documentation should provide contact information for software and

hardware vendors as well as primary and secondary personnel who are familiar with the disaster

recovery plan. Also, the location of data backups should be identified. Because this document will guide

the reader through the recovery process, it is essential that this document, like the company’s data,

be backed up and stored safely offsite.

The final step to creating a disaster recovery plan is to test the plan. After the plan is in place, it

should be tested with regular audits that are done by third party companies. For example, a

consultant could be hired (someone who is competent and knowledgeable but unfamiliar with the

system) to test the recovery system. This is necessary because those who are most familiar with the

plan may not be available to implement it after a disaster. It is important that other personnel be able

to understand and implement the plan.

The essence of data recovery

Data recovery means retrieving lost, deleted, unusable or inaccessible data that was lost for various

reasons.

Data recovery not only restores lost files but also recovers corrupted data.

Depending on the reason for the loss, different data recovery methods can be adopted. Data loss can have

software or hardware causes, and data can correspondingly be recovered by software or

hardware means.

Unlike prevention and backup, data recovery is a remedial measure. The best way to

ensure the security of your data is prevention and regular backup. By operating on and using your data

according to standard procedures, you can reduce the danger of data loss to a minimum.

The scope of data recovery

Data problems take many forms and show many symptoms, so the objects or scope of

data recovery can be divided according to those different symptoms.

System problem

The main symptom is that you cannot enter the system, the system behaves abnormally, or the computer shuts

down. There are complex reasons for this, so different processing methods must be adopted. Reasons

for this symptom may be that a key system file is lost or corrupted, there are bad tracks on the hard

disk, the hard disk is damaged, the MBR or DBR is lost, the CMOS settings are incorrect, and so on.

Bad track of hard disk

There are logical and physical bad tracks. Logical bad tracks are mainly caused by incorrect operation, and

they can be repaired by software. Physical bad tracks are caused by real physical damage, and can only be

worked around by remapping the affected partition or sectors. When physical bad tracks appear,

you had better back up your data, lest the data become unusable because of the bad

tracks.

Partition problem

If a partition cannot be identified or accessed, or is identified as unformatted, partition

recovery tools such as Partition Table Doctor can be used to recover the data.

Files loss

If files are lost because of deletion, formatting or a Ghost clone error, file restoration tools such as Data

Recovery Wizard can be used to recover data.

Password Loss

If a file, system, database or account password is lost, special decryption tools that correspond

to particular data formats, such as Word or WinZip, can be used.

Files repair

For various reasons, some files cannot be accessed or used, or their contents are full of garbled

characters or have been changed so that they cannot be read. In this situation, special file

repair tools can be tried to restore the files.

The principle of data recovery

Data recovery is a process of finding and recovering data, and it involves some risk, since not

all situations can be anticipated or prepared for; unexpected things may

happen. So you need to reduce the danger involved in data recovery to a minimum:

1. Back up all the data on your hard disk.

2. Prevent the equipment from being damaged again.

3. Do not write anything to the device from which you want to recover data.

4. Try to get detailed information on how the data was lost and what happened afterwards.

5. Back up the recovered data promptly.

Data Protecting Technologies

Data security and fault tolerance of storage are receiving more and more attention. People are attaching

more and more importance to developing new technologies to protect data.

SMART Technology

SMART, also called Self-Monitoring, Analysis and Reporting Technology, mainly protects the hard disk from

losing data when problems develop on the drive. A SMART drive can reduce the risk of data loss:

it raises predictive alarms and reminders, thus enhancing data security.

SPS

Shake Protecting System (SPS) can prevent the head from shaking, thus enhancing the anti-shock

characteristics of the hard disk and avoiding damage caused by vibration.

DFT

DFT (Drive Fitness Test), an IBM data protection technology, can check the hard disk by using the DFT

program to access the DFT microcode in the drive. With DFT, users can conveniently check the operation of the hard disk.
Disk array technology

RAID originally stood for ‘Redundant Arrays of Inexpensive Disks’, a project at the computer science department

of the University of California at Berkeley, under the direction of Professor Katz, in conjunction

with Professor John Ousterhout and Professor David Patterson.

The project is reaching its culmination with the implementation of a prototype disk array file server

with a capacity of 40 GBytes and a sustained bandwidth of 80 MBytes/second. The server is being

interfaced to a 1 Gb/s local area network. A new initiative, which is part of the Sequoia 2000 Project,

seeks to construct a geographically distributed storage system spanning disk arrays and automated

libraries of optical disks and tapes. The project will extend the interleaved storage techniques so

successfully applied to disks to tertiary storage devices. A key element of the research will be to

develop techniques for managing latency in the I/O and network paths.

The original (‘Inexpensive’) term referred to the 3.5 and 5.25 inch disks used for the first RAID

system but no longer applies.

The following standard RAID specifications exist:

RAID 0 Non-redundant striped array

RAID 1 Mirrored arrays

RAID 2 Parallel array with ECC

RAID 3 Parallel array with parity

RAID 4 Striped array with parity

RAID 5 Striped array with rotating parity

The basic idea of RAID (Redundant Array of Independent Disks) is to combine multiple inexpensive

disk drives into an array of disk drives to obtain performance, capacity and reliability that exceeds

that of a single large drive. The array of drives appears to the host computer as a single logical drive.

The Mean Time Between Failure (MTBF) of the array is equal to the MTBF of an individual drive,

divided by the number of drives in the array. Because of this, the MTBF of a non-redundant array

(RAID 0) is too low for mission-critical systems. However, disk arrays can be made fault-tolerant by

redundantly storing information in various ways.
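
A short Python calculation illustrates the MTBF relationship stated above; the 500,000-hour drive MTBF is an assumed example figure, not a vendor specification.

```python
# MTBF of a non-redundant (RAID 0) array: the array fails when any
# single drive fails, so its MTBF is the drive MTBF divided by the
# number of drives. The 500,000-hour figure is an assumed example.
drive_mtbf_hours = 500_000
for n_drives in (1, 4, 8):
    array_mtbf = drive_mtbf_hours / n_drives
    print(f"{n_drives} drive(s): array MTBF ~ {array_mtbf:,.0f} hours "
          f"(~{array_mtbf / 8760:.1f} years)")
```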

SAN

SAN, short for Storage Area Network (also called the network behind the servers), is a specialized, high-speed network

attaching servers and storage devices. A SAN allows “any-to-any” connections across the network,

using interconnect elements such as routers, gateways, hubs and switches. It eliminates the traditional

dedicated connection between a server and storage, and the concept that the server effectively “owns and

manages” the storage devices. It also eliminates any restriction on the amount of data that a server can

access, which is currently limited by the number of storage devices that can be attached to the individual

server. Instead, a SAN introduces the flexibility of networking to enable one server or many

heterogeneous servers to share a common storage "utility", which may comprise many storage

devices, including disk, tape, and optical storage. And, the storage utility may be located far from the

servers which use it.

NAS

NAS is Network Attached Storage. It can store rapidly growing amounts of information.

Backup means to prepare a spare copy of a file, file system, or other resource for use in the event of

failure or loss of the original. This essential precaution is neglected by most new computer users

until the first time they experience a disk crash or accidentally delete the only copy of the file they

have been working on for the last six months. Ideally the backup copies should be kept at a different

site or in a fire safe since, though your hardware may be insured against fire, the data on it is almost

certainly neither insured nor easily replaced.

Data backup

Timely backups can reduce danger and disaster to a minimum, so that data security is best

ensured. Different situations call for different methods. Both backing up important system data

with hardware and backing up key information by cloning or mirroring data to a different storage device

can work well.

Main technical specifications and parameters of a hard disk

Capacity

Capacity can be seen in two aspects: the total capacity and the capacity of each individual platter. The total

capacity is the sum of the individual platter capacities.

Increasing the per-platter capacity not only improves the total disk capacity and the speed of

transmission, but also cuts the cost down.

Rotation Speed

Rotation speed is the speed at which the disk platters rotate. It is measured in RPM (revolutions per minute). Typical

rotation speeds of IDE hard disks are 5400 RPM, 7200 RPM, etc.

Average Seek Time

The average seek time gives a good measure of the speed of the drive in a multi-user environment

where successive read/write requests are largely uncorrelated.

Ten ms is common for a hard disk and 200 ms for an eight-speed CD-ROM.

Average Latency

The hard disk platters are spinning around at high speed, and the spin speed is not synchronized to

the process that moves the read/write heads to the correct cylinder on a random access on the hard

disk. Therefore, at the time that the heads arrive at the correct cylinder, the actual sector that is

needed may be anywhere. After the actuator assembly has completed its seek to the correct track, the
drive must wait for the correct sector to come around to where the read/write heads are located. This

time is called latency. Latency is directly related to the spindle speed of the drive and as such is

influenced solely by the drive's spindle characteristics. The earlier discussion of rotation speed

also contains information relevant to latency.

Conceptually, latency is rather simple to understand; it is also easy to calculate. The faster the disk is

spinning, the quicker the correct sector will rotate under the heads, and the lower latency will be.

Sometimes the sector will be at just the right spot when the seek is completed, and the latency for

that access will be close to zero. Sometimes the needed sector will have just passed the head and in

this "worst case", a full rotation will be needed before the sector can be read. On average, latency

will be half the time it takes for a full rotation of the disk.

Average Access Time

Access time is the metric that represents the composite of all the other specifications reflecting

random performance positioning in the hard disk. As such, it is the best figure for assessing overall

positioning performance, and you'd expect it to be the specification most used by hard disk
manufacturers and enthusiasts alike. Depending on your level of cynicism then, you will either be

very surprised or not surprised much at all, to learn that it is rarely even discussed. Ironically, in the

world of CD-ROMs and other optical storage it is the figure that is universally used for comparing

positioning speed. I am really not sure why this discrepancy exists.

Perhaps the problem is that access time is really a derived figure, comprised of the other positioning

performance specifications. The most common definition is:

Access Time = Command Overhead Time + Seek Time + Settle Time + Latency
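
A rough numeric sketch in Python ties the two metrics together; the seek, overhead and settle figures below are assumed example values, not measurements of any particular drive.

```python
def average_latency_ms(rpm: float) -> float:
    """Average rotational latency: half of one full rotation."""
    full_rotation_ms = 60_000.0 / rpm
    return full_rotation_ms / 2

# Assumed example figures for a 7200 RPM drive (not vendor data).
seek_ms = 9.0            # average seek time
command_overhead_ms = 0.5
settle_ms = 0.1
latency_ms = average_latency_ms(7200)   # ~4.17 ms

access_time_ms = command_overhead_ms + seek_ms + settle_ms + latency_ms
print(f"Average latency:     {latency_ms:.2f} ms")
print(f"Average access time: {access_time_ms:.2f} ms")
```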

Data Transfer Rate

The data transfer rate is the speed with which data can be transmitted from one device to another. Data rates are often

measured in megabits (million bits) or megabytes (million bytes) per second. These are usually

abbreviated as Mbps and MBps, respectively.

Buffer Size (Cache)

A small fast memory holding recently accessed data, designed to speed up subsequent access to the

same data. Most often applied to processor-memory access but also used for a local copy of data

accessible over a network etc.

When data is read from, or written to, main memory a copy is also saved in the cache, along with the

associated main memory address. The cache monitors addresses of subsequent reads to see if the

required data is already in the cache. If it is (a cache hit) then it is returned immediately and the main

memory read is aborted (or not started). If the data is not cached (a cache miss) then it is fetched

from main memory and also saved in the cache.

The cache is built from faster memory chips than main memory so a cache hit takes much less time

to complete than a normal memory access. The cache may be located on the same integrated circuit

as the CPU, in order to further reduce the access time. In this case it is often known as primary cache

since there may be a larger, slower secondary cache outside the CPU chip.

The most important characteristic of a cache is its hit rate - the fraction of all memory accesses

which are satisfied from the cache. This in turn depends on the cache design but mostly on its size

relative to the main memory. The size is limited by the cost of fast memory chips.
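
The effect of the hit rate can be illustrated with a simplified Python model in which a hit costs one cache access and a miss costs one main-memory access; the timing values are assumed examples.

```python
def effective_access_ns(hit_rate: float, cache_ns: float, memory_ns: float) -> float:
    """Average access time seen by the processor: hits are served from
    the cache, misses fall through to the slower main memory."""
    return hit_rate * cache_ns + (1.0 - hit_rate) * memory_ns

# Assumed example timings: 2 ns cache access, 60 ns main memory access.
for hit_rate in (0.80, 0.95, 0.99):
    print(f"hit rate {hit_rate:.0%}: "
          f"{effective_access_ns(hit_rate, 2, 60):.1f} ns average")
```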

Conclusion

Protection of a company’s critical data is not only essential to its continued daily operations, but is

also necessary for the survival of the company. Developing a strong data backup and recovery

system is therefore essential. However, a data backup and recovery system is not the only element

needed to protect a company’s data. Disaster could strike a company’s main systems at any time, and

the company needs to be prepared to deal with it. That is why a comprehensive disaster recovery plan should

be developed for any company that has data to protect.

References

1. Surette T., Genn V., “Special Report - Disaster Recovery & Systems Backup.” Australian

Banking & Finance, Feb 28, 2002 v11 i3 p15.

2. Murray, Louise. “Offsite storage: Sending your data away.” SC Online Magazine, September

2003, [Link]

3. Kovar, Joseph F. “Precarious position” CRN. Jericho, Sep 22, 2003, Iss. 1063; pg. 4A. Accessed

online through ABI/INFORM Global Database, [Link]

4. Mearian, Lucas. “Disk arrays gain in use for secondary storage: but tapes continue to handle most

data for backups and archiving, survey finds.” Computerworld, April 28, 2003 v37 i17 p12(1).

Accessed online through ABI/INFORM Global Database, [Link]

5. IBM Corporation, Hard Disk Drives (HDD) Web Page,

[Link] accessed November 3, 2003.

6. Indiana University, Storing Backups and Media Life Expectancy Web Page,

[Link] accessed November 3, 2003.

7. Castlewood Corporation, Home Page, [Link] accessed November 7, 2003.

8. Webopedia Online Dictionary for Computer and Internet Terms, [Link]

copyright 2003.

9. Quantum Corporation, DLTtape Home Page, [Link] copyright 2003.

10. E. Nemeth, G. Snyder, S. Seebass, T.R. Hein. UNIX System Administration Handbook, Prentice

Hall, 2001.

11. University of Maryland, AMANDA Web Page, [Link] updated July 30, 2003.

12. Eric Melski, B.U.R.T. Backup and Recovery Tool Web Page,

[Link] updated October 15, 1998.

