What is RAID?
RAID is an acronym for Redundant Array of Independent Disks.
Combining multiple hard drives into an array yields performance that
exceeds that of a single hard drive with the same total capacity.
This drive array appears to the host computer as a single logical storage unit
or drive. RAID was first proposed in 1987 at the University of California, Berkeley,
in a paper entitled "A Case for Redundant Arrays of Inexpensive Disks (RAID)",
which described various types of disk arrays. Later, the meaning of the letter
I in RAID changed from inexpensive to independent.
Manufacturers of electronic devices specify Mean Time Between Failures (MTBF)
for their products. With an array of hard drives, the MTBF of the array becomes
the MTBF of an individual drive divided by the number of drives in the array.
For an array of five drives that each have an MTBF of 40,000 hours,
the array MTBF would be 8,000 hours. So, if an individual drive is expected to work
continuously for five years before a failure occurs, an array of five drives
is only expected to work for one year.
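
The arithmetic above can be checked with a short sketch. The function name
array_mtbf and the 365-day year are assumptions made for illustration; the
rule itself (drive MTBF divided by drive count) is the one stated in the text
and applies to an array without redundancy.

```python
HOURS_PER_YEAR = 24 * 365  # assumption: a 365-day year of continuous operation


def array_mtbf(drive_mtbf_hours: float, drive_count: int) -> float:
    """Approximate MTBF of a non-redundant array: drive MTBF / number of drives."""
    if drive_count < 1:
        raise ValueError("drive_count must be at least 1")
    return drive_mtbf_hours / drive_count


mtbf = array_mtbf(40_000, 5)
print(f"{mtbf:.0f} hours, about {mtbf / HOURS_PER_YEAR:.1f} years")
# prints "8000 hours, about 0.9 years"
```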
The Berkeley paper defined five types of array architectures - RAID-1 through
RAID-5 - to address the issue of shorter MTBFs. Disk arrays are configured for
fault-tolerance by redundantly storing information in various ways. One array
architecture later added to the RAID standard is a non-redundant array of disk
drives called RAID-0, which uses only Data Striping.
Data Striping
The core technology in RAID is Data Striping, which concatenates multiple
drives into a single storage unit by partitioning each drive into stripes. These
stripes are interleaved across the drives and may be as small as one sector
(512 bytes) or as large as several megabytes. The optimum size for the stripes
depends on the applications that are stored on the RAID.
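
As a rough illustration of interleaving, the sketch below maps a logical block
number to a drive and to a block offset on that drive. The function name
locate_block and the simple round-robin layout are assumptions; an actual
controller may order its stripes differently.

```python
def locate_block(logical_block: int, drive_count: int, blocks_per_stripe: int):
    """Map a logical block to (drive index, block offset on that drive).

    Assumes stripes are dealt out round-robin: stripe 0 on drive 0,
    stripe 1 on drive 1, and so on, wrapping back to drive 0.
    """
    stripe_index, offset_in_stripe = divmod(logical_block, blocks_per_stripe)
    row, drive = divmod(stripe_index, drive_count)
    return drive, row * blocks_per_stripe + offset_in_stripe


# Four drives, stripes of eight 512-byte sectors (4 KB per stripe).
print(locate_block(0, 4, 8))   # (0, 0)
print(locate_block(8, 4, 8))   # (1, 0)
print(locate_block(35, 4, 8))  # (0, 11)
```

Consecutive stripes land on consecutive drives, which is what lets a long
transfer keep every drive busy at once.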
Multi-user operating systems - such as Windows NT 4, Windows 2000, Unix and
NetWare - support concurrent disk operations across multiple drives. Balancing
the I/O load across the drives of the RAID maximizes throughput. JBOD, an acronym
for Just a Bunch of Disks, is a multiple drive system without striping and
without load balancing. With JBOD, some drives hold data files that are
frequently accessed while other drives are rarely accessed.
For I/O intensive or multi-user database accesses, performance is best when the
stripes are larger than a single record, so that a record falls entirely within
one or two stripes. This ensures that the data is contained in a single stripe
on each of one or two drives. All of the drives remain available for other
record operations, apart from the one or two stripes that are locked by the
record. This maximizes the number of simultaneous data operations that can be
performed by the array.
For data intensive environments or single-user systems that access large records,
small (512-byte) stripes cause each record to span across all the drives in
the array. With each drive storing a portion of the data from the record, accesses
perform faster, because the data transfer interleaves onto the multiple drives.
However, small stripes rule out multiple overlapped data operations, because
each access will typically involve all drives. Applications that utilize long
record accesses, such as on-demand video, document management, or data acquisition,
work best with small stripe arrays.
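
The trade-off between large and small stripes can be made concrete by counting
how many drives a single record access involves. This is a minimal sketch; the
function name drives_touched and the sample sizes are assumptions chosen for
illustration.

```python
def drives_touched(offset_bytes: int, length_bytes: int,
                   stripe_bytes: int, drive_count: int) -> int:
    """Number of drives involved in reading one record from a striped array."""
    first_stripe = offset_bytes // stripe_bytes
    last_stripe = (offset_bytes + length_bytes - 1) // stripe_bytes
    stripes_spanned = last_stripe - first_stripe + 1
    # Consecutive stripes sit on consecutive drives, so the count is
    # capped by the number of drives in the array.
    return min(stripes_spanned, drive_count)


# A 4 KB record on a five-drive array.
print(drives_touched(0, 4096, 512, 5))            # 5: small stripes use every drive
print(drives_touched(0, 4096, 64 * 1024, 5))      # 1: record fits in one stripe
print(drives_touched(62 * 1024, 4096, 64 * 1024, 5))  # 2: record straddles a boundary
```

With 64 KB stripes the other three or four drives stay free for concurrent
requests; with 512-byte stripes every access occupies the whole array.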
Small stripes require synchronized spindle drives to prevent degraded performance
when accessing short records. Without synchronized spindles, the drives in the
array may each be at a different rotational position when an access begins.
Completing a disk access requires waiting until every drive has accessed its
part of the record, which can take another rotation of the disk platter on one
or more drives. The more drives in the array, the worse the average access time
for the array. Synchronized spindles ensure that every drive in the array reaches
its data during the same rotation of its platters. The access time
of the array becomes equal to the average access time of a single drive rather
than approaching the product of access time and the number of drives in the
array.
RAID Levels
RAID-0
The name RAID Level 0 is a misnomer: the storage is an array, but it is not
redundant. In RAID-0, data is interleaved across drives for higher data throughput.
Because it stores no redundant information, its performance is very good. However,
the failure of any drive in the array results in data loss. RAID-0 is also known
as striping. It is the fastest and most efficient array type but offers
no fault-tolerance.
RAID-1
RAID Level 1 provides redundancy by writing all data to two or more drives.
RAID-1 tends to perform faster on reads and slower on writes compared to a single
drive. No data is lost as long as one drive of the mirror survives. This is a good entry-level
redundant system, since only two drives are required. Because one drive is used
to store a duplicate of the data, the cost per megabyte is high. RAID-1 is also
known as mirroring. RAID-1 is a good choice for performance-critical, fault-tolerant
environments. In addition, RAID-1 is the only choice for fault-tolerance if
no more than two drives are desired.
RAID-2
RAID Level 2 is intended for use with drives that do not have built-in error
correction; it implements error correction on the RAID controller instead. Since
all SCSI drives support built-in error detection and correction, RAID-2 adds
unnecessary overhead when using SCSI drives. This is why we do not attempt to
support RAID-2 in any of our storage products.
RAID-3
RAID Level 3 stripes data at the byte level across several drives, with parity stored
on one drive. The parity information allows recovery from the failure of any
single drive. Like RAID-0, RAID-3 offers very good read performance.
Performance for small random writes suffers a little because parity data must
be updated each time. Large writes or sequential writes are fairly fast. Because
only one drive in the array stores redundant data, the cost per megabyte of
RAID-3 can be fairly low. Byte-level striping requires hardware support for
efficient use. RAID-3 is a good choice for data intensive or single-user environments
that access long sequential records to speed up data transfer. However, RAID-3
does not allow multiple I/O operations to be overlapped and requires synchronized-spindle
drives in order to avoid performance degradation with short records.
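
How parity allows recovery from a single drive failure can be shown with a
small sketch. It assumes the parity is a byte-wise XOR of the data drives,
which is the usual choice for single-parity arrays; the helper name xor_blocks
and the sample contents are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data drives and one parity drive (contents are illustrative).
data = [b"RAID", b"disk", b"data"]
parity = xor_blocks(data)

# Simulate losing the second data drive, then rebuild it from the survivors.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'disk'
```

XOR-ing the surviving drives with the parity cancels out everything except the
missing drive's contents, so any one drive can be reconstructed this way.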
RAID-4
RAID Level 4 stripes data at the block level across several drives, with parity stored
on one drive. The parity information allows recovery from the failure of any
single drive. Like RAID-0, RAID-4 offers very good read performance.
Performance for small random writes suffers a little because parity data must
be updated each time. Large writes or sequential writes are fairly fast. Because
only one drive in the array stores redundant data, the cost per megabyte of
RAID-4 can be fairly low. RAID-4 offers no advantages over RAID-5 and does not
support multiple simultaneous write operations.
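
The cost of a small write comes from the parity update. The sketch below
assumes XOR parity and shows the common read-modify-write shortcut: the new
parity is computed from the old parity, the old data and the new data, without
reading the other drives. The function names and sample blocks are
hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equally sized blocks."""
    return bytes(x ^ y for x, y in zip(a, b))


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """Parity after overwriting one data block: old_parity ^ old_data ^ new_data."""
    return xor_bytes(xor_bytes(old_parity, old_data), new_data)


# One stripe row across three data drives (contents are illustrative).
d0, d1, d2 = b"aaaa", b"bbbb", b"cccc"
parity = xor_bytes(xor_bytes(d0, d1), d2)

# Overwrite d1 and update parity without touching d0 or d2.
new_d1 = b"BBBB"
parity_after = updated_parity(parity, d1, new_d1)

# The shortcut matches a full recomputation over the whole row.
assert parity_after == xor_bytes(xor_bytes(d0, new_d1), d2)
```

Even with this shortcut, one logical write turns into two reads and two writes,
and every such write must visit the drive holding the parity.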
RAID-5
RAID Level 5 is similar to RAID-4, but distributes parity across all the drives. Small
writes in multiprocessing systems are faster because the parity disk is not
a bottleneck. On reads, parity data must be skipped on each drive, so the read
performance tends to be lower than a level 4 array. The cost per megabyte is
the same as for RAID-4.
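
One way to picture distributed parity is to rotate the parity position by one
drive on each stripe row. The rotation used below is only an assumed layout for
illustration; controllers differ in the exact order they use.

```python
def parity_drive(stripe_row: int, drive_count: int) -> int:
    """Drive holding parity for a stripe row (assumed rotation, last drive first)."""
    return (drive_count - 1 - stripe_row) % drive_count


def layout(rows: int, drive_count: int):
    """Table of 'P' (parity) and 'D' (data) per drive for the first rows."""
    table = []
    for row in range(rows):
        p = parity_drive(row, drive_count)
        table.append(["P" if drive == p else "D" for drive in range(drive_count)])
    return table


for row in layout(4, 4):
    print(" ".join(row))
# D D D P
# D D P D
# D P D D
# P D D D
```

Because successive rows keep their parity on different drives, small writes to
different rows need not queue behind a single parity drive.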
RAID Implementations
Hardware RAID
A hardware-based system manages the RAID subsystem independently of the host
computer, and each RAID array appears logically, and transparently, as a single
disk. Hardware RAID is true hardware multi-tasking because the host computer
can execute user applications while the array adapter concurrently executes
the array functions. Hardware RAID uses no system resources from the host computer
and functions independently of the operating system. Hardware RAID is also highly
fault tolerant. Our RAID controllers are hardware RAID solutions.
Software RAID
Because the software-based array contends with other applications for host
computer resources - such as memory and CPU cycles - and because it is operating
system dependent, Software RAID degrades overall server performance. So, the
performance of a software-based array is directly dependent on the host computer's
CPU performance and load. Software RAID will fail to boot if the boot drive
in the array fails, making it a poor choice for fault tolerance. The RAID support
available within Windows NT and Windows 2000 is an example of Software RAID.