You Don't Know Jack about Disks
Reliability and Performance
More abstract interfaces that isolate the host from drive error recovery and flaw
management contribute to a more reliable storage system. One way drives take advantage
of the intelligent interface is with sophisticated error recovery. Because error
correction codes and recovery techniques are now handled inside the drive, the
drive has abilities that no host has. For example, if a particular sector is difficult
to read, a drive can make a complex series of adjustments as it repeatedly attempts
to recover the information on the disk. It can move the head slightly to the left
and then to the right, or it can adjust the timing, essentially starting the read
a fraction of a millisecond sooner or later. A drive might do 20 to 30 retries
in various combinations of offset and timing to give itself the best opportunity
to pick up the data. This procedure does not happen often, but it can make the
difference between data that can be read and data that cannot. Once it recovers
the information, it can then, optionally, mark that section of the drive as bad
and rewrite those logical blocks to another section. A certain percentage of the
raw capacity of the drive is reserved for remapping of sections that go bad later
on. This space is neither addressable nor visible through the interface.
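As a rough illustration of that retry loop, the Python sketch below steps through combinations of head offset and timing shift and, if a marginal sector finally reads, remaps its LBA into the reserved spare space. The constants and helper names (read_with_adjustment, spare_pool, remap_table) are invented for illustration; no drive exposes such an interface.

# Toy model of the retry-and-remap behavior described above. Offsets, timing
# shifts, and all helper names are illustrative assumptions, not real firmware.

HEAD_OFFSETS  = [0, -1, +1, -2, +2]            # microsteps off track center
TIMING_SHIFTS = [0, -1, +1, -2, +2, -3, +3]    # read-clock shifts, earlier or later

def recover_sector(lba, read_with_adjustment, spare_pool, remap_table):
    """Retry a hard-to-read sector with varied offset/timing; remap if marginal."""
    for offset in HEAD_OFFSETS:
        for shift in TIMING_SHIFTS:
            data = read_with_adjustment(lba, offset, shift)
            if data is not None:                  # ECC finally produced good data
                if (offset, shift) != (0, 0):     # needed help: treat the sector as bad
                    spare = spare_pool.pop()      # drawn from reserved, non-addressable capacity
                    remap_table[lba] = spare      # the same LBA now points at the spare sector
                    # the recovered data would be rewritten at `spare` here
                return data
    return None                                   # unrecoverable after every combination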
Intelligent interfaces brought an immediate performance benefit in that they could
buffer data until the host was ready to accept it. Similarly, they could accept
data regardless of when the drive was positioned to write it, eliminating the
need for the host to synchronize its connection to the drive with positioning
of the heads for a data transfer. Although this buffering proved to be an important
performance improvement, drives go well beyond this in enhancing performance.
In a demanding workload environment, a high-performance drive can accept a queue
of commands. Based on the knowledge of these specific commands, the drive can
optimize how they are executed and minimize the time required to complete them.
This assumes that multiple commands can be issued and their results either buffered
in the drive or returned out of order from the initial request. The longer the
command queue is, the greater the possibility for throughput optimization. Of
course, host file systems or controllers do this as well, and many argue that
they can manage more commands. The drive offers some special advantages, however.
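Before looking at those advantages, here is a minimal sketch of what queued, out-of-order completion looks like from the host side; the drive object and its issue and wait_for_completion calls are placeholders, not a real driver API.

# Minimal sketch of tagged queuing, assuming a hypothetical `drive` object.

def run_queued(drive, commands):
    outstanding = {}
    for tag, cmd in enumerate(commands):
        drive.issue(tag, cmd)                    # hand the whole queue to the drive
        outstanding[tag] = cmd

    results = {}
    while outstanding:
        tag, data = drive.wait_for_completion()  # completions arrive in drive order
        results[tag] = data
        del outstanding[tag]

    return [results[tag] for tag in range(len(commands))]  # back to request order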
Consider a drive attached to a single host. If the host is managing the I/O queue,
it can be confident that the read/write head is at roughly the location of the
last I/O it issued. It will usually select another command as close as possible
to the previous one to send to the drive, often working from one end of the logical
block address (LBA) range to the other and back again. The LBA range is that sequential
"tape" model mentioned earlier. This is about as good a scheduling model as the
host can apply when LBAs are all it sees.
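A minimal sketch of that host-side sweep, assuming the host knows only the LBAs in its queue and the address of the last request it issued (the Request class and function name are invented for illustration):

from dataclasses import dataclass

@dataclass
class Request:
    lba: int     # starting logical block address
    length: int  # blocks to transfer

def host_elevator_order(pending, last_lba):
    """Sweep upward in LBA from the last serviced address, then come back down."""
    ahead  = sorted((r for r in pending if r.lba >= last_lba), key=lambda r: r.lba)
    behind = sorted((r for r in pending if r.lba <  last_lba), key=lambda r: r.lba,
                    reverse=True)
    return ahead + behind

# The work queue from the tables below, with the head assumed to be near LBA 0:
queue = [Request(724, 8), Request(100, 16), Request(9987, 1), Request(26, 128)]
print([r.lba for r in host_elevator_order(queue, last_lba=0)])   # [26, 100, 724, 9987]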
The drive, on the other hand, knows the actual geometry of the data. This
includes exact information about the radial track position and the angular or
rotational position of the data. If it has a queue to work on, it selects as its
next operation the one nearest in time, which can be quite a bit different from
the one nearest in LBA. Consider the following work queue:
Operation    Starting LBA    Length
Read         724             8
Read         100             16
Read         9987            1
Read         26              128
The host might reorder this to:
Operation    Starting LBA    Length
Read         26              128
Read         100             16
Read         724             8
Read         9987            1
This seems to make sense. The actual rotational location of the data, however,
may make this the worst ordering. The drive can take advantage of both seek
distance and rotational distance to produce the optimal ordering shown in Figure
7.
Figure 7

The drive-ordered queue would complete in three-quarters of a revolution, while
the host-ordered queue would take three revolutions. The improvement when the
drive orders a queue can be impressive.
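A toy version of that drive-side selection might look like the following. Positioning cost is modeled as a seek proportional to track distance plus the rotational wait for the target sector to come around; the lba_to_geometry mapping, seek rate, and rotation speed are all assumptions made up for illustration, whereas a real drive works from its exact internal geometry.

# Toy shortest-positioning-time-first (SPTF) ordering. Angles are fractions of a
# revolution; seek and rotation constants are invented; transfer time is ignored.

ROTATION_MS = 8.3   # one revolution at roughly 7,200 rpm

def positioning_time(head_track, head_angle, track, angle, seek_ms_per_track=0.01):
    seek = abs(track - head_track) * seek_ms_per_track
    # After the seek, wait for the target angle to rotate under the head.
    rotation = ((angle - head_angle) * ROTATION_MS - seek) % ROTATION_MS
    return seek + rotation

def drive_order(queue, head_track, head_angle, lba_to_geometry):
    """Repeatedly service whichever queued command is nearest in time, not in LBA."""
    ordered, pending = [], list(queue)
    while pending:
        best = min(pending, key=lambda r: positioning_time(
            head_track, head_angle, *lba_to_geometry(r.lba)))
        pending.remove(best)
        head_track, head_angle = lba_to_geometry(best.lba)
        ordered.append(best)
    return ordered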
As Figure 8 shows, a drive that is able to do a throughput of about 170 random
I/Os per second with no queue can achieve more than three times that number
with a sufficiently long queue (admittedly, a queue of 256 I/Os would be rather
large). Note also that a longer queue changes response-time behavior: the average
service time for an individual operation will decrease, but some commands will sit
in the queue longer, victims of the overall optimization. The result is an increase
in both the average response time and, what is usually of more concern to users,
the variation in response time. The interface provides a timer to let the user limit the maximum time
any I/O can sit before it must be serviced.
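One way such an age limit could combine with time-based selection is sketched below: the cheapest command to reach is normally chosen, but anything that has waited past the limit is serviced first. The issued_ms field, the cost function, and the default limit are illustrative, not the interface's actual definition.

# Hypothetical age-limited selection: cost(r) would be an estimated positioning
# time such as the one sketched earlier; issued_ms and max_wait_ms are invented.

def pick_next(pending, now_ms, cost, max_wait_ms=500):
    overdue = [r for r in pending if now_ms - r.issued_ms >= max_wait_ms]
    candidates = overdue if overdue else pending
    return min(candidates, key=cost)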
Figure 8

If the drive is servicing work from more than one system, the benefits from the
drive managing the queue can be especially valuable. Two hosts may each operate
as though it were the only source of I/O requests. Even if both do the best they
can in ordering their queues, the conflict of the two independent queues could
at times produce interference patterns that result in degraded performance. The
drive can efficiently coalesce those workloads and reap the benefits of a single,
longer queue. In enterprise-class disk arrays, where four or eight storage directors
might be accessing each drive, the benefit is even clearer.
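From the drive's point of view the coalescing itself is simple, as this last sketch suggests: commands from any number of initiators land in one pool, tagged with which host to answer, and the same time-based ordering is applied to the longer combined queue. The tagging here is illustrative bookkeeping only.

# Illustrative only: merge per-host queues into the single pool the drive schedules.

def coalesce(host_queues):
    merged = []
    for host_id, queue in enumerate(host_queues):
        merged.extend((host_id, req) for req in queue)   # remember which host to answer
    return merged   # one longer queue for the same SPTF-style ordering sketched above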