Jim Handy at Objective Analysis has just published a White Paper about SSDs titled “Enterprise Reality, Solid State Speed.” The White Paper provides an excellent introduction to issues surrounding SSD design. The key issue here is the basic wearout failure mechanism inherent in NAND Flash memory. Handy writes:
“NAND Flash is a messy medium. One means by which chip architects pushed NAND Flash costs below those of NOR Flash (or any other memory technology for that matter) was by compromising data integrity.”
But don’t let that scare you. Handy continues:
“In a move borrowed from the HDD industry, NAND Flash stores data in a way that anticipates data corruption, then requires an external controller to scrub the data every time it is read from the device.”
Sound messy? It is, but no messier than the issues HDD designers have dealt with for many years. The same sort of problem drove the development of increasingly powerful error checking and correction (ECC) schemes tailored to the HDD's own failure mechanisms. As Handy writes:
“Fortunately, error correction coding (ECC) is well understood, and ECC is keeping pace with the degradation of NAND data integrity, offsetting increases in Flash error rates.”
If you think that this sort of thing cannot go on forever, you’re right. Handy continues:
“However, another difficulty adds to this trouble. NAND Flash has a wear-out mechanism that is unique to this technology. After a large number of erase/write cycles, bits start to lock up and can no longer be used. This adds to the number of bits that the ECC must correct. As an increasing number of bits become unusable, errors rise to approach the limits of the ECC engine’s capabilities. At this point, that particular block must be removed from the pool of available memory.”
Total block failure then leads to the next level of error protection that must be built into the SSD: a pool of reserve blocks that stand in for retired ones, a strategy known as overprovisioning.
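The retirement step Handy describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BlockPool` class, the ECC limit of 8 correctable bits, and the 2-bit safety margin are all hypothetical and do not come from the White Paper or from any particular controller.

```python
from dataclasses import dataclass, field

# Hypothetical figures, for illustration only.
ECC_CORRECTABLE_BITS = 8   # most bit errors the ECC engine can fix per read
RETIRE_MARGIN = 2          # retire a block before it reaches the hard limit


@dataclass
class BlockPool:
    active: set                      # block IDs currently holding data
    reserve: list                    # spare (overprovisioned) block IDs
    retired: set = field(default_factory=set)

    def report_read(self, block, corrected_bits):
        """Record how many bits ECC corrected on a read of `block`.

        Returns the replacement block ID if the block was retired,
        otherwise None.
        """
        if block not in self.active:
            raise ValueError(f"block {block} is not active")
        if corrected_bits < ECC_CORRECTABLE_BITS - RETIRE_MARGIN:
            return None  # still comfortably inside the ECC budget
        # Errors are approaching the ECC limit: pull the block from service.
        self.active.remove(block)
        self.retired.add(block)
        if not self.reserve:
            raise RuntimeError("reserve pool exhausted")
        replacement = self.reserve.pop()
        self.active.add(replacement)
        return replacement


pool = BlockPool(active={0, 1, 2}, reserve=[100, 101])
pool.report_read(0, corrected_bits=3)   # healthy block, nothing happens
pool.report_read(1, corrected_bits=6)   # near the limit, swapped for a spare
```

A real controller would also copy the block's data to the replacement before retiring it; that step is left out here to keep the sketch short.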
ECC must occur quickly and is usually implemented in the hardware of the NAND Flash controller. Because the required amount of ECC changes with each generation of NAND Flash device, you generally want to look for a controller with a flexible ECC capability.
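To make the idea of error correction concrete, here is a toy example in Python. It uses a Hamming(7,4) code, which corrects a single flipped bit in a 7-bit codeword. That is a deliberate simplification: NAND Flash controllers rely on much stronger codes (BCH and LDPC are typical) implemented in hardware, and nothing below reflects a specific controller's design.

```python
def hamming74_encode(data):
    """Encode 4 data bits into a 7-bit codeword [p1, p2, d1, p4, d2, d3, d4]."""
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4   # parity over codeword positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4   # parity over codeword positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4   # parity over codeword positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit.

    Returns (data_bits, error_position), where error_position is the
    1-based position of the corrected bit, or 0 if no error was detected.
    """
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_position = s1 + 2 * s2 + 4 * s4   # the syndrome names the bad bit
    if error_position:
        c[error_position - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], error_position


stored = hamming74_encode([1, 0, 1, 1])
stored[4] ^= 1                               # simulate one bit flipping in the cell array
recovered, fixed_at = hamming74_decode(stored)
```

The point about flexibility follows from the same picture: a code that fixes one bit per codeword is useless once two bits fail, so each NAND generation with a higher raw error rate needs a code with a larger correction budget.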
Overprovisioning is a strategy that varies by SSD design, so it’s most often handled in the firmware that drives the NAND Flash controller. A recent example: late last month, LSI Corp released new firmware for SandForce SSD controllers that boosted drive capacity by roughly 7% without changing the amount of raw NAND Flash memory in the drive. (See “LSI Releases Code To Manufacturers – New Increased Capacity ‘SandForce Driven’ SSDs Hit The Streets”.) The added capacity came from a new overprovisioning strategy implemented in the firmware for the SandForce SF-2000 series of SSD controller chips. So, when picking a controller, you also want to know that the controller vendor is on the ball, updating its firmware frequently to extract the most performance and capacity from the current generation of NAND Flash memory. You also want to be sure that you understand the performance, reliability, and endurance goals that drove the development of that firmware.
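The arithmetic behind a capacity gain of this kind is simple, and a short Python sketch shows it. The drive sizes below are hypothetical, chosen only so that the gain comes out near 7%; they are not LSI's published figures. The ratio used here, spare capacity divided by user capacity, is one common way to express overprovisioning.

```python
def overprovisioning_ratio(raw_gb, user_gb):
    """Spare capacity as a fraction of user-visible capacity."""
    return (raw_gb - user_gb) / user_gb


def capacity_gain(old_user_gb, new_user_gb):
    """Fractional increase in user-visible capacity."""
    return (new_user_gb - old_user_gb) / old_user_gb


# Hypothetical drive: the same raw NAND before and after a firmware update.
RAW_GB = 256
OLD_USER_GB = 224   # assumed capacity exposed by the old firmware
NEW_USER_GB = 240   # assumed capacity exposed by the new firmware

print(f"old overprovisioning: {overprovisioning_ratio(RAW_GB, OLD_USER_GB):.1%}")
print(f"new overprovisioning: {overprovisioning_ratio(RAW_GB, NEW_USER_GB):.1%}")
print(f"capacity gain:        {capacity_gain(OLD_USER_GB, NEW_USER_GB):.1%}")
```

The trade-off is visible in the numbers: every gigabyte handed to the user is a gigabyte removed from the reserve pool, which is why the endurance goals behind the firmware matter.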
For a copy of the Objective Analysis SSD White Paper, click here.
For information on the Cadence NAND Flash memory controller, click here.