('Flash' refers to erasure being done on a relatively large number of cells in parallel, an analogy being drawn to a camera flash by the inventors, probably inspired by the common EPROMs of the day that could be bulk-erased by lengthy exposure to ultraviolet light through a quartz window in the package. [Ultraviolet bombardment slightly increased leakage in the insulators, causing the stored charge (data) to be lost over the exposure time.])
Flash devices are EEPROMs that are arranged into erase blocks, the block size being chosen as a compromise. Larger blocks are more space-efficient, but too large a block means you are taking a significant percentage of your total storage out of service during erasure, complicating your system. Also, depending on how you are using the blocks, you may need to waste a substantial portion of a block because you might not be able to pack it full. Too small a block wastes silicon space, space that could be used for more storage at a given price point, and can slow things down since usually only one block can be erased at a time. Typical block sizes are 64KB or 128KB.
Most devices erase blocks to an all 1's condition, and then individual bits may be dragged back to 0 as required. You can often re-program a cell any time you like, so long as you're only adding 0's, but to change anything to a 1 requires a bulk erasure. Some implementations of flash technology do not allow any re-programming of a once-programmed cell, not until it is first erased. (This is especially true of MLC/TLC devices, where stored cell charge is not a binary value, and this lack of update-ability dramatically complicates making a reliable flash-aware filesystem that can support these devices.)
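The erase-to-1, program-to-0 rule can be shown with a toy model. This is only a sketch: the `FlashBlock` class and its method names are made up for illustration and do not correspond to any real device interface. It assumes the common case described above, where a programmed cell may be re-programmed as long as only 0's are being added.

```python
class FlashBlock:
    """Toy model of one erase block (hypothetical, for illustration only)."""

    def __init__(self, size: int):
        # A freshly erased block reads back as all 1 bits.
        self.cells = bytearray(b"\xff" * size)

    def erase(self) -> None:
        # Bulk erase: every bit in the block goes back to 1.
        for i in range(len(self.cells)):
            self.cells[i] = 0xFF

    def can_program(self, offset: int, value: int) -> bool:
        # The write lands as intended only if it asks for no 0 -> 1 change.
        return (self.cells[offset] & value) == value

    def program(self, offset: int, value: int) -> int:
        # Programming can only drag bits from 1 to 0, so the stored
        # result is the bitwise AND of the old and new values.
        self.cells[offset] &= value
        return self.cells[offset]
```

Programming 0xF0 and then 0x0F into the same byte leaves 0x00 there, not 0x0F. Getting any of those bits back to 1 takes an `erase()` of the whole block.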
All devices called "flash", over the nearly 40 years this technology has existed, are based upon variations on this same tunneling theme, regardless of whatever other technological and marketing terms are being applied. And therein lies the rub.
Flash, how do I hate thee? Let me count the ways:
Most of these flash problems basically go away, if you Just. Stop. Writing to them all the time!
Wear
The electron tunneling is hard on the silicon! It... leaves a mark.
Every write or erase operation erodes the permeability of a storage cell's insulating wall, fractionally. This 'wear' shows up as an exponential reluctance for charge to pump through the wall of a worn cell. (Think of it as a filter that clogs with use.) Eventually pumping just stops altogether, and the cell's charge is, essentially, fixed from then on, with no guarantee that the fixed charge is even at a value that will guarantee a stable reading. (If the charge level is stuck near a switching threshold, electrical noise or minor variations in supply voltage or temperature may affect the value you see, and there's no way to know that this is the case. A worn-out cell, left in service, is bad. CRC's or other block data integrity mechanisms may be of use here.)
Speed
As the permeability of the insulating cell wall degrades, the charge pumping goes slower and slower. All devices (except the earliest small [2KB, etc.] ones) are adaptive to the charge current, in order to maximize speed. The difference in write/erase time from a fresh device to a worn-out one is more than 10:1 in some older NOR devices I am intimately familiar with. This slowdown is basic physics, though the ratio may be different in any particular device type.
So, even though the product might be keeping up with flash writing when it's first delivered, it may be that the write slowdown as the system ages causes malfunctions, long before the expected service life of the device is exhausted. With a 10:1 ratio, for example, it would be easy to see how this could happen. Even a 2:1 ratio, which seems implausibly good to me, could easily cause problems if you weren't careful.
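The headroom arithmetic is simple enough to write down. The sketch below assumes the slowdown applies uniformly to sustained write throughput, which is a simplification. The function names and the example figures are mine, not taken from any datasheet.

```python
def end_of_life_rate(fresh_rate: float, slowdown_ratio: float) -> float:
    """Sustained write rate once writes take `slowdown_ratio` times longer."""
    if slowdown_ratio < 1.0:
        raise ValueError("slowdown_ratio is worn time / fresh time, so >= 1")
    return fresh_rate / slowdown_ratio


def keeps_up(required_rate: float, fresh_rate: float,
             slowdown_ratio: float) -> bool:
    """True if the worn device can still absorb the product's write load."""
    return end_of_life_rate(fresh_rate, slowdown_ratio) >= required_rate
```

A product that needs 100 KB/s and measures 400 KB/s on a fresh part has 4:1 headroom. That survives a 2:1 slowdown (200 KB/s) but not a 10:1 slowdown (40 KB/s).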
Cycles
Each write or erase operation is half of a full cycle on any given memory cell. (It takes two state changes on a cell to make up a cycle. If the state doesn't change, electrons were not pumped through the barrier, and so there is no wear. Some devices require full cycles on all cells, and so an erase might be preceded by a program-to-zero step for all remaining '1' bits.)
Commercial devices have been fielded that guarantee as few as 100 erase/write cycles per block, though that is unusually fragile and rarely talked about. ("Did we mention how inexpensive our camera SD card is? And how shiny?") Commonly available devices are guaranteed for something more like 10,000 or 100,000 cycles; a few vendors claim 1,000,000 cycles. (One vendor even claims that an on-chip thermal annealing process can be used to 'heal' worn-out blocks, allowing for 100,000,000 cycles, but so far there are no available devices that use this technology.)
The newest devices, MLC (four-level) and TLC (eight-level) high-density NAND devices, are on the low end of the scale, with 1,000 cycles being not uncommon.
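To see what those cycle ratings mean in calendar time, here is a back-of-the-envelope estimate. It is a sketch resting on two optimistic assumptions: wear is spread perfectly evenly over every block, and the `write_amplification` factor (flash bytes actually written per byte the host writes) is known. The function name and the example workload are hypothetical.

```python
def estimated_life_days(capacity_bytes: int, rated_cycles: int,
                        host_bytes_per_day: float,
                        write_amplification: float = 1.0) -> float:
    """Days until every block reaches its rated cycle count.

    Assumes perfectly even wear, so this is an upper bound.
    """
    # Total bytes the device can absorb over its rated life.
    total_writable = capacity_bytes * rated_cycles
    # Bytes per day that actually reach the flash cells.
    flash_bytes_per_day = host_bytes_per_day * write_amplification
    return total_writable / flash_bytes_per_day


GIB = 1024 ** 3
```

With a 4 GiB device, 8 GiB of host writes per day and a write amplification of 2, a 1,000-cycle part comes out at 250 days. The same workload on a 100,000-cycle part comes out at 25,000 days. Real devices will do worse than this estimate, since no wear-leveler is perfect.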
Bad Blocks
If a block does wear out, it may be that you can substitute a different block for it, one that has not yet worn out. Worn-out blocks need to be taken out of rotation permanently, because once they have worn out their contents are unpredictable, and unreliable.
At one time this bad block handling had to be provided by system software, but most newer/larger devices incorporate this functionality into themselves, and have additional spare blocks in reserve. Most of these devices handle bad blocks by mapping in spare blocks so that manufacturing yields can be higher, as this is more aimed at mapping out stillborn and infant-mortality blocks than mapping out blocks that have worn out with long use. (Though they certainly handle that too, the supply of spares is modest and will be exhausted rapidly at true end-of-life of the device.)
Automatic bad-block management is often touted as the solution to flash wear-out problems, but it really isn't. A weak analogy may help: Let's say your wallet had automatic bad-bill management. Any bill that became worn or tattered enough to no longer serve would be automatically retired (destroyed) by your wallet. Your wallet might have a small reserve of replacement bills, but this would be quickly exhausted if the general supply of money was wearing out. Any worn bills put in there after that would simply disappear. Not Good!
Now consider what happens if wear-leveling is truly working correctly. The bills are all wearing out at about the same time. The supply of replacements will be, if you'll forgive the expression, gone in a flash. Even worse (the 1000-cycle flash technology case), what if the money is all printed on tissue paper? Your wallet will not be a reliable store of money for long.
Wear Leveling
In order that a device not be destroyed by wear too soon, most practical systems must provide some way to spread the wear around the device so that no one block, even if 'picked on', is likely to wear out any sooner than any other.
The quality of this leveling system is, though, highly variable. One older software product I am familiar with exhibited a more than 2:1 spread in write cycle counts across blocks, over time. (This particular product could not handle bad blocks, so the effective lifetime of the product, when governed by flash wear-out, was only about half of what the device was rated for. On the other hand, it may be that this algorithm was aimed at mitigating the Overfilling problem mentioned next.)
At one time this leveling had to be provided by system software, but newer/larger devices incorporate this functionality into themselves. There's a lot of variation, though. Some devices wear-level everything, and others only level the wear on blocks that aren't holding static content. Which strategy works better at maximizing device life? It depends! Do you even know the strategy used by your device?
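The difference between those two strategies can be seen in a toy simulation. This is a sketch under stated assumptions, not any real controller's algorithm: one logical block is rewritten over and over, each rewrite goes to the least-worn free block, and the old copy is then erased and returned to the pool. Blocks holding static content are never moved. All names here are hypothetical.

```python
def pick_least_worn(erase_counts: list[int], free: set[int]) -> int:
    """Choose the free block with the fewest erases (lowest index on ties)."""
    return min(free, key=lambda b: (erase_counts[b], b))


def simulate(num_blocks: int, static_blocks: int, rewrites: int) -> list[int]:
    """Return per-block erase counts after `rewrites` updates.

    Blocks [0, static_blocks) hold static data and are left alone.
    Needs at least two non-static blocks to have somewhere to write.
    """
    erase_counts = [0] * num_blocks
    free = set(range(static_blocks, num_blocks))
    current = pick_least_worn(erase_counts, free)
    free.discard(current)
    for _ in range(rewrites):
        # Write the new copy to the least-worn free block.
        target = pick_least_worn(erase_counts, free)
        free.discard(target)
        # The old copy is now stale: erase it and return it to the pool.
        erase_counts[current] += 1
        free.add(current)
        current = target
    return erase_counts
```

With 8 blocks and no static content, 80 rewrites cost each block 10 erases. With 6 of the 8 blocks pinned by static content, the same 80 rewrites put 40 erases on each of the two remaining blocks while the other six see none.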
Overfilling
It is somewhat counterintuitive that as a flash device is used its speed degrades, for reasons other than wear mechanisms. This is due to three different effects:
- The first fill of the device's space runs at the raw write speed of the device. No erasing is required, because everything was erased to begin with. After the device has been filled once, blocks now need to be erased before they can be re-used. This rolls erase operations into the mix, and erases are generally a lot slower than writes. A virgin device is thus notably faster than one that's been around the block a few times, and your design qualification testing must usually take this into account.
- Full wear leveling means that even stable, long-lived data sometimes has to be displaced when there is fresh data to be recorded. Nobody is immune to displacement! The problem here is that if you displace an erase block that is already full of persistent data, attempting to equalize the wear, you haven't actually reduced your write liability! You moved a block out of the way to make room for your fresh data while equalizing the wear, and recorded the fresh data in the vacated space, but now you have to find another block into which to put the just-displaced data...ad infinitum? (This is called Write Amplification in the literature.) The more persistent data fills the device the more likely the next-chosen block is to itself need preservation. The worst-case penalty for this is, just like margin calls when playing the stock market, essentially open-ended. This is an unavoidable trait, if wear-leveling is in use.
  A simplistic wear-leveler that leaves stable blocks alone will avoid this behavior, but it will also exhibit a large standard deviation in the block erase statistics, and the more full with static content a device is the larger this deviation will be. If the deviation gets too large then there are going to be some blocks worn out well before others, which will result in reduced device life, as the supply of spare blocks is finite.
  A wear-leveler that attempts to leave stable blocks alone for awhile will tend to avoid the worst-case behavior, but it must eventually put these stable blocks into play or it is not leveling wear! If a wear-leveler is smarter like this in order to reduce the abrasive block churn there's going to be a greater standard deviation in the block erase statistics. If the deviation gets too large then there are going to be some blocks worn out well before others, which may also result in reduced device life. TANSTAAFL!
- Erase blocks are generally much larger than the block size that filesystems use, so there is a mapping and packing stage in the filesystem to ensure that the advertised device capacity is reachable in practice. As the persistent data accumulates in the file system, whenever the filesystem has to forage for fresh erase blocks to use it will find a larger and larger percentage of these already have data in them that must be preserved. This means that the foraging yield goes down, and it must harvest even more blocks in order to find enough reclaimable space for its needs.
  In the degenerate case, with the filesystem almost entirely full of data and a particularly unfortunate mapping of filesystem data to erase blocks, the system could in theory end up rolling the entire flash space for each application-level filesystem change. This would not only be hideously slow, but it would end the life of the flash in a hurry due to wear if the condition persisted for long.
So, some provisional rules for speed-testing your ability to write data to the file system, if you want an accurate prediction of behavior in the field over a long service life:
- Do not do a hasty speed test on a new device and call it good! Fill the flash first with junk, then clean it off. Only then, once all erase blocks have been used at least once, should you start speed-testing.
- Do not do a speed test on a device unless the amount of data persistent in the file system is similar to your expected worst-case scenario. You need to stress-test the problematic wear leveling and block mapping layers at realistic levels.
- Scale any numbers you get by the (likely unpublished) slowdown ratio of your flash technology over its rated life.
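To put rough numbers on the falling foraging yield, here is a common simplification. Assume every erase block the system reclaims still holds the same fraction of live data, which must be copied elsewhere before the erase. Reclaiming one block then costs a full block of flash writes but makes room for only the remaining fraction of new data. The function name is mine, and real devices will not fill this uniformly.

```python
def write_amplification(live_fraction: float) -> float:
    """Flash bytes written per byte of new data, under a uniform-fill model.

    `live_fraction` is the share of each reclaimed erase block that still
    holds data which must be preserved (0.0 = empty, approaching 1.0 = full).
    """
    if not 0.0 <= live_fraction < 1.0:
        raise ValueError("live_fraction must be in [0, 1); at 1.0 nothing "
                         "can be reclaimed and the cost is unbounded")
    # One block of writes (copied data plus new data) buys only
    # (1 - live_fraction) of a block of room for new data.
    return 1.0 / (1.0 - live_fraction)
```

At half full this model gives 2:1 and at 90% full it gives 10:1, heading for the open-ended degenerate case as the device fills. This is why a speed test on a mostly empty filesystem tells you so little.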
Read Disturb
As if it were not bad enough that writing can wear out a cell, some flash technologies don't have an entirely non-destructive read process! Though more erosive than destructive, reading in some cell geometries can cause gradual leaking away of stored charge, requiring periodic refresh of otherwise read-only blocks in order to prevent eventual data loss. (There is a great similarity here to common dynamic RAM, which also stores data as charges on capacitors. In DRAM these charges leak away in milliseconds, and the reads are destructive, but otherwise they are alike in this way.) Oh, and it is not necessarily the block you read that gets disturbed, sometimes it's a neighbor. That's extra fun.
On a device with a substantial write load, the wear leveling process, if it's any good, probably takes care of this problem for you.
On systems where an in-RAM file buffering mechanism is mapped over the flash, and which has sufficient RAM for the OS buffering pool, reading a 'file' repeatedly may result in only reading the flash itself once, subsequent reads being satisfied by the file cache. This would reduce the erosion, probably to a negligible level.
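Where neither of those mitigations applies, system software can keep its own books. The sketch below counts reads per block and flags a block for refresh (copy out, erase, rewrite) once a threshold is reached. The `ReadDisturbTracker` class and the idea of a single fixed threshold are assumptions for illustration; a real limit would have to come from the device vendor.

```python
class ReadDisturbTracker:
    """Per-block read counter (hypothetical, for illustration only)."""

    def __init__(self, num_blocks: int, refresh_threshold: int):
        self.reads = [0] * num_blocks
        self.refresh_threshold = refresh_threshold

    def note_read(self, block: int) -> bool:
        """Count one read; return True once the block is due for a refresh."""
        self.reads[block] += 1
        return self.reads[block] >= self.refresh_threshold

    def note_refresh(self, block: int) -> None:
        # The block was rewritten, so its charge levels are fresh again.
        self.reads[block] = 0
```

For geometries where the neighbor is the victim, the count would need to be charged to the adjacent blocks as well as the one actually read.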
Power Failure Corruption
If the power fails during a flash erase or write operation, charge will not be fully pumped through the storage cell walls. This could result in an entire erase block being corrupted, just as in the worn-out cell case, without any indication of this once the device is re-powered! A corrupted filesystem could be the result, and there may be no recovery other than a complete reformatting of the device.
A variation on this theme is when an erased block is being programmed with contents and the power fails. If the block-filling loop stops in the middle, the block's contents, even if all cells do have reliable 1's and 0's in them, may be inconsistent and thus corrupt.
Naturally CRC's or other data integrity mechanisms can be used here to detect this, but recovering from this is a whole different story. Losing file data is bad enough, but losing filesystem structural data is usually much worse.
Flash-aware filesystems must be extremely carefully designed so that these unavoidable conditions are cleanly recoverable.
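As a minimal sketch of the detection half of that problem, here is a record format carrying a length and a CRC-32 ahead of the payload. The layout and the function names are made up for illustration, and `zlib.crc32` stands in for whatever check a real design would use. This only detects a torn or corrupt record. Recovering from one is the hard part, and it is not shown.

```python
import struct
import zlib

# Hypothetical header: payload length, then CRC-32 of the payload.
HEADER = struct.Struct("<II")


def pack_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC-32."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def unpack_record(raw: bytes):
    """Return the payload, or None if the record is torn or corrupt."""
    if len(raw) < HEADER.size:
        return None
    length, crc = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:HEADER.size + length]
    # A short payload means the write stopped partway through.
    if len(payload) != length or zlib.crc32(payload) != crc:
        return None
    return payload
```

A still-erased region reads as all 0xFF bytes, which decodes as an impossibly large length and is rejected. A record cut short by a power failure fails the length check, and a flipped bit fails the CRC.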
Retention
Recall that data is held as the state of stored charge in an insulated cell. No insulator is perfect, and stored charges leak away all on their own, even if you're not using the device, and even if there's no window in the package. Higher temperatures cause faster leakage, and the effect is non-linear.
Rated retention can be as low as 1 year! (One eMMC part that I know of is rated at 165 days at maximum rated temperature, which is clearly to be avoided.) Some vendors claim 20 years. Periodic refreshing can extend this, of course. The wear leveling system, as it operates, will be refreshing blocks for you, so that may be enough to keep you going.
X-Rays
X-ray radiation, like ultraviolet radiation, also enhances stored charge leakage. Are your devices mobile? Do they tend to take many commercial flights, or go near a dentist's or doctor's office?
Opacity
With the trend towards embedding erase/write control, wear leveling, and bad-block handling entirely into the flash device, it is becoming impossible to get any feedback on the effectiveness of the wear leveling algorithm, the incidence of bad blocks, or early warning on how wear is progressing based on erase slowdown. This opacity is not good if preventative maintenance is an intended feature of the product. The devices just work perfectly, all on their own.
Until they don't. Surprise!