Problem Background
Electrical or magnetic interference inside a computer system can cause a single bit of dynamic random-access memory (DRAM) to spontaneously flip to the opposite state. It was initially thought that this was mainly due to alpha particles emitted by contaminants in chip packaging material, but research has shown that the majority of one-off ("soft") errors in DRAM chips occur as a result of background radiation, chiefly neutrons from cosmic ray secondaries, which may change the contents of one or more memory cells or interfere with the circuitry used to read/write them.
There was some concern that as DRAM density increases further, and thus the components on chips get smaller, while at the same time operating voltages continue to fall, DRAM chips will be affected by such radiation more frequently—since lower-energy particles will be able to change a memory cell's state. On the other hand, smaller cells make smaller targets, and moves to technologies such as SOI may make individual cells less susceptible and so counteract, or even reverse, this trend. Recent studies show that single event upsets due to cosmic radiation have been dropping dramatically with process geometry and previous concerns over increasing bit cell error rates are unfounded.
The Cassini–Huygens spacecraft, launched in 1997, contains two identical flight recorders, each of which contains 2.5 gigabits of memory in the form of arrays of commercial DRAM chips. Its engineering telemetry sends the number of (correctable) single-bit-per-word errors and (uncorrectable) double-bit-per-word errors. In the vicinity of Earth, and when the sun is "quiet", it reported a nearly constant single-bit error rate of about 280 errors per day. The maximum hourly error report from Cassini–Huygens in the first month in space was 3072 single-bit errors per day during a weak solar flare. If the flight recorders had been designed with EDAC words assembled from widely-separated bits, the number of (uncorrectable) multiple-bit errors should average less than one per year.
Work published between 2007 and 2009 showed widely varying error rates with over 7 orders of magnitude difference, ranging from 10−10–10−17 error/bit·h, roughly one bit error, per hour, per gigabyte of memory to one bit error, per millennium, per gigabyte of memory. A very large-scale study based on Google's very large number of servers was presented at the SIGMETRICS/Performance’09 conference. The actual error rate found was several orders of magnitude higher than previous small-scale or laboratory studies, with 25,000 to 70,000 errors per billion device hours per megabit (about 3–10 × 10−9 error/bit·h), and more than 8% of DIMM memory modules affected by errors per year.
Read more about this topic: ECC Memory
Famous quotes containing the words problem and/or background:
“It is part of the educators responsibility to see equally to two things: First, that the problem grows out of the conditions of the experience being had in the present, and that it is within the range of the capacity of students; and, secondly, that it is such that it arouses in the learner an active quest for information and for production of new ideas. The new facts and new ideas thus obtained become the ground for further experiences in which new problems are presented.”
—John Dewey (18591952)
“Pilate with his question What is truth? is gladly trotted out these days as an advocate of Christ, so as to arouse the suspicion that everything known and knowable is an illusion and to erect the cross upon that gruesome background of the impossibility of knowledge.”
—Friedrich Nietzsche (18441900)