Linux Data Integrity Testing

Just a list of some testing suites that include integrity testing:

http://www.iozone.org/

http://www.inquisitor.ru/about/

http://www.stresslinux.org/sl/

Memory Testing
To test for faulty memory, use Memtest86+.

Corruption
Random CRC errors? I spent a week working out the fault in this PC. All tests and stress testing (stresslinux, a full destructive run on Inquisitor, Prime95, Memtest86+) passed. The only way I could reliably show a problem was by running this:

 for i in $(seq 1 10); do wget -qO- http://xxx.xxx.xx.xx/cyclone/img/dq35.gz | pigz -d - | ntfsclone -rO /dev/sda1 - ; done
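The loop above does not record which runs failed, and without pipefail the pipeline's exit status only reflects ntfsclone, so a pigz CRC error at the very end could go unnoticed in the exit code. A minimal sketch of the same loop with a failure count follows. It assumes bash (for `set -o pipefail`); `loop_test`, `restore_once`, `IMAGE_URL` and `TARGET` are hypothetical names introduced here, not part of the original test.

```shell
#!/usr/bin/env bash
# Sketch: repeat a command N times and count how many runs fail.

# loop_test N CMD [ARGS...]
# Runs CMD N times, prints one line per run, and returns non-zero
# if any run failed. The body is a subshell so its variables stay local.
loop_test() (
    runs=$1; shift
    i=1; failures=0
    while [ "$i" -le "$runs" ]; do
        if "$@"; then
            echo "run $i: ok"
        else
            echo "run $i: FAILED (exit $?)"
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "failures: $failures/$runs"
    [ "$failures" -eq 0 ]
)

# Hypothetical wrapper around the pipeline from the text.
# IMAGE_URL and TARGET are placeholders that the caller must set.
# pipefail makes a failure in wget or pigz fail the whole pipeline,
# not just a failure in ntfsclone.
restore_once() (
    set -o pipefail
    wget -qO- "$IMAGE_URL" | pigz -dc | ntfsclone -rO "$TARGET" -
)

# Example (DESTRUCTIVE: overwrites $TARGET on every run):
#   IMAGE_URL=http://xxx.xxx.xx.xx/cyclone/img/dq35.gz TARGET=/dev/sda1 \
#       loop_test 10 restore_once
```

Because `loop_test` takes any command, the same function could also loop a plain integrity check such as `loop_test 30 pigz -t image.gz` (file name assumed).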

Link: Gigabit (lighttpd on the server end). Image: ~3.6 GB. PC: Intel DQ35JOE with 3 GB RAM (2x 512 MB and 2x 1 GB modules).

Error: either a fault from ntfsclone ("ERROR: Invalid command code in image" or "ERROR:restore_image: corrupt image"), which shows up anywhere between every other run and every 8th run, or, if it gets all the way through, a CRC error from pigz (it checks the CRC of the gzip image) at the end of the entire download. I assume a fair amount of corruption causes ntfsclone to see a problem mid-flow, so to speak, whereas a single bit error may only trip the pigz CRC check at the end.
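To illustrate the second failure mode, here is a small sketch showing that a single flipped bit inside a gzip stream is caught by the integrity test. It uses `gzip -t` on a throwaway file on the assumption that gzip is installed; `pigz -t` checks the same gzip format. The file names and the sample data are arbitrary and have nothing to do with the original image.

```shell
# Sketch: flip one bit in a gzip file and confirm the integrity test fails.
work=$(mktemp -d)

# Build a sample archive (roughly 200 KB compressed) and a copy to corrupt.
seq 1 100000 | gzip -c > "$work/good.gz"
cp "$work/good.gz" "$work/bad.gz"

# Pick the byte in the middle of the file (well inside the compressed
# data, past the 10-byte header) and flip its lowest bit.
size=$(wc -c < "$work/bad.gz")
offset=$((size / 2))
old=$(od -An -tu1 -j "$offset" -N 1 "$work/bad.gz" | tr -d ' ')
new=$((old ^ 1))

# Write the modified byte back in place using an octal escape.
printf "\\$(printf '%03o' "$new")" |
    dd of="$work/bad.gz" bs=1 seek="$offset" conv=notrunc 2>/dev/null

gzip -t "$work/good.gz" && echo "good.gz: ok"
gzip -t "$work/bad.gz" 2>/dev/null || echo "bad.gz: corruption detected"
```

Depending on where the flipped bit lands, gzip may report invalid compressed data rather than a CRC mismatch, but either way the test exits non-zero.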

Notes:
 * Using wget and piping to pigz for integrity testing alone (pigz -t) didn't fail.
 * Copying the image to another partition on the HDD, then cat'ing it and piping to pigz and on to ntfsclone, didn't show a problem either.
 * This only manifests when chaining all three together, which presumably loads the 'system' more?
 * Another Intel DQ35JOE with exactly the same spec, doing exactly the same thing, passed the above test!
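When swapping pipeline stages in and out like this, a non-destructive variation (not what was done in the original test) is to hash the output of each run and compare it against the first run, so nothing has to be written to a partition. A minimal sketch, assuming sha256sum from coreutils is available; `hash_loop` is a hypothetical helper name. It treats run 1 as the reference, so it only shows that runs disagree, not which one is correct, and it looks at output only, not exit status.

```shell
# Sketch: run a command N times and compare the SHA-256 of its stdout
# against the hash from the first run.

# hash_loop N CMD [ARGS...]
# Returns non-zero if any run produced output that differs from run 1.
hash_loop() (
    runs=$1; shift
    i=1; mismatches=0; ref=
    while [ "$i" -le "$runs" ]; do
        sum=$("$@" | sha256sum | cut -d' ' -f1)
        # The first run becomes the reference hash.
        if [ -z "$ref" ]; then ref=$sum; fi
        if [ "$sum" = "$ref" ]; then
            echo "run $i: $sum"
        else
            echo "run $i: $sum MISMATCH"
            mismatches=$((mismatches + 1))
        fi
        i=$((i + 1))
    done
    echo "mismatches: $mismatches/$runs"
    [ "$mismatches" -eq 0 ]
)

# Example with a hypothetical fetch function (IMAGE_URL is a placeholder):
#   fetch_image() { wget -qO- "$IMAGE_URL" | pigz -dc; }
#   hash_loop 10 fetch_image
```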

Tried:
 * Fine running 2x 512 MB
 * Fine running either 1 GB module alone
 * Failed with all modules in
 * Failed with 2x 1 GB

Also tried:
 * Same problem after swapping the 2x 1 GB modules for another 2x 1 GB modules (all Kingston)
 * Same problem after changing the HDD and PSU
 * Same problem after swapping channels around (1 GB modules moved from both blue slots to both black)

Therefore:
 * OK: dual channel, single sided
 * OK: single channel, double sided
 * Fail: dual channel, double sided

What's the difference? Well, the 1 GB modules are double sided, so one side uses a different connection to the memory controller. Dual channel needs tight matching of the way each DIMM runs (sorry, I don't know the technicalities of this).

I guess this obscure fault is therefore triggered by running double-sided memory in dual-channel mode, and the fault is in the memory controller. So I'm getting the board RMA'd now...

Incidentally, I had a similar issue before (it was a RAM problem that time), and looping through extracting/integrity testing was the only way to bring it up.


 * Conclusion: board replaced with an identical one, which just passed 30 loops with no problem :)