Linux Data Integrity Testing

Just a list of some testing suites and methods for integrity testing.

Memory Testing

For suspected faulty memory, use Memtest86+.

Corruption

Random CRC errors? I spent a week tracking down the fault in this PC. Every test and stress test passed (stresslinux, a full destructive run in Inquisitor, Prime95, Memtest86+, everything). The only thing that reliably showed a problem was this:

for i in $(seq 1 10); do wget -qO- http://xxx.xxx.xx.xx/cyclone/img/dq35.gz | pigz -d - | ntfsclone -rO /dev/sda1 - ; done
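A minimal sketch of the same loop with a little more reporting, assuming bash: it stops at the first failing run and prints each stage's exit code from PIPESTATUS, so you can see whether wget, pigz or ntfsclone complained. The hook names fetch, decompress and restore and the function run_stress are hypothetical; the defaults simply repeat the one-liner above, with the URL left elided as in the original.

```shell
#!/usr/bin/env bash
# Hypothetical stress harness for the download -> decompress -> restore chain.
# Redefine these three hooks to try variations; the defaults mirror the
# one-liner above (server address left elided as in the original).
fetch()      { wget -qO- http://xxx.xxx.xx.xx/cyclone/img/dq35.gz; }
decompress() { pigz -d -; }
restore()    { ntfsclone -rO /dev/sda1 -; }

# run_stress N: run the chain up to N times, stopping at the first failure.
# Prints the failing run number and per-stage exit codes; returns 1 on failure.
run_stress() {
    local runs=$1 i
    local -a st
    for (( i = 1; i <= runs; i++ )); do
        fetch | decompress | restore
        st=("${PIPESTATUS[@]}")   # copy at once; the next command resets PIPESTATUS
        if (( st[0] != 0 || st[1] != 0 || st[2] != 0 )); then
            echo "run $i failed: fetch=${st[0]} decompress=${st[1]} restore=${st[2]}" >&2
            return 1
        fi
        echo "run $i OK" >&2
    done
    return 0
}
```

Run it as root with, for example, `run_stress 30`. The stage with the non-zero code is where the error surfaced, which is not necessarily where the corruption happened: pigz and ntfsclone only notice bytes that were already damaged upstream.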

Link: Gigabit (lighttpd on the server end). Image: ~3.6 GB. PC: Intel DQ35JOE with 3 GB RAM (2x 512 MB and 2x 1 GB modules).

Error: either ntfsclone fails with "ERROR: Invalid command code in image" or "ERROR:restore_image: corrupt image" (anywhere from every other run to every 8th run), or, if the restore gets all the way through, pigz reports a CRC error at the end of the download (it checks the CRC of the gzip image). I assume a fair amount of corruption makes ntfsclone see a problem mid-stream, so to speak, whereas a single-bit error would only trip the pigz CRC check at the end.
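To illustrate the CRC point, here is a small demo, assuming bash, GNU gzip and coreutils (flip_bit and crc_demo are hypothetical helpers). It compresses some random data, flips one bit in the middle of the .gz file and runs the integrity test again. gzip -t is used because it is almost always installed; pigz -t checks the same gzip CRC. Depending on where the bit lands, the failure shows up as a format error or a CRC error, but either way the test exits non-zero.

```shell
#!/usr/bin/env bash
# Hypothetical demo: a one-bit error in a gzip file is caught by the
# integrity test (gzip -t here; pigz -t verifies the same CRC).

# flip_bit FILE OFFSET: invert the lowest bit of the byte at OFFSET, in place
flip_bit() {
    local file=$1 offset=$2 byte
    # read the byte as an unsigned decimal number
    byte=$(od -An -tu1 -j "$offset" -N1 "$file" | tr -d ' ')
    # write it back with bit 0 flipped (octal escape via printf), without truncating
    printf "$(printf '\\%03o' $(( byte ^ 1 )))" |
        dd of="$file" bs=1 seek="$offset" conv=notrunc 2>/dev/null
}

# crc_demo: test an intact .gz, corrupt one bit mid-file, then test again
crc_demo() {
    local dir size
    dir=$(mktemp -d)
    head -c 1000000 /dev/urandom > "$dir/data"
    gzip -c "$dir/data" > "$dir/data.gz"

    if gzip -t "$dir/data.gz"; then echo "intact: OK"; else echo "intact: FAILED"; fi

    size=$(wc -c < "$dir/data.gz")
    flip_bit "$dir/data.gz" $(( size / 2 ))

    if gzip -t "$dir/data.gz" 2>/dev/null; then echo "corrupted: OK"; else echo "corrupted: FAILED"; fi
    rm -rf "$dir"
}

crc_demo
```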

Notes:

Using wget and piping to pigz for an integrity test alone (pigz -t) didn't fail.
Copying the image to another partition on the HDD, then cat'ing it through pigz into ntfsclone, didn't show a problem either.
The problem only shows up when all three are chained together, which presumably puts more load on the system?
Another Intel DQ35JOE with exactly the same spec, doing exactly the same thing, passed the above test!
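The isolation steps in these notes can be made a bit more quantitative by hashing the output of one stage several times and checking that it never changes. A sketch, assuming bash and sha256sum from coreutils; repeat_hash is a hypothetical helper:

```shell
#!/usr/bin/env bash
# repeat_hash N CMD...: run CMD N times, hash its stdout each time and
# report whether every run produced identical bytes (returns 1 on a mismatch).
repeat_hash() {
    local runs=$1 i first="" h
    shift
    for (( i = 1; i <= runs; i++ )); do
        h=$("$@" | sha256sum | cut -d' ' -f1)
        if [[ -z $first ]]; then
            first=$h                      # remember the first run's hash
        elif [[ $h != "$first" ]]; then
            echo "run $i differs: $h (run 1: $first)"
            return 1
        fi
    done
    echo "all $runs runs match: $first"
}
```

For example, `repeat_hash 10 wget -qO- http://xxx.xxx.xx.xx/cyclone/img/dq35.gz` checks the download on its own, and `repeat_hash 10 sh -c 'cat /path/to/dq35.gz | pigz -d -'` checks decompression from a local copy (the path is a placeholder). A mismatch in one stage but not the other narrows down where the bytes change.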

Tried:

Fine with 2x 512 MB
Fine with either 1 GB module alone
Failed with all modules in
Failed with 2x 1 GB

Also tried:

Same problem after swapping the 2x 1 GB modules for another pair of 1 GB modules (all Kingston)
Same problem after changing the HDD and PSU
Same problem after swapping slots around (1 GB pair moved from both blue slots to both black slots)

Therefore:

OK: dual channel, single sided
OK: single channel, double sided
Fail: dual channel, double sided
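Since the result depended on which modules were fitted where, it can help to record the DIMM layout alongside each pass/fail result. A sketch, assuming bash and awk, that parses the output of dmidecode -t memory (which needs root); dimm_summary is a hypothetical helper, and because the Rank field may be missing on older firmware it prints ? when absent.

```shell
#!/usr/bin/env bash
# dimm_summary: read `dmidecode -t memory` output on stdin and print one line
# per memory slot: locator, size and rank ("?" if the firmware omits Rank).
dimm_summary() {
    awk '
        /^Memory Device$/ { inside = 1; loc = ""; size = ""; rank = "?"; next }
        # field lines are indented; anchoring on the indent skips "Bank Locator:"
        inside && /^[ \t]+Locator:/ { sub(/^[ \t]+Locator: */, ""); loc = $0; next }
        inside && /^[ \t]+Size:/    { sub(/^[ \t]+Size: */, "");    size = $0; next }
        inside && /^[ \t]+Rank:/    { sub(/^[ \t]+Rank: */, "");    rank = $0; next }
        # a blank line ends the Memory Device record
        inside && /^[ \t]*$/ { print loc ": " size ", rank " rank; inside = 0 }
        END { if (inside) print loc ": " size ", rank " rank }
    '
}
```

Typical use is `sudo dmidecode -t memory | dimm_summary`, saving the output next to the result of each loop run.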

What's the difference? The 1 GB modules are double sided, and each side is usually a separate rank that the memory controller selects with its own chip-select line. Dual channel needs tight matching of the way each DIMM runs (sorry, I don't know the technicalities of this).

I guess this obscure fault is therefore triggered by running double-sided memory in dual channel mode, which points to a fault in the memory controller. So the board is getting RMA'd now.....

Incidentally, I had a similar issue before (that time it was a RAM problem), and looping through extracting/integrity testing was the only way to bring it up.

Conclusion: the board was replaced with an identical one, which just passed 30 loops with no problem :)
