= List of Node Failures =

|| '''Node''' || '''Failure Mode''' || '''Solution / Notes''' ||
|| [1,5] || PXE Halt - Locks up during execution of PXE code || Multiple resets (more than 1) [[BR]] may be required [[BR]] Might require node change ||
|| [1,5] || Dead Node ID box top LED (the blinking one) || Power cycle fixed it [[BR]] Rabbit issue? ||
|| [3,8] || First Power on Halt || Locks during the first attempt [[BR]] POSTs after reset ||
|| [17,4] || First Power on Halt || Locks during the first attempt [[BR]] No serial console output ||
|| [1,14] || First Power on Halt || Locks during the first attempt [[BR]] Reset fixes it [[BR]] Has new disk ||
|| [20,19] || Disk Failure || Kernel throws errors during imaging [[BR]] Disk changed ||
|| [12,9] || Disk Controller Failure || Disk controller was having issues, disks were being incorrectly recognised ||
|| [3,18] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [5,11] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [14,11] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [13,5] || Lock Up || Rabbit and node were halted [[BR]] Power cycled ||
|| [4,11] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [5,9] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [9,11] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [3,19] || Bad Node || Motherboard failure, refused to boot [[BR]] Replaced ||
|| [14,8] || Disk Failure || Kernel throws disk errors [[BR]] Disk changed ||
|| [17,9] || Disk Failure || Disk write halts, imaging times out [[BR]] Disk replaced ||
|| [18,3] || Overheat || CM measures internal temp at 106 °F, fails to boot reliably ||
|| [20,2] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [8,13] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [9,10] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [5,2] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [17,13] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [12,1] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [6,14] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [17,19] || Memory Failure || Memory pins did not make proper contact, bent case and reinserted memory ||
|| [7,2] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [5,15] || Lock Up || Rabbit and node were halted, node ID box LED was solid [[BR]] Power cycled ||
|| [7,2] || Lock Up || Rabbit and node were halted, node ID box LED was off [[BR]] Power cycled ||
|| [16,1] || Lock Up || Rabbit and node were halted [[BR]] Power cycled ||
|| [1,9] || Intermittent Failure || Power cycled ||
|| [1,5] || Disk Failure || Failing disk caused disk controller to fail [[BR]] CM had issues also, both replaced ||
|| [9,4] || Disk Failure || Failing disk caused disk controller to fail [[BR]] CM had issues also, both replaced ||
|| [15,6] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [18,16] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [3,11] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [16,19] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [5,17] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [20,4] || Node Failure || Node was replaced ||
|| [15,4] || Node Failure || Node was replaced, bad left antenna connector. Replacement was used ||
|| [5,14] || Overheat || Fan was not plugged in ||
|| [17,4] || Disk Failure || smartctl reports impending disk death ||
|| [9,9] || Memory Failure || Memory pins did not make proper contact, bent case and reinserted memory ||
|| [11,4] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [12,7] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [13,2] || Disk Failure || Successfully booted from disk, but kernel was throwing disk errors ||
|| [16,6] || Disk Failure || SMART overall-health self-assessment test result: FAILED! ||
|| [13,5] || Disk Failure || Kernel throwing disk errors ||
|| [17,3] || Disk Failure || Kernel throwing disk errors ||
|| [14,12] || PXE Halt - Locks up during execution of PXE code || '''Not Fixed''' ||
|| [11,15] || Network Failure || PXE gives media check failure [[BR]] Node replaced ||
|| [19,6] || PXE Halt || Powers down during PXE ||
|| [15,7] || PXE Halt || Halts at random stages in the PXE image download process, before control is handed over to the kernel ||
|| [16,8] || CM Crash || Power cycled ||
|| [20,20] || CM Crash || CM light stays solid, power cycled ||
|| [7,2] || CM Crash || Node ID light stays off, power cycled ||
|| [2,20] || CM Crash || CM light stays solid, power cycled ||
|| [14,12] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [10,7] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [11,18] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [1,15] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [8,3] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [2,11] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [11,16] || Disk Failure || Disk write errors [[BR]] Disk replaced ||
|| [7,8] || Disk Failure || BIOS does not detect disk [[BR]] Disk replaced ||
|| [18,7] || Disk Failure || BIOS does not detect disk [[BR]] Disk replaced ||
|| [2,17] || Disk Failure || BIOS does not detect disk [[BR]] Disk replaced ||
|| [5,19] || Disk Failure || BIOS does not detect disk [[BR]] Disk replaced ||
|| [7,2] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [12,4] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [1,8] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [18,18] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [14,20] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [9,16] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [4,6] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [6,8] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [3,13] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [5,4] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [10,5] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [10,8] || Disk Failure || Kernel throwing disk errors [[BR]] Disk replaced ||
|| [8,8] || Network Failure || Kernel throws network hardware complaint during DHCP [[BR]] eth0: -- ERROR -- Class: [[BR]] Hardware failure [[BR]] Nr: 0x270 [[BR]] Msg: 2 Pair Downshift detected [[BR]] eth0: network connection up [[BR]] using port A [[BR]] speed: 100 [[BR]] autonegotiation: yes ||
|| [12,4] || Disk Failure || BIOS does not detect disk [[BR]] Disk replaced ||