List of Node Failures
| Node | Failure Mode | Solution / Notes |
|---|---|---|
| [1,5] | PXE Halt - Locks up during execution of PXE code | Multiple resets (more than one) may be required; might require node change |
| [1,5] | Dead Node ID box top LED (the blinking one) | Power cycle fixed it. Rabbit issue? |
| [3,8] | First Power on Halt | Locks up during the first attempt; POSTs after reset |
| [17,4] | First Power on Halt | Locks up during the first attempt; no serial console output |
| [1,14] | First Power on Halt | Locks up during the first attempt; reset fixes it; has new disk |
| [20,19] | Disk Failure | Kernel throws errors during imaging; disk changed |
| [12,9] | Disk Controller Failure | Disk controller was having issues, disks were being incorrectly recognised |
| [3,18] | Disk Failure | Disk write errors; disk replaced |
| [5,11] | Disk Failure | Disk write errors; disk replaced |
| [14,11] | Disk Failure | Disk write errors; disk replaced |
| [13,5] | Lock Up | Rabbit and node were halted; power cycled |
| [4,11] | Disk Failure | Disk write errors; disk replaced |
| [5,9] | Disk Failure | Disk write errors; disk replaced |
| [9,11] | Disk Failure | Disk write errors; disk replaced |
| [3,19] | Bad Node | Motherboard failure, refused to boot; replaced |
| [14,8] | Disk Failure | Kernel throws disk errors; disk changed |
| [17,9] | Disk Failure | Disk write halts, imaging times out; disk replaced |
| [18,3] | Overheat | CM measures internal temperature at 106 °F; fails to boot reliably |
| [20,2] | Disk Failure | Disk write errors; disk replaced |
| [8,13] | Disk Failure | Disk write errors; disk replaced |
| [9,10] | Disk Failure | Disk write errors; disk replaced |
| [5,2] | Disk Failure | Disk write errors; disk replaced |
| [17,13] | Disk Failure | Disk write errors; disk replaced |
| [12,1] | Disk Failure | Disk write errors; disk replaced |
| [6,14] | Disk Failure | Disk write errors; disk replaced |
| [17,19] | Memory Failure | Memory pins did not make proper contact; bent case and reinserted memory |
| [7,2] | Disk Failure | Disk write errors; disk replaced |
| [5,15] | Lock Up | Rabbit and node were halted, node ID box LED was solid; power cycled |
| [7,2] | Lock Up | Rabbit and node were halted, node ID box LED was off; power cycled |
| [16,1] | Lock Up | Rabbit and node were halted; power cycled |
| [1,9] | Intermittent failure | Power cycled |
| [1,5] | Disk Failure | Failing disk caused disk controller to fail; CM had issues also, both replaced |
| [9,4] | Disk Failure | Failing disk caused disk controller to fail; CM had issues also, both replaced |
| [15,6] | Disk Failure | Disk write errors; disk replaced |
| [18,16] | Disk Failure | Disk write errors; disk replaced |
| [3,11] | Disk Failure | Disk write errors; disk replaced |
| [16,19] | Disk Failure | Disk write errors; disk replaced |
| [5,17] | Disk Failure | Disk write errors; disk replaced |
| [20,4] | Node Failure | Node was replaced |
| [15,4] | Node Failure | Node was replaced, bad left antenna connector. Replacement was used |
| [5,14] | Overheat | Fan was not plugged in |
| [17,4] | Disk Failure | smartctl reports impending disk death |
| [9,9] | Memory Failure | Memory pins did not make proper contact; bent case and reinserted memory |
| [11,4] | Disk Failure | Disk write errors; disk replaced |
| [12,7] | Disk Failure | Disk write errors; disk replaced |
| [13,2] | Disk Failure | Successfully booted from disk, but kernel was throwing disk errors |
| [16,6] | Disk Failure | SMART overall-health self-assessment test result: FAILED! |
| [13,5] | Disk Failure | kernel throwing disk errors |
| [17,3] | Disk Failure | kernel throwing disk errors |
| [14,12] | PXE Halt - Locks up during execution of PXE code | Not fixed |
| [11,15] | Network Failure | PXE gives media check failure; node replaced |
| [19,6] | PXE Halt | Powers down during PXE |
| [15,7] | PXE Halt | Halts at random stages in the PXE image download process, before control is handed over to the kernel |
| [16,8] | CM crash | Power Cycled |
| [20,20] | CM crash | CM light stays solid, Power Cycled |
| [7,2] | CM crash | Node ID light stays off, Power Cycled |
| [2,20] | CM crash | CM light stays solid, Power Cycled |
| [14,12] | Disk Failure | Disk write errors; disk replaced |
| [10,7] | Disk Failure | Disk write errors; disk replaced |
| [11,18] | Disk Failure | Disk write errors; disk replaced |
| [1,15] | Disk Failure | Disk write errors; disk replaced |
| [8,3] | Disk Failure | Disk write errors; disk replaced |
| [2,11] | Disk Failure | Disk write errors; disk replaced |
| [11,16] | Disk Failure | Disk write errors; disk replaced |
| [7,8] | Disk Failure | BIOS does not detect disk; disk replaced |
| [18,7] | Disk Failure | BIOS does not detect disk; disk replaced |
| [2,17] | Disk Failure | BIOS does not detect disk; disk replaced |
| [5,19] | Disk Failure | BIOS does not detect disk; disk replaced |
| [7,2] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [12,4] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [1,8] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [18,18] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [14,20] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [9,16] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [4,6] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [6,8] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [3,13] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [5,4] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [10,5] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [10,8] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [8,8] | Network Failure | Kernel throws network hardware complaint during DHCP |
| [12,4] | Disk Failure | BIOS does not detect disk; disk replaced |
| [8,10] | Disk Failure | BIOS does not detect disk; disk replaced |
| [15,17] | Disk Failure | BIOS does not detect disk; disk replaced |
| [10,2] | Disk Failure | BIOS does not detect disk; disk replaced |
| [1,6] | Disk Failure | Kernel throwing disk errors (hda: dma_timer_expiry: dma status == 0x21); disk replaced |
| [18,12] | Disk Failure | Kernel throwing disk errors (hda: dma_timer_expiry: dma status == 0x21); disk replaced |
| [1,10] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [13,8] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [12,12] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [8,8] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [2,3] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [2,14] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [13,17] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [16,17] | Disk Failure | Kernel throwing disk errors; disk replaced |
| [1,2] | Node Failure | Can't isolate problem; seems to overheat and kernel panic |
