ORBIT-USER: Sandbox 2: One node down
Reza Lotun
rlotun at cs.ubc.ca
Mon Aug 20 18:54:38 EDT 2007
Hello All,
When I issue the command:
imageNodes4 1,1..2 baseline.ndz
it appears that one of the two nodes (n_1_2) fails to PXE boot and
image itself. Here is the output:
Imaging nodes: '1,1..2' with image 'baseline.ndz' on default domain (retrieved from hostname)
INFO init: NodeHandler Version 4.2.0 (1272)
INFO init: Experiment ID: sb2_2007_08_20_18_41_34
INFO ExecApp: Starting application 'commServer': /opt/nodehandler4-4.2.0/sbin/commServer --logfile /tmp/commServer-sb2_2007_08_20_18_41_34.log -d 4 --iface eth1 -e
INFO Experiment: load system:exp:stdlib
INFO prop.resetDelay: resetDelay = 210:Fixnum
INFO prop.resetTries: resetTries = 1:Fixnum
INFO Experiment: load system:exp:imageNode
INFO prop.nodes: nodes = [1, 1..2]:Array
INFO prop.image: image = "baseline.ndz":String
INFO prop.pxe: pxe = "1.2.1-omf":String
INFO prop.domain: domain = nil:NilClass
INFO prop.timeout: timeout = 800:Fixnum
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Resetting node n_1_1
INFO stdlib: Resetting node n_1_2
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: n_1_1,n_1_2)
INFO exp: Progress(0/0/2): 0/0/0 min(n_1_1)/avg/max (90) - Timeout: 709 sec.
INFO whenAll: *: 'status[@value='UP']' fires
INFO exp: Progress(0/0/2): 0/0/0 min(n_1_1)/avg/max (90) - Timeout: 699 sec.
INFO exp: Progress(0/0/2): 0/0/0 min(n_1_1)/avg/max (90) - Timeout: 689 sec.
INFO exp: Progress(0/0/2): 0/0/0 min(n_1_1)/avg/max (90) - Timeout: 679 sec.
INFO exp: Progress(0/0/2): 0/0/0 min(n_1_1)/avg/max (90) - Timeout: 669 sec.
INFO exp: Progress(0/0/2): 0/0/0 min(n_1_1)/avg/max (90) - Timeout: 659 sec.
INFO exp: Progress(0/0/2): 0/0/0 min(n_1_1)/avg/max (90) - Timeout: 649 sec.
INFO exp: Progress(0/0/2): 10/10/10 min(n_1_1)/avg/max (90) - Timeout: 639 sec.
INFO exp: Progress(0/0/2): 10/10/10 min(n_1_1)/avg/max (90) - Timeout: 629 sec.
INFO exp: Progress(0/0/2): 10/10/10 min(n_1_1)/avg/max (90) - Timeout: 619 sec.
INFO exp: Progress(0/0/2): 20/20/20 min(n_1_1)/avg/max (90) - Timeout: 609 sec.
INFO exp: Progress(0/0/2): 20/20/20 min(n_1_1)/avg/max (90) - Timeout: 599 sec.
INFO exp: Progress(0/0/2): 30/30/30 min(n_1_1)/avg/max (90) - Timeout: 589 sec.
INFO exp: Progress(0/0/2): 30/30/30 min(n_1_1)/avg/max (90) - Timeout: 579 sec.
INFO exp: Progress(0/0/2): 30/30/30 min(n_1_1)/avg/max (90) - Timeout: 569 sec.
INFO exp: Progress(0/0/2): 40/40/40 min(n_1_1)/avg/max (90) - Timeout: 559 sec.
INFO exp: Progress(0/0/2): 40/40/40 min(n_1_1)/avg/max (90) - Timeout: 549 sec.
INFO exp: Progress(0/0/2): 40/40/40 min(n_1_1)/avg/max (90) - Timeout: 539 sec.
INFO exp: Progress(0/0/2): 50/50/50 min(n_1_1)/avg/max (90) - Timeout: 529 sec.
INFO exp: Progress(0/0/2): 50/50/50 min(n_1_1)/avg/max (90) - Timeout: 518 sec.
INFO exp: Progress(0/0/2): 50/50/50 min(n_1_1)/avg/max (90) - Timeout: 508 sec.
INFO exp: Progress(1/1/2): 60/60/60 min(n_1_1)/avg/max (90) - Timeout: 498 sec.
INFO exp: Progress(1/1/2): 60/60/60 min(n_1_1)/avg/max (90) - Timeout: 488 sec.
INFO exp: Progress(1/1/2): 60/65/70 min(n_1_1)/avg/max (90) - Timeout: 478 sec.
INFO exp: Progress(1/1/2): 60/65/70 min(n_1_1)/avg/max (90) - Timeout: 468 sec.
INFO exp: Progress(1/1/2): 60/70/80 min(n_1_1)/avg/max (90) - Timeout: 458 sec.
INFO exp: Progress(1/1/2): 60/70/80 min(n_1_1)/avg/max (90) - Timeout: 448 sec.
INFO exp: Progress(1/1/2): 60/75/90 min(n_1_1)/avg/max (90) - Timeout: 438 sec.
INFO exp: Progress(1/1/2): 60/75/90 min(n_1_1)/avg/max (90) - Timeout: 428 sec.
INFO exp: Progress(2/1/2): 60/80/100 min(n_1_1)/avg/max (90) - Timeout: 418 sec.
INFO exp: -----------------------------
INFO exp: Imaging Process Done
INFO exp: - 1 node(s) failed - See the topology file: 'topo_sb2_failed.rb'
INFO exp: - 1 node(s) succesfully imaged - See the topology file: 'topo_sb2_active.rb'
INFO exp: -----------------------------
INFO Experiment: DONE!
INFO ExecApp: Application 'commServer' finished
INFO run: Experiment sb2_2007_08_20_18_41_34 finished after 10:7
Cheers,
Reza