ORBIT-USER: Error: Nodes do not come up

Vijay Subramanian subramanian.vijay at gmail.com
Tue Jul 31 05:30:03 EDT 2007


Hi,
I imaged the nodes with baseline.ndz using:
imageNodes4 [1..20,1..20] baseline.ndz

I then ran my script using:
nodehandler4 -k scriptname

However, none of the nodes seems to come up, even though I was able
to ping and ssh into them. The nodes were reset twice, and then the
system gave up on them.
Any ideas what the problem might be?

Experiment ID: grid_2007_07_31_05_02_47

I then ran the script again with a different set of nodes, but it did
not help: same error (Experiment ID: grid_2007_07_31_05_19_28).
Any help is appreciated.
Thanks,
Vijay

The last few lines of output follow:

INFO stdlib: Waiting for nodes (Up/Down/Total): 0/11/11 - (still down: n_16_2,n_16_3,n_17_1)
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/11/11 - (still down: n_16_2,n_16_3,n_17_1)
WARN stdlib: Giving up on node n_16_2
WARN stdlib: Giving up on node n_16_3
WARN stdlib: Giving up on node n_17_1
WARN stdlib: Giving up on node n_1_8
WARN stdlib: Giving up on node n_15_5
WARN stdlib: Giving up on node n_20_4
WARN stdlib: Giving up on node n_8_1
WARN stdlib: Giving up on node n_20_3
WARN stdlib: Giving up on node n_12_4
WARN stdlib: Giving up on node n_18_6
WARN stdlib: Giving up on node n_11_7
INFO stdlib: Waiting for nodes (Up/Down/Total): 0/11/11 - (still down: n_16_2,n_16_3,n_17_1)
INFO whenAll: *: 'apps/app/status[@value='INSTALLED.OK']' fires
starting soon
INFO OML: Started: {"port"=>"7600", "iface"=>"eth2", "addr"=>"224.0.0.6"}
INFO Experiment: DONE!
INFO ExecApp: Application 'commServer' finished
INFO run: Experiment grid_2007_07_31_05_02_47 finished after 10:28



-- 
Networks Lab, RPI
http://poisson.ecse.rpi.edu/~vijay
