ORBIT-USER: Error: Nodes do not come up

Vijay Subramanian subramanian.vijay at gmail.com
Tue Jul 31 06:14:27 EDT 2007


Thanks. I should have realized that.

I tried baseline-2.3 and got much further. However,
http://www.orbit-lab.org/wiki/Documentation/SupportedImages states that
"The latest baseline is available through baseline.ndz". That page
probably needs to be updated.

While the nodes came up (well, 9 out of 11 did), I ran into TCP connect
errors. I am trying to create multiple TCP flows. The experiment produced
the errors shown below. The experiment ID is grid_2007_07_31_05_52_52.

When I hit this error earlier, I tried changing the TCP port and the
node set, but got the same error. I also see some XML parse errors in
the output below. Is this a problem with my script, or is there
something deeper here?
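
To separate a script problem from a connectivity problem, one option is
to probe the receiver's TCP port by hand before starting the flows. A
minimal Python sketch follows; the function name can_connect, the host
address, and the port number are placeholders chosen for illustration,
not values from the experiment script.

```python
import socket


def can_connect(host, port, timeout=2.0):
    """Return True if a TCP connection to (host, port) can be opened."""
    try:
        # create_connection resolves the address and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and unreachable hosts.
        return False


# Example call (hypothetical address and port, adjust to the real receiver):
#   can_connect("192.168.1.2", 5000)
```

"Connection refused" usually means that nothing was listening on the
target port when the connect was attempted, so a False result here would
point at the receiving side not being up rather than at the sender.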

All help is highly appreciated.
Thanks,
Vijay

--snip--
ERROR NodeApp: Exception:Connect to TCP server failed
ERROR NodeApp: ERROR(oml) Transport: Unable to load file (error #4: Internal Expat parser error)
ERROR NodeApp: ERROR(oml) Transport: Expat error #3 (line 1, column 0): no element found
ERROR NodeApp: ERROR(oml) Transport: Error while parsing configuration /tmp/52e2acd884421767988729fb2c4ed710.xml
ERROR NodeApp: ERROR(oml) Unable to load file (error #4: Internal Expat parser error)
ERROR NodeApp: ERROR(oml) Expat error #3 (line 1, column 0): no element found
ERROR NodeApp: ERROR(oml) Error while parsing configuration '/tmp/52e2acd884421767988729fb2c4ed710.xml'
ERROR NodeApp: connect: Connection refused
ERROR NodeApp: Exception:Connect to TCP server failed
ERROR NodeApp: connect: Connection refused
ERROR NodeApp: Exception:Connect to TCP server failed
ERROR NodeApp: connect: Connection refused
ERROR NodeApp: Exception:Connect to TCP server failed
ERROR NodeApp: connect: Connection refused
ERROR NodeApp: Exception:Connect to TCP server failed
ERROR NodeApp: connect: Connection refused
ERROR NodeApp: Exception:Connect to TCP server failed
ERROR NodeApp: connect: Connection refused
ERROR NodeApp: Exception:Connect to TCP server failed
 INFO Experiment: DONE!
 INFO ExecApp: Application 'commServer' finished

--snip--
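
A possible reading of the Expat lines above, offered as an assumption
rather than a diagnosis: "no element found" at line 1, column 0 is what
Expat reports when it is handed empty input, so the configuration file
under /tmp may have been empty when it was parsed. The Python sketch
below reproduces that error with the standard-library Expat binding;
the helper name expat_error_for is hypothetical and is not part of OML
or nodeHandler.

```python
import xml.parsers.expat


def expat_error_for(data):
    """Parse bytes with Expat; return (code, line, column) on error, else None."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        return (err.code, err.lineno, err.offset)
    return None


# Empty input yields error 3 ("no element found") at line 1, column 0.
print(expat_error_for(b""))           # (3, 1, 0)
# A well-formed document parses without error.
print(expat_error_for(b"<config/>"))  # None
```

Reading the file in binary mode and passing its contents to
expat_error_for would show whether the same error appears for the file
named in the log.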

On 31/07/07, Thierry Rakotoarivelo <Thierry.Rakotoarivelo at nicta.com.au> wrote:
> Hi Vijay,
>
> The image "baseline.ndz" points to an image that does NOT contain
> nodeAgent4. Therefore there was no one to reply to the nodeHandler4 in
> your attempt to run the experiment script.
>
> In order to use nodeHandler4, you need a nodeAgent4 running on each of
> your nodes.
>
> Please re-image your nodes with either "baseline-2.2.ndz" or
> "baseline-2.3.ndz".
>
> Regards,
> Thierry.
>
> Vijay Subramanian wrote:
> > Hi,
> > I imaged the nodes with baseline.ndz using
> > imageNodes4 [1..20,1..20] baseline.ndz .
> >
> > I then ran my script  using:
> > nodehandler4 -k scriptname
> >
> > However, none of the nodes seem to be coming up even though I was able
> > to ping/ssh them. The nodes were reset twice and then the system gave
> > up on them.
> > Any ideas what the problem might be?
> >
> > Exp id:  INFO run: Experiment grid_2007_07_31_05_02_47
> >
> > I then ran the script again with a different set of nodes, but it did
> > not help; I got the same error. (The experiment ID for this run was
> > grid_2007_07_31_05_19_28.)
> > Any and all help is appreciated.
> > Thanks,
> > Vijay
> >
> > Last few lines of output follow:
> >
> >  INFO stdlib: Waiting for nodes (Up/Down/Total): 0/11/11 - (still down: n_16_2,n_16_3,n_17_1)
> >  INFO stdlib: Waiting for nodes (Up/Down/Total): 0/11/11 - (still down: n_16_2,n_16_3,n_17_1)
> >  WARN stdlib: Giving up on node n_16_2
> >  WARN stdlib: Giving up on node n_16_3
> >  WARN stdlib: Giving up on node n_17_1
> >  WARN stdlib: Giving up on node n_1_8
> >  WARN stdlib: Giving up on node n_15_5
> >  WARN stdlib: Giving up on node n_20_4
> >  WARN stdlib: Giving up on node n_8_1
> >  WARN stdlib: Giving up on node n_20_3
> >  WARN stdlib: Giving up on node n_12_4
> >  WARN stdlib: Giving up on node n_18_6
> >  WARN stdlib: Giving up on node n_11_7
> >  INFO stdlib: Waiting for nodes (Up/Down/Total): 0/11/11 - (still down: n_16_2,n_16_3,n_17_1)
> >  INFO whenAll: *: 'apps/app/status[@value='INSTALLED.OK']' fires starting soon
> >  INFO OML: Started: {"port"=>"7600", "iface"=>"eth2", "addr"=>"224.0.0.6"}
> >  INFO Experiment: DONE!
> >  INFO ExecApp: Application 'commServer' finished
> >  INFO run: Experiment grid_2007_07_31_05_02_47 finished after 10:28
> >
> >
> >
>
> --
> -----
> Thierry Rakotoarivelo
> Networks and Pervasive Computing Group (NPC) - NICTA
> Locked Bag 9013, Alexandria, NSW 1435, Australia
> Tel. +61 2 8374 5245 / Fax. +61 2 8374 5531
> Web. www.nicta.com.au
>


-- 
Networks Lab, RPI
http://poisson.ecse.rpi.edu/~vijay


