ORBIT-USER: problems loading image in grid

Vanessa Frias-martinez vf2001 at cs.columbia.edu
Wed Oct 25 10:16:31 EDT 2006


For a 1..15,1..15 grid, I get a similar error on node (4,1). Is it normal
to have problems with multiple nodes when loading images?

Thanks!

vanessa

vanessa@console.grid:~/barterExec$ imageNodes 1..15,1..15 tmp/node-1-2-2006-10-19-18-52-55.ndz
Imaging nodes: 1..15,1..15 with image tmp/node-1-2-2006-10-19-18-52-55.ndz
Using config /etc/nodehandler/grid.cfg
/etc/nodehandler/grid.cfg:20: warning: Insecure world writable dir /tmp, mode 040777
Using logfile /etc/nodehandler/nodehandler_log.xml
 INFO init: NodeHandler Version 3.6.4-1 (849)
 INFO init: Experiment ID: grid_2006_10_25_10_10_03
 INFO Experiment: load system:exp:stdlib
 INFO prop.resetDelay: resetDelay = 180:Fixnum
 INFO Experiment: load system:exp:imageNode
 INFO prop.nodes: nodes = [1..15, 1..15]:Array
 INFO prop.image: image = "tmp/node-1-2-2006-10-19-18-52-55.ndz":String
 INFO stdlib: 225 out of 225 node(s) still down n_1_8,n_3_10,n_12_15
 INFO stdlib: 225 out of 225 node(s) still down n_1_8,n_3_10,n_12_15
/tmp/eee.424/lib/util/communication.rb:127: warning: Insecure world writable dir /tmp, mode 040777
FATAL run: ServiceException: ServiceException
        Node (4,1) Not Registered for Testbed: '#<CMC::Testbed:0xa7ad1abc>'
 INFO run: Experiment grid_2006_10_25_10_10_03 finished after 0:24
 done.


On Wed, 25 Oct 2006, Haris Kremo wrote:

> It seems to me that node (1,16) is causing the problem.
>
> Please try imaging without that node.
>
> H.
>
> On 10/25/06, Vanessa Frias-martinez <vf2001 at cs.columbia.edu> wrote:
> >
> > Hi!
> >
> > I tried to load an image on the grid and I get an error message.
> > Apparently 353 nodes are down? Any clue?
> >
> > Thanks!
> >
> >  INFO n_20_1: Checked in as /ip/10.10.20.1 booting off pxe:1.1.4
> >  INFO n_20_14: Checked in as /ip/10.10.20.14 booting off pxe:1.1.4
> > FATAL run: ServiceException: ServiceException
> >         Node (1,16) Not Registered for Testbed: '#<CMC::Testbed:0xa7ad1abc>'
> >  INFO stdlib: 353 out of 400 node(s) still down n_17_4,n_8_3,n_19_3
> >  INFO run: Experiment grid_2006_10_25_09_53_47 finished after 0:50
> >  done.
> >
> >
>
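Following the suggestion above to image without the failing node, here is a minimal Python sketch. It expands a grid spec such as "1..15,1..15" into (x, y) coordinates and drops any node that a NodeHandler log reports as "Not Registered for Testbed". It is a hypothetical helper, not part of the ORBIT tools. The function names are invented for illustration, the regular expression assumes the message wording shown in the logs in this thread, and the sketch does not assume any particular imageNodes syntax for excluding nodes.

```python
import re
from itertools import product

# Matches NodeHandler messages such as "Node (4,1) Not Registered for Testbed:".
# Assumption: the wording matches the logs quoted in this thread.
NOT_REGISTERED = re.compile(r"Node\s*\((\d+),\s*(\d+)\)\s+Not\s+Registered")


def expand_grid(spec):
    """Hypothetical helper: turn "X1..X2,Y1..Y2" into a list of (x, y) tuples."""
    x_part, y_part = spec.split(",")

    def span(part):
        lo, hi = part.split("..")
        return range(int(lo), int(hi) + 1)  # inclusive bounds, as in 1..15

    return list(product(span(x_part), span(y_part)))


def unregistered_nodes(log_text):
    """Collect every (x, y) that the log excerpt reports as not registered."""
    return {(int(x), int(y)) for x, y in NOT_REGISTERED.findall(log_text)}


def nodes_to_image(spec, log_text):
    """Nodes in the grid spec, minus those the log flags as unregistered."""
    bad = unregistered_nodes(log_text)
    return [node for node in expand_grid(spec) if node not in bad]


if __name__ == "__main__":
    log = (
        "FATAL run: ServiceException: ServiceException\n"
        "        Node (4,1) Not Registered for Testbed:\n"
        "'#<CMC::Testbed:0xa7ad1abc>'\n"
    )
    remaining = nodes_to_image("1..15,1..15", log)
    print(len(remaining))  # 224: the 225-node grid minus (4,1)
    # Show the first few nodes using the n_x_y naming from the NodeHandler log.
    print(",".join(f"n_{x}_{y}" for x, y in remaining[:3]))  # n_1_1,n_1_2,n_1_3
```

Each of the two logs in this thread names only one unregistered node before the run finishes, so after excluding that node a different one may fail on the next attempt, as (4,1) did here. The filter would then need to be rerun against each new log.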



More information about the orbit-user mailing list