ORBIT-USER: error when imaging nodes

Thierry Rakotoarivelo Thierry.Rakotoarivelo at nicta.com.au
Sun Jul 29 19:59:56 EDT 2007


Dear all,

I re-restarted the CMC gridservice, and imaging seems to work now.

From the output of a 'ps' command, it seems there were multiple CMC
gridservices still running on the machine.
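
A minimal sketch of that kind of check, for anyone hitting the same
symptom. The pattern 'cmc' is an assumption (this thread does not give
the actual process name of the CMC gridservice), and the function names
are made up for illustration:

```shell
# Hypothetical pattern: the real process name of the CMC gridservice is
# not given in this thread. The brackets keep grep's own command line
# from matching when it shows up in the 'ps' listing.
PATTERN='[c]mc'

# Print the number of lines on stdin that match the pattern given as $1.
count_matching() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so '|| true' keeps the exit status clean.
    grep -c -- "$1" || true
}

# Read a process listing on stdin and report whether more than one
# line matches the pattern given as $1.
check_duplicates() {
    n=$(count_matching "$1")
    if [ "$n" -gt 1 ]; then
        echo "WARNING: $n matching processes; expected at most 1"
    else
        echo "OK: $n matching process(es)"
    fi
}

# Usage on a live machine (not run here):
#   ps -eo pid,args | check_duplicates "$PATTERN"
```

If the check warns, the leftover instances would need to be stopped
before restarting the service; how to do that depends on the local
setup.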

There's only one now, and I quickly tested imageNodes4, which seems
to work fine.

Regards,
Thierry.

Vijay Subramanian wrote:
> Hi,
> Thanks for the prompt attention but I am still getting the same error
> for ImageNodes4.
> I tried imageNodes which got a bit further but still failed as shown below.
> 
> Any ideas on how to resolve this? While we are on this, can someone
> clarify the difference between imageNodes and imageNodes4?
> 
> Thanks for any pointers.
> Vijay
> 
> vijay at console.grid:~$ imageNodes [1..10,1..10] baseline.ndz
> Imaging nodes: [1..10,1..10] with image baseline.ndz
> Using config /etc/nodehandler/grid.cfg
> /etc/nodehandler/grid.cfg:20: warning: Insecure world writable dir
> /tmp, mode 040777
> Using logfile /etc/nodehandler/nodehandler_log.xml
>  INFO init: NodeHandler Version 3.6.4-1 (849)
>  INFO init: Experiment ID: grid_2007_07_29_18_37_50
>  INFO Experiment: load system:exp:stdlib
>  INFO prop.resetDelay: resetDelay = 180:Fixnum
>  INFO Experiment: load system:exp:imageNode
>  INFO prop.nodes: nodes = [[1..10, 1..10]]:Array
>  INFO prop.image: image = "baseline.ndz":String
> /tmp/eee.466/lib/util/communication.rb:127: warning: Insecure world
> writable dir /tmp, mode 040777
>  INFO stdlib: 100 out of 100 node(s) still down n_10_6,n_9_5,n_7_9
>  INFO n_6_9: Checked in as /ip/10.10.6.9 booting off baseline:1.0.9
>  WARN n_6_9: Expected image 'pxe:1.1.4', but node reported 'baseline:1.0.9'.
> FATAL run: ServiceException: ServiceException
>         No testbed known for domain '50.10.10'
>  INFO n_6_9: Resseting node
> ERROR comm: While processing command '/ip/10.10.6.9 0 WHOAMI 3.4.3
> baseline:1.0.9' Error: 'ServiceException'
> /tmp/eee.466/app/nodeHandler.rb:389:in `service_call':
> ServiceException (ServiceException)
>         from /tmp/eee.466/lib/handler/cmc.rb:38:in `nodeOff'
>         from /tmp/eee.466/lib/handler/cmc.rb:51:in `nodeAllOff'
>         from /tmp/eee.466/app/nodeHandler.rb:831:in `shutdown'
>         from /tmp/eee.466/app/nodeHandler.rb:1011
>  done.
> 
> 
> On 29/07/07, Joseph F. Miklojcik III <jfm3 at winlab.rutgers.edu> wrote:
>> On Sun, 2007-07-29 at 18:12 -0400, Vijay Subramanian wrote:
>>> I am getting an error similar to that seen by Andrea earlier. It looks
>>> like the CM service needs to be restarted. Can someone look into this
>>> and do the needful ?
>> I have restarted the CMC.  Hope it helps!
>>
>> (jfm3)
>>
>>
> 
> 

-- 
-----
Thierry Rakotoarivelo
Networks and Pervasive Computing Group (NPC) - NICTA
Locked Bag 9013, Alexandria, NSW 1435, Australia
Tel. +61 2 8374 5245 / Fax. +61 2 8374 5531
Web. www.nicta.com.au


