[orbit-user] Problem with imaging outdoor grid nodes?

Kishore Ramachandran kishore at winlab.rutgers.edu
Thu Jan 31 12:52:05 EST 2008


Hi Suhas:

Another rule of thumb that I have found useful is to check whether the same set
of nodes is reported missing each time. If that's the case, chances are that
they are physically not present or have hardware issues that the
administrators need to fix.
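
If you want to automate that check, here is a rough Python sketch. It is only an illustration: the function name and the sample coordinates are made up, and it assumes you have already noted the missing nodes of each imaging attempt as (x, y) pairs.

```python
def persistently_missing(attempts):
    """Return the nodes that were reported missing in every attempt."""
    attempts = [set(a) for a in attempts]
    if not attempts:
        return set()
    return set.intersection(*attempts)


# Hypothetical data: missing nodes noted after three imaging attempts.
runs = [
    {(1, 6), (1, 7)},
    {(1, 7)},
    {(1, 7), (1, 3)},
]
print(sorted(persistently_missing(runs)))  # [(1, 7)]
```

A node that shows up in every attempt is a candidate for a report to the administrators; one that comes and goes is more likely the temporary case.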

Most of the time, though, the "missing node" message is caused by issues with
the CMC/CM service, which handles turning nodes ON/OFF and provides the
serial console. In this case the problem is temporary and goes away the
next time you try to image the same set of nodes (after a short wait,
say 2 minutes). My guess is that the ORBIT admins are aware of this problem
and that a fix will soon make it go away. For now, I suggest explicitly
imaging the "missing nodes" once the remaining nodes are imaged.
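
To save some typing, the retry command can be built from the verbose output. The Python sketch below is only an assumption-laden illustration: it matches the WARN lines exactly as they are quoted further down in this thread, accepts either ' at ' or '@' between the two numbers (the list archive may have rewritten one as the other), assumes the two numbers are in the same order as in the "omf load" topology, and the helper names are my own.

```python
import re

# Matches lines like: WARN -:topo:image: Ignoring missing node '1 at 6'
# Assumption: the separator may be ' at ' or '@', so both are accepted.
MISSING_RE = re.compile(r"Ignoring missing node '(\d+)\s*(?:@|at)\s*(\d+)'")


def missing_nodes(log_text):
    """Return [x, y] pairs reported missing, in order, without duplicates."""
    seen = []
    for x, y in MISSING_RE.findall(log_text):
        node = [int(x), int(y)]
        if node not in seen:
            seen.append(node)
    return seen


def retry_command(nodes, image="baseline-2.3.ndz"):
    """Format an 'omf load' command in the style quoted in this thread."""
    topo = "[" + ",".join("[%d,%d]" % (x, y) for x, y in nodes) + "]"
    return "omf load %s %s" % (topo, image)


log = """\
 INFO prop.timeout: timeout = 800:Fixnum
 WARN -:topo:image: Ignoring missing node '1 at 6'
 WARN -:topo:image: Ignoring missing node '1 at 7'
"""
nodes = missing_nodes(log)
if nodes:
    print(retry_command(nodes))
```

Run the printed command by hand a couple of minutes after the first imaging pass has finished.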

regards,
Kishore

On Jan 31, 2008 11:16 AM, Mesut Ali Ergin <ergin at eden.rutgers.edu> wrote:

> Suhas,
>
> A good practice is to take a look at the node status page
> (http://orbit-lab.org/wiki/Status) for the grid/sb you want to work on
> and check that the nodes you select to image are shown as gray dots before
> starting the imaging procedure. A red dot indicates that
> the node is temporarily unavailable (for any of a number of reasons)
> and will be serviced soon by the ORBIT team to bring it back to life.
> Red ones will be reported as missing by the nodeHandler, and they are
> not usually missing physically :) Hope it helps,
>
> Best,
>
> --
> MAE
>
>
>
>  suhas mathur wrote:
> > Ali,
> >
> > Thanks. Can you also tell me why sometimes, during the imaging process
> > the verbose output declares that some nodes are missing? For example,
> > in my latest imaging attempt (10:45 am, Thurs, Jan 31), I got the
> > following message:
> >
> > "
> >  INFO prop.timeout: timeout = 800:Fixnum
> >  WARN -:topo:image: Ignoring missing node '1 at 6'
> >  WARN -:topo:image: Ignoring missing node '1 at 7'
> > "
> >
> > Or is this message sometimes sent erroneously, when the nodes are
> > in fact not missing?
> >
> >
> > Thanks,
> > Suhas
> >
> >
> > On Jan 30, 2008 9:12 PM, Mesut Ali Ergin <ergin at eden.rutgers.edu> wrote:
> >> Suhas,
> >>
> >> I fixed one thing related to imaging bandwidth this afternoon after
> >> Sanjit let me know about imaging problems. If your imaging experience
> >> was from before or around 3pm today, then I would suggest another
> >> try. As for not being able to reach a node after successful
> >> imaging, that is not usual. Try with the latest baseline
> >> image and report back if you can reproduce the problem.
> >>
> >> Best,
> >>
> >> --
> >> MAE
> >>
> >>
> >>
> >>
> >> On 1/30/08, suhas mathur <suhas at winlab.rutgers.edu> wrote:
> >>> Hi all-
> >>>
> >>> I am a new user on ORBIT and I have been having problems imaging
> >>> nodes on the outdoor grid. Sometimes the nodes time out and
> >>> sometimes they fail. Sometimes some of them time out and others fail.
> >>> Has anyone else experienced similar problems on the outdoor nodes or
> >>> am I missing something? I have gotten things to work at least once
> >>> without any problems.
> >>>
> >>> The image I am using is baseline-2.3.ndz
> >>>
> >>> The imaging command I am using is "omf load
> >>> [[1,1],[1,2],[1,3],[1,4],[1,5],[1,6],[1,7],[1,8],[1,9],[1,10]]
> >>> baseline-2.3.ndz"
> >>>
> >>> The command I am using to turn them on is "omf tell on all"
> >>>
> >>> Also, sometimes after "supposedly successful imaging", I can ping the
> >>> nodes but I cannot ssh to them - the error I get is: "no path to host
> >>> node1-x".
> >>>
> >>> Finally, is there a way to tell, from the verbose output of the
> >>> imaging process, which node(s) have been imaged so one can begin
> >>> working with them while others continue to get imaged?
> >>>
> >>> Thanks,
> >>> Suhas
> >>> _______________________________________________
> >>> orbit-user mailing list
> >>> orbit-user at orbit-lab.org
> >>> http://orbit-lab.org/cgi-bin/mailman/listinfo/orbit-user
> >>>
> >>
> >> --
> >> Mesut Ali Ergin
> >> ergin at winlab.rutgers.edu
> >>
> >> Rutgers University, WINLAB,
> >> Technology Centre of New Jersey,
> >> 671 Rt. 1 South, North Brunswick,
> >> New Jersey, 08902-3390, USA
> >>
> >> Phone: 862-368-6620
> >> Fax:   732-932-6882
> >>
> >>
> >
> >
> >
>
>
>