ORBIT-USER: most of the grid does not work!

Ivan Seskar Seskar at winlab.rutgers.edu
Fri Feb 16 13:48:14 EST 2007


Hi Andrea,

If everybody agrees, we can try that. The problem is that people who
want to have as many nodes as possible even if they are not 100%
guaranteed to come up lose big time (once we declare nodes as
"administratively down" you can't access them at all). What do others
think about it?

Ivan.

-----Original Message-----
From: owner-orbit-user at winlab.rutgers.edu
[mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Andrea G Forte
Sent: Thursday, February 15, 2007 12:17 PM
To: orbit-user at winlab.rutgers.edu
Subject: Re: ORBIT-USER: most of the grid does not work!

Ivan,

thank you. This might be very helpful but perhaps I need to understand
it better. What is the difference between a node being off and being
unavailable? Currently the grid shows only 4 nodes as unavailable, but
in my experiments a very large number of nodes do not turn on, and
others turn on but do not complete the imaging process.

In my opinion it would be very helpful to mark all of these nodes as
unavailable and just turn them off. In this way we would be able to
image the good nodes with minimum effort and without having to start the
imaging process again and again because of nodes getting stuck.
In other words, nodes that get stuck or do not turn on only cause
problems and should be disconnected.

-Andrea


Ivan Seskar wrote:
>> From: owner-orbit-user at winlab.rutgers.edu
>> [mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Mesut Ali Ergin
>> Sent: Wednesday, February 14, 2007 10:38 PM
>> To: orbit-user at winlab.rutgers.edu
>> Subject: Re: ORBIT-USER: most of the grid does not work!
>
> ...
>
> Just to add to this discussion: we are having problems with node power
> supplies (the ones with the red dots on the status page actually have
> dead power supplies). Unfortunately, the first symptoms of failing PSs
> are CM lockups and nodes stuck in the on or off state; it looks like we
> will have to replace all of them, which is not a trivial thing to do.
> We are trying to find an interim software solution that will
> (hopefully) prolong the life of the power supplies as well as enable us
> to do incremental replacement (rather than force us to shut down the
> grid and replace all power supplies at once).
>
> Ivan.
>
> PS: An even better page for big grid status is
> http://www.orbit-lab.org/wiki/Status/Grid - it has a webcam feed as
> well :-). (The status pages do not auto-refresh, so you will have to
> refresh them manually; after all, they are not really finished yet, as
> you will discover if you try to select individual nodes.)