ORBIT-USER: most of the grid does not work!

Andrea G Forte andreaf at cs.columbia.edu
Thu Feb 15 12:16:59 EST 2007


Ivan,

thank you. This might be very helpful but perhaps I need to understand 
it better. What is the difference between a node being off and being 
unavailable? Currently the grid shows only 4 nodes as unavailable, but 
in my experiments a very large number of nodes do not turn on, and 
others turn on but do not complete the imaging process. 
In my opinion it would be very helpful to mark all of these nodes as 
unavailable and just turn them off. In this way we would be able to 
image the good nodes with minimum effort and without having to start the 
imaging process again and again because of nodes getting stuck.
In other words, nodes that get stuck or do not turn on only cause 
problems and should be disconnected.

-Andrea


Ivan Seskar wrote:
>> From: owner-orbit-user at winlab.rutgers.edu
>> [mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Mesut Ali Ergin
>> Sent: Wednesday, February 14, 2007 10:38 PM
>> To: orbit-user at winlab.rutgers.edu
>> Subject: Re: ORBIT-USER: most of the grid does not work!
>
> ...
>
> Just to add to this discussion: we are having problems with node power
> supplies (the ones with the red dots on the status page actually have
> dead power supplies). Unfortunately, the first symptoms of failing PSs
> are CM lockups and nodes stuck in the on or off state; it looks like we will
> have to replace all of them, which is not a trivial thing to do. We are
> trying to find ways of using an interim software solution that will
> (hopefully) prolong the life of the power supplies as well as enable us to
> do incremental replacement (rather than force us to shut down the grid
> and replace all power supplies at once).
>
> Ivan.
>
> PS: An even better page for big grid status is
> http://www.orbit-lab.org/wiki/Status/Grid - it has a webcam feed as well
> :-). (The status pages do not auto-refresh, so you will have to refresh
> them manually; they are not really finished yet, as you will
> discover if you try to select individual nodes.)



