ORBIT-USER: most of the grid does not work!
Andrea G Forte
andreaf at cs.columbia.edu
Thu Feb 15 12:16:59 EST 2007
Ivan,
Thank you. This might be very helpful, but perhaps I need to understand
it better. What is the difference between a node being off and being
unavailable? Currently the grid shows only 4 nodes as unavailable, but
in my experiments a very large number of nodes do not turn on, and
others turn on but do not complete the imaging process.
In my opinion it would be very helpful to mark all of these nodes as
unavailable and just turn them off. That way we would be able to
image the good nodes with minimum effort, without having to restart the
imaging process again and again because of nodes getting stuck.
In other words, nodes that get stuck or do not turn on only cause
problems and should be disconnected.
-Andrea
Ivan Seskar wrote:
>> From: owner-orbit-user at winlab.rutgers.edu
>> [mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Mesut Ali Ergin
>> Sent: Wednesday, February 14, 2007 10:38 PM
>> To: orbit-user at winlab.rutgers.edu
>> Subject: Re: ORBIT-USER: most of the grid does not work!
>
> ...
>
> Just to add to this discussion: we are having problems with node power
> supplies (the nodes with red dots on the status page actually have
> dead power supplies). Unfortunately, the first symptoms of failing power
> supplies are CM lockups and nodes stuck in the on or off state. It looks
> like we will have to replace all of them, which is not a trivial thing
> to do. We are trying to find an interim software solution that will
> (hopefully) prolong the life of the power supplies and enable us to
> do incremental replacement (rather than force us to shut down the grid
> and replace all power supplies at once).
>
> Ivan.
>
> PS: An even better page for big grid status is
> http://www.orbit-lab.org/wiki/Status/Grid - it has a webcam feed as well
> :-). (The status pages do not auto-refresh, so you will have to refresh
> them manually. They are not really finished yet, as you will
> discover if you try to select individual nodes.)
>