ORBIT-USER: Grid problem
Thierry Rakotoarivelo
Thierry.Rakotoarivelo at nicta.com.au
Thu Aug 23 21:11:14 EDT 2007
Hi Ivan,
Thanks for the info.
Regards,
Thierry.
Ivan Seskar wrote:
> Hi Thierry,
>
> As far as we can tell, there were two issues:
>
> 1.) One of the two DHCP servers was in a weird state, effectively
> cutting off half of the grid.
> 2.) At least some of the control subnet switches were having
> problems with rate negotiation.
>
> It is still not clear if these two were somehow related; we will keep an
> eye on it.
>
> Regards,
>
> Ivan.
>
>
> -----Original Message-----
> From: owner-orbit-user at winlab.rutgers.edu
> [mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Thierry
> Rakotoarivelo
> Sent: Thursday, August 23, 2007 8:05 PM
> To: orbit-user at winlab.rutgers.edu
> Subject: Re: ORBIT-USER: Grid problem
>
> Dear all,
>
> According to the recent emails, more people have experienced the "nodes
> are too slow to respond" problem, which crippled the communication
> between the nodeHandler and the multiple nodeAgents (impacting
> imageNodes4 and other experiments).
>
> At the moment, it also seems that this problem has been fixed. I just
> finished an imaging process with 210 nodes correctly imaged
> (grid_2007_08_23_19_37_02).
>
> Out of curiosity and for future reference, does anyone know what fixed
> the problem we all experienced recently (e.g. rebooting some devices,
> restarting some services, ...)?
>
> Regards,
> Thierry.
>