ORBIT-USER: Grid problem

Thierry Rakotoarivelo Thierry.Rakotoarivelo at nicta.com.au
Thu Aug 23 21:11:14 EDT 2007


Hi Ivan,

Thanks for the info.

Regards,
Thierry.

Ivan Seskar wrote:
> Hi Thierry,
> 
> As far as we can tell, there were two issues:
> 
>   1.) One of the two DHCP servers was in a weird state, effectively
> cutting off half of the grid.
>   2.) At least some of the control subnet switches were having
> problems with rate negotiation.
> 
> It is still not clear if these two were somehow related; we will keep an
> eye on it.
> 
> Regards,
> 
> Ivan.
> 
> 
> -----Original Message-----
> From: owner-orbit-user at winlab.rutgers.edu
> [mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Thierry
> Rakotoarivelo
> Sent: Thursday, August 23, 2007 8:05 PM
> To: orbit-user at winlab.rutgers.edu
> Subject: Re: ORBIT-USER: Grid problem
> 
> Dear all,
> 
> According to the recent emails, more people have experienced the "nodes
> are too slow to respond" problem, which crippled the communication
> between the nodeHandler and the multiple nodeAgents (impacting
> imageNodes4 and other experiments).
> 
> At the moment, it also seems that this problem has been fixed. I just
> finished an imaging process with 210 nodes correctly imaged
> (grid_2007_08_23_19_37_02).
> 
> Out of curiosity and for future reference, does anyone know what fixed
> the problem we all experienced recently (e.g. rebooting some devices,
> restarting some services, ...)?
> 
> Regards,
> Thierry.
> 