ORBIT-USER: imaging of nodes.
Max Ott
max at winlab.rutgers.edu
Mon Jan 1 22:20:38 EST 2007
Hi,
Unfortunately you typed something slightly different: you requested
'basline.ndz', which doesn't exist. Granted, doing that should result in
something a bit more meaningful than a ServiceException with no real
consequences.
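For illustration, a check along the following lines could turn a mistyped image name into a readable message. This is only a sketch in Python, not nodeHandler code: the function name `check_image_name` and the list of available images are assumptions made up for the example.

```python
import difflib


def check_image_name(requested, available):
    """Return None if the image exists, otherwise a readable error string."""
    if requested in available:
        return None
    # Look for near matches so that a simple typo is obvious to the user.
    close = difflib.get_close_matches(requested, available, n=3, cutoff=0.6)
    hint = ""
    if close:
        hint = " Did you mean " + ", ".join(repr(c) for c in close) + "?"
    return f"Image {requested!r} does not exist.{hint}"


# Hypothetical list of images; the real set depends on the image server.
available_images = ["baseline.ndz"]
print(check_image_name("basline.ndz", available_images))
```

With the single known image name in the list, the mistyped 'basline.ndz' is reported as missing and 'baseline.ndz' is offered as the likely intended name.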
As for the duration of imaging: it should really never take more than
20 to 25 minutes to image the entire grid. I just did it and indeed it
got stuck on one node, but the INFO messages were very clear about it:
INFO exp: Progress(322/0/323): 10/99/100 min(n_17_19)/avg/max (151.971942)
322 of 323 nodes were imaged. The one which didn't make it past the
10% mark was 'n_17_19'.
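If you want to pick the stuck node out of the log automatically, the progress line can be parsed with a small script. The sketch below is Python and only assumes the layout visible in the line above: imaged count and total in the first group, then min/avg/max percentages and the name of the slowest node. The meaning of the middle counter and of the trailing number in parentheses is not stated here, so the code keeps the former under the neutral name `other` and ignores the latter. The names `parse_progress` and `PROGRESS_RE` are made up for the example.

```python
import re

# Matches e.g. "Progress(322/0/323): 10/99/100 min(n_17_19)/avg/max"
PROGRESS_RE = re.compile(
    r"Progress\((\d+)/(\d+)/(\d+)\):\s*(\d+)/(\d+)/(\d+)\s+min\(([^)]+)\)/avg/max"
)


def parse_progress(line):
    """Extract the counters from one progress line, or return None."""
    m = PROGRESS_RE.search(line)
    if m is None:
        return None
    done, other, total, lo, avg, hi = (int(g) for g in m.groups()[:6])
    return {
        "done": done,        # nodes fully imaged
        "other": other,      # middle counter; meaning assumed unknown
        "total": total,      # nodes being imaged
        "min_pct": lo,       # progress of the slowest node
        "avg_pct": avg,
        "max_pct": hi,
        "slowest_node": m.group(7),
    }


line = "INFO exp: Progress(322/0/323): 10/99/100 min(n_17_19)/avg/max (151.971942)"
p = parse_progress(line)
print(f"{p['done']} of {p['total']} imaged; "
      f"slowest node {p['slowest_node']} at {p['min_pct']}%")
```

A wrapper could watch these values and give up on a node whose minimum stops moving, which is the kind of sanity check that is currently done by hand.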
If you don't need that one, simply ^C out of it. I guess we should add
some more sanity checking code in to do this automatically. But we
first need to fix the reason that 63 nodes didn't come up properly in
the first place.
However, I'm wondering if you really need 400 nodes? If you need fewer,
and know which ones you want, create a topology file listing all the
nodes you need and then use that file for imaging. Information on that
should be on the wiki somewhere.
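To work out which nodes belong in such a list, something like the following Python sketch may help. It only builds the list of coordinates and names; it does not produce the topology file itself, whose syntax is documented on the wiki and not reproduced here. The helpers `grid_nodes` and `node_name` are hypothetical, and reading 'n_17_19' as n_<x>_<y> is an assumption.

```python
def grid_nodes(x_range, y_range, exclude=()):
    """List (x, y) coordinates in the inclusive ranges, minus excluded nodes."""
    skip = set(exclude)
    return [
        (x, y)
        for x in range(x_range[0], x_range[1] + 1)
        for y in range(y_range[0], y_range[1] + 1)
        if (x, y) not in skip
    ]


def node_name(x, y):
    """Format a coordinate pair in the style seen in the log (assumed n_x_y)."""
    return f"n_{x}_{y}"


# Whole 20x20 grid except the node that got stuck during imaging.
nodes = grid_nodes((1, 20), (1, 20), exclude=[(17, 19)])
print(len(nodes), "nodes selected")
print([node_name(x, y) for x, y in nodes[:3]])
```

The same helper can be given a smaller range, or a longer exclude list, once it is known which nodes fail to come up.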
-max
On 1/2/07, Andrea G. Forte <andreaf at cs.columbia.edu> wrote:
> Dear all,
>
> I am trying to image the nodes of the grid using the command:
> imageNodes4 [1..20,1..20] baseline.ndz
>
> After issuing this command the imaging starts. Every day I get warnings
> about different nodes not working, but this is not the issue.
> Is it normal that after 45 minutes the imaging is still going on? Also,
> I got the error "FATAL service_call: Exception: ServiceException
> (ServiceException)
> ERROR run: ServiceException: ServiceException". Should I ignore it? If
> this is normal, one hour of the two hours that I can reserve each day
> goes to imaging alone, which is really inefficient.
>
> After pressing "Control C" to terminate the imaging, I got the following:
> ERROR ExecApp: Application 'commServer' failed (code=2)
> ERROR Communicator: ComServer failed: status: 2
> FATAL service_call: Exception: (Interrupt)
> /opt/nodehandler-4.1.2/app/nodeHandler.rb:94:in `service_call':
> ServiceException (ServiceException)
> from /opt/nodehandler-4.1.2/lib/handler/cmc.rb:187:in
> `nodeAllOffSoft'
> from /opt/nodehandler-4.1.2/app/nodeHandler.rb:419:in `shutdown'
> from /opt/nodehandler-4.1.2/app/nodeHandler.rb:651
>
> Unfortunately I do not have the exact experiment ID because the console
> froze after interrupting the imaging process. However, it is January 1st
> and I started the experiment at around 4:01 PM.
>
> Your help is very much appreciated as always.
>
> -Andrea
>
>
>