ORBIT-USER: About imageNodes for 100+ nodes
Sachin Ganu
sachin at winlab.rutgers.edu
Tue Sep 26 18:07:06 EDT 2006
Hi Nanyan,
For 1) there is a shortcut that can be used (note that it may be
cumbersome if your node set is not contiguous). For example, to image
nodes 1,1 to 1,10, you can use
imageNodes 1,1..10 baseline.ndz (this means row 1 and columns 1 to 10)
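If the node set is not contiguous, one way to avoid typing each range by
hand is to generate the commands from a list of nodes. The following is a
minimal Python sketch, not part of the ORBIT tools. The names node_specs
and image_commands are hypothetical helpers, and the sketch relies only on
the row,first..last form shown above. Writing a single node as row,col is
an assumption and should be checked against imageNodes before use.

```python
from itertools import groupby


def node_specs(nodes):
    """Group (row, col) pairs into one spec per contiguous column run."""
    by_row = {}
    for row, col in sorted(set(nodes)):
        by_row.setdefault(row, []).append(col)

    specs = []
    for row, cols in sorted(by_row.items()):
        # Consecutive columns share the same (column - position) value,
        # so groupby on that difference splits cols into contiguous runs.
        for _, group in groupby(enumerate(cols), key=lambda p: p[1] - p[0]):
            run = [col for _, col in group]
            if len(run) == 1:
                # Assumption: a lone node can be written as "row,col".
                specs.append(f"{row},{run[0]}")
            else:
                specs.append(f"{row},{run[0]}..{run[-1]}")
    return specs


def image_commands(nodes, image="baseline.ndz"):
    """Return one imageNodes command line per contiguous run of nodes."""
    return [f"imageNodes {spec} {image}" for spec in node_specs(nodes)]


if __name__ == "__main__":
    # Row 1, columns 1 to 10, as in the example above.
    for line in image_commands([(1, c) for c in range(1, 11)]):
        print(line)
```

Each printed line can then be run in turn. A node set with gaps simply
produces more lines, one per contiguous run in each row.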
For 2) we are looking into that.
On 9/26/06, Nanyan Jiang <jnyan at winlab.rutgers.edu> wrote:
> Hi all,
>
> I want to have my applications running on over 100 nodes of ORBIT.
> However, when I run imageNodes on ORBIT, using
> imageNodes all file.ndz
>
> it only images 60 nodes at a time. And if one of the nodes cannot
> be checked properly (in my case, 1 out of 60 nodes, n_4_1, was still down),
> the imaging process does not seem to start (unless I have not waited long enough).
> (Experiment ID: grid_2006_09_26_16_20_44). I have two questions here:
>
> (1) Is there a convenient way to image a large number of nodes on ORBIT
> using the command imageNodes?
> I am thinking of using
> imageNodes xyz file.ndz
> where xyz is a text file listing all the nodes that should receive the
> same image (this is not supported by imageNodes). It is really hard to
> type over 100 node names on the command line, when
> "imageNodes all file.ndz" only images the same 60 nodes at a time.
> I may be missing other imageNodes options. Please let me know.
> Thanks.
>
> (2) If a node misbehaves during imaging, is there a time-out
> mechanism for that node, such that the imaging process can continue without
> that node (the missed node would be reported at the end of the process)?
> Thanks.
>
> Best,
>
> Nanyan
>
>