ORBIT-USER: About imageNodes for 100+ nodes
Nanyan Jiang
jnyan at winlab.rutgers.edu
Wed Sep 27 08:55:25 EDT 2006
Hi Sachin,
Thanks. I tried the shortcut you mentioned, and it
worked yesterday.
However, this morning imageNodes did not seem to work, either for a
single node or for multiple nodes. The commands I used, and the
resulting experiment IDs, were:
imageNodes [10,11] file.ndz
imageNodes [10..16,10..16] file.ndz
Experiment ID: grid_2006_09_27_08_42_43
Experiment ID: grid_2006_09_27_08_42_34
But when I used
imageNodes [9,16] file.ndz
imageNodes [11..13,11..13] file.ndz
they worked fine.
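One way to see what the two failing commands have in common is to expand each node spec into its (x, y) coordinates. The sketch below assumes the notation works as described in the reply quoted further down: two comma-separated coordinates, each either a single number or an inclusive `lo..hi` range, with optional brackets. The `expand` and `parse_range` helpers are hypothetical and are not the nodehandler's own parser.

```python
from itertools import product

def parse_range(text):
    """Parse one coordinate: "10" -> [10], "10..16" -> [10, 11, ..., 16] (inclusive)."""
    if ".." in text:
        lo, hi = text.split("..")
        return list(range(int(lo), int(hi) + 1))
    return [int(text)]

def expand(spec):
    """Expand a node spec such as "[10..16,10..16]" or "1,1..10" into (x, y) pairs."""
    xs, ys = spec.strip("[]").split(",")
    return list(product(parse_range(xs), parse_range(ys)))

failed = ["[10,11]", "[10..16,10..16]"]
succeeded = ["[9,16]", "[11..13,11..13]"]

# Print each spec, how many nodes it covers, and whether it includes node [10,11].
for spec in failed + succeeded:
    nodes = expand(spec)
    print(spec, len(nodes), (10, 11) in nodes)
# [10,11] 1 True
# [10..16,10..16] 49 True
# [9,16] 1 False
# [11..13,11..13] 9 False
```

Both failing specs contain [10,11] and neither working spec does. That is consistent with a problem on that node, but it does not prove it.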
Those experiments were successful again. Are there any particular nodes
that should not be included in an imaging group (I guess node [10,11]
should be excluded)? Or are there any other issues to be aware of when
imaging nodes? Thanks.
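If a node such as [10,11] has to be left out, the node set is no longer contiguous, which is exactly where the range shortcut becomes cumbersome. The sketch below compresses a list of (x, y) nodes, optionally minus some excluded ones, into strings of the same form as the `1,1..10` shortcut: one string per contiguous run along the second coordinate. It also reads nodes from a text file with one `x,y` per line, which is roughly the `imageNodes xyz file.ndz` idea from the original question. The helper names and the file format are hypothetical, and this thread does not say whether a single imageNodes call can accept several such strings.

```python
from itertools import groupby

def read_nodes(path):
    """Read one "x,y" node per line from a text file, skipping blank lines and # comments."""
    nodes = []
    with open(path) as f:
        for line in f:
            line = line.split("#", 1)[0].strip()
            if line:
                x, y = line.split(",")
                nodes.append((int(x), int(y)))
    return nodes

def to_shortcuts(nodes, exclude=()):
    """Compress (x, y) nodes into "x,lo..hi" strings, one per contiguous run of y."""
    wanted = sorted(set(nodes) - set(exclude))
    specs = []
    for x, row in groupby(wanted, key=lambda n: n[0]):
        ys = [y for _, y in row]
        start = prev = ys[0]
        for y in ys[1:] + [None]:   # None is a sentinel that flushes the last run
            if y is not None and y == prev + 1:
                prev = y
                continue
            specs.append(f"{x},{start}" if start == prev else f"{x},{start}..{prev}")
            if y is not None:
                start = prev = y
    return specs

# The [10..16,10..16] block with node [10,11] left out.
grid = [(x, y) for x in range(10, 17) for y in range(10, 17)]
for spec in to_shortcuts(grid, exclude=[(10, 11)]):
    print(spec)
# 10,10
# 10,12..16
# 11,10..16
# ... one line per remaining row, through 16,10..16
```

Removing a single node splits one row into two runs. A scattered node set degrades into many short strings. The same compression would apply to a node list loaded with `read_nodes` from a file of your choosing.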
Best,
Nanyan
On Tue, 26 Sep 2006, Sachin Ganu wrote:
> Hi Nanyan,
>
> For 1), there is a shortcut that can be used (note that it may be
> cumbersome if your node set is not contiguous). E.g., to image nodes 1,1
> to 1,10, you can use the shortcut
>
> imageNodes 1,1..10 baseline.ndz (this means row 1 and columns 1 to 10)
>
> For 2), we are looking into that.
>
> On 9/26/06, Nanyan Jiang <jnyan at winlab.rutgers.edu> wrote:
>> Hi all,
>>
>> I want to have my applications running on over 100 nodes of ORBIT.
>> However, when I image nodes on ORBIT using
>> imageNodes all file.ndz
>>
>> it only images 60 nodes at a time. And if one of the nodes cannot
>> be checked properly (in my case, 1 of the 60 nodes, n_4_1, was still
>> down), the imaging process does not seem to start (unless I simply did
>> not wait long enough) (Experiment ID: grid_2006_09_26_16_20_44). I have
>> two questions:
>>
>> (1) Is there a convenient way to image a large number of nodes on ORBIT
>> using the imageNodes command?
>> I am thinking of using
>> imageNodes xyz file.ndz
>> where xyz is a text file listing all the nodes that should get the
>> same image (this is not supported by imageNodes). It is really hard to
>> type over 100 node names on the command line, and "imageNodes all
>> file.ndz" only images the same 60 nodes each time. I may have missed
>> other imageNodes options; please let me know.
>> Thanks.
>>
>> (2) If a node misbehaves during imaging, is there a timeout
>> mechanism for that node, so that the imaging process can continue
>> without it (with the missed node reported at the end of the process)?
>> Thanks.
>>
>> Best,
>>
>> Nanyan
>>
>>
>
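The timeout behavior asked for in (2) can be sketched generically. The idea is to poll each node until a deadline, then continue with whichever nodes responded and report the rest. This is not how imageNodes or the ORBIT nodehandler works (the reply only says the team is looking into it). `wait_for_nodes` is hypothetical, and `is_up` is a placeholder for however a node's readiness would actually be detected.

```python
import time

def wait_for_nodes(nodes, is_up, timeout, poll=1.0):
    """Poll is_up(node) until every node is up or `timeout` seconds have passed.

    Returns (ready, missing): imaging could continue with `ready`, and
    `missing` is the list that would be reported at the end of the run.
    """
    deadline = time.monotonic() + timeout
    pending = set(nodes)
    while pending and time.monotonic() < deadline:
        pending = {n for n in pending if not is_up(n)}
        if pending:
            time.sleep(poll)
    ready = [n for n in nodes if n not in pending]
    return ready, sorted(pending)

# Simulated run: node (4, 1) stands in for a node that never comes up, like n_4_1.
down = {(4, 1)}
nodes = [(4, 1), (4, 2), (4, 3)]
ready, missing = wait_for_nodes(nodes, lambda n: n not in down, timeout=0.05, poll=0.01)
print("ready:", ready)      # ready: [(4, 2), (4, 3)]
print("missing:", missing)  # missing: [(4, 1)]
```

The deadline is global rather than per node. This keeps one dead node from stalling the whole batch indefinitely, which is the symptom described with n_4_1.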