ORBIT-USER: Improved imaging procedures
Sanjit Krishnan Kaul
sanjitkaul at gmail.com
Sun Oct 8 02:23:07 EDT 2006
JFYI.
I tried it with a much larger image than the baseline one; I guess that's why it takes more time.
INFO exp: Progress(383/0/383): 100/100/100 min()/avg/max (148.206492)
INFO Experiment: DONE!
INFO ExecApp: Application 'commServer' finished
INFO run: Experiment grid_2006_10_08_00_37_33 finished after 25:35
Nodes listed below timed out:
node1-16
node16-17
node1-14
node11-1
node2-2
node10-14
node12-5
node19-19
node14-12
node18-19
node10-11
node4-1
node17-4
node19-6
node18-17
- Sanjit
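To see which grid positions the timed-out nodes correspond to, the list can be extracted from the log programmatically. The sketch below is hypothetical and not a nodehandler feature. The helper name, the assumed "nodeX-Y" naming and the 20x20 grid are my assumptions. It parses an excerpt of the log above into [x, y] pairs and then counts the nodes that did respond.

```ruby
# Hypothetical helper (not part of the nodehandler): extracts the node list
# that follows "Nodes listed below timed out:" and converts names of the
# assumed form "nodeX-Y" into [x, y] grid coordinates.
def parse_timed_out(log_text)
  lines = log_text.lines.map(&:strip)
  start = lines.index("Nodes listed below timed out:")
  return [] if start.nil?

  candidates = lines[(start + 1)..-1]
  # Stop at the first line that is not a node name.
  names = candidates.take_while { |line| line.match?(/\Anode\d+-\d+\z/) }
  # "node1-16" -> [1, 16]
  names.map { |line| line.scan(/\d+/).map(&:to_i) }
end

# Excerpt of the log above, used as sample input.
log = <<~LOG
  INFO run: Experiment grid_2006_10_08_00_37_33 finished after 25:35
  Nodes listed below timed out:
  node1-16
  node16-17
  node1-14
LOG

timed_out = parse_timed_out(log)
p timed_out  # => [[1, 16], [16, 17], [1, 14]]

# Assuming a 20x20 grid, these are the nodes that did respond:
responding = (1..20).to_a.product((1..20).to_a) - timed_out
puts responding.size  # => 397
```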
-----Original Message-----
From: owner-orbit-user at winlab.rutgers.edu
[mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Max Ott
Sent: Saturday, October 07, 2006 9:34 PM
To: orbit-user at winlab.rutgers.edu
Subject: ORBIT-USER: Improved imaging procedures
Folks,
I installed a new version of the nodehandler with which I now
regularly image the entire grid (or rather, the approximately 385 nodes
that boot up properly) in about 16 minutes, almost half of which is
spent coaxing the nodes to boot up.
As I haven't tested much more than imaging, and have only installed
the new agent on the PXE image, the old nodehandler is still there
and the new one is called 'nodehandler4'.
Similarly, the new imaging command is called 'imageNodes4':
$ imageNodes4 -h
Usage: /usr/bin/imageNodes4 [topology] [image_file_path]
Example:
/usr/bin/imageNodes4 system:topo:all tmp/image-tridencom-test.ndz
It takes two optional arguments: a topology and the path to the image file.
Right now, 'topology' is just a fancy way of defining the list of
nodes to use, but in the future it will also include ways to define
additional noise-injection settings to get us closer to a real
topology.
There is a new command, 'defTopology' (http://tinyurl.com/pcqo8), which
defines a list of nodes in almost exactly the same way as 'defNodes'. A
topology can be saved in its own file and referenced from a 'defNodes'
command.
For instance, the repository contains a topology with all the nodes
in the grid. Its URI is 'system:topo:all', and it is simply defined as:
defTopology('system:topo:all', [1 .. OConfig['X_MAX'], 1 .. OConfig['Y_MAX']])
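To make the two selector forms concrete (a pair of ranges, as here, or an explicit list of [x, y] pairs, as in the next example), here is a minimal Ruby sketch of how such a selector could expand into individual nodes. It is not the nodehandler's implementation. 'expand_selector' is a hypothetical name, and X_MAX/Y_MAX are stand-ins for the OConfig values, assumed to be 20.

```ruby
# Stand-ins for OConfig['X_MAX'] and OConfig['Y_MAX'] (assumed 20x20 grid).
X_MAX = 20
Y_MAX = 20

# Hypothetical sketch, not the nodehandler's code: expands a topology
# selector into a flat list of [x, y] node coordinates. Two forms are handled:
#   [xs, ys]         two ranges (or arrays); every x is paired with every y
#   [[x, y], ...]    an explicit list of coordinate pairs
# Ambiguity: [[1, 2], [3, 4]] is treated as two explicit pairs.
def expand_selector(selector)
  pair = ->(e) { e.is_a?(Array) && e.size == 2 && e.all? { |v| v.is_a?(Integer) } }
  if selector.all?(&pair)
    selector                                  # already a list of pairs
  elsif selector.size == 2 && selector.all? { |e| e.respond_to?(:to_a) }
    xs, ys = selector.map(&:to_a)
    xs.product(ys)                            # cartesian product of x and y
  else
    raise ArgumentError, "unsupported selector: #{selector.inspect}"
  end
end

all_nodes = expand_selector([1..X_MAX, 1..Y_MAX])
puts all_nodes.size                    # => 400
p all_nodes.first(3)                   # => [[1, 1], [1, 2], [1, 3]]
p expand_selector([[1, 1], [20, 20]])  # => [[1, 1], [20, 20]]
```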
Now suppose you want to use two opposite corner nodes, and we hard-code
the grid size to 20x20 (not that I would ever condone hard-coding).
Create a file 'my_topo_diagonal.rb' and add the following
line:
defTopology('my:topo:diagonal', [[1,1],[20,20]])
Now, you can image those nodes with your image through:
$ imageNodes4 my:topo:diagonal my_image.ndz
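If you would rather not type the corner coordinates by hand, you can generate the file for whatever grid size you are using. This is a hypothetical helper, and the function name is my own. It only builds the text of the topology file. The defTopology line inside that text is evaluated later by the nodehandler, not by this script.

```ruby
# Hypothetical helper: builds the text of a topology file such as
# 'my_topo_diagonal.rb' for any grid size. The defTopology call is only
# a string here; the nodehandler evaluates it later.
def corner_topology_source(uri, x_max, y_max)
  corners = [[1, 1], [x_max, y_max]]  # two opposite corners of the grid
  "defTopology('#{uri}', #{corners.inspect})\n"
end

src = corner_topology_source('my:topo:diagonal', 20, 20)
print src  # => defTopology('my:topo:diagonal', [[1, 1], [20, 20]])
# File.write('my_topo_diagonal.rb', src)  # uncomment to save the file
```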
The 'defNodes' command in the new nodehandler will accept a topology
URI as the second argument (selector), but we'll announce that when we
fully switch over.
Also, please don't use a block with the 'defTopology' command. I
realized last night that the installed version doesn't support that
properly, and I don't want to install the fix without testing it
first; I'm not sure when that will happen. In other words, the fancy
'circle' example in the wiki page doesn't work yet. The normal array
definitions do work, though.
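Until blocks are supported, one possible workaround is to compute a shape in plain Ruby and put the resulting array into an ordinary definition. The sketch below is not the wiki's 'circle' example. 'ring_nodes', the topology URI, the 20x20 grid, the center and the radius are all assumptions made for illustration.

```ruby
# Hypothetical workaround sketch: compute a ring of nodes up front and
# emit a plain array definition instead of a block-based one.
def ring_nodes(cx, cy, radius, x_max: 20, y_max: 20)
  (1..x_max).to_a.product((1..y_max).to_a).select do |x, y|
    # Keep nodes whose distance from the center is within half a grid
    # step of the requested radius.
    (Math.hypot(x - cx, y - cy) - radius).abs < 0.5
  end
end

ring = ring_nodes(10, 10, 3)
puts "defTopology('my:topo:circle', #{ring.inspect})"
```

Thresholding the distance at half a grid step gives a one-node-wide ring on an integer grid; a different tolerance would give a thicker or sparser ring.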
There is still the odd node that sometimes goes deaf during imaging.
The current imaging script doesn't detect that and, as a result,
won't finish. However, this happens much less frequently than before,
and I have only seen it when imaging more than 350 nodes. Hopefully,
your experience is similar.
Finally, please note that the default topology behavior is
'non-strict', meaning the nodehandler will remove a node from a
topology or node set if the node doesn't check in after one additional
reboot. The nodehandler will print a warning message but will then
continue.
So please check whether any of the skipped nodes are important to you.
For example:
WARN Giving up on node n_19_6
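To check whether any skipped nodes matter for your experiment, you can scan the nodehandler output for these warnings. This is a hypothetical sketch, assuming the warning always has the 'n_X_Y' form shown above. The sample input reuses that warning line plus one line from Sanjit's log. The 'important' list is made up.

```ruby
# Hypothetical helper: pulls [x, y] coordinates out of non-strict
# "Giving up on node" warnings (format taken from the example line).
GIVE_UP = /WARN Giving up on node n_(\d+)_(\d+)/

def skipped_nodes(log_text)
  log_text.scan(GIVE_UP).map { |x, y| [x.to_i, y.to_i] }
end

log = <<~LOG
  WARN Giving up on node n_19_6
  INFO Experiment: DONE!
LOG

important = [[1, 1], [19, 6], [20, 20]]  # hypothetical: nodes you need
missing = skipped_nodes(log) & important
puts "Important nodes skipped: #{missing.inspect}"  # => Important nodes skipped: [[19, 6]]
```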
Let me know if you have problems and I'll look into it. Don't forget
to add the experiment ID to any bug report.
Thanks,
-max