ORBIT-USER: most of the grid does not work!
Andrea G Forte
andreaf at cs.columbia.edu
Fri Feb 16 16:31:19 EST 2007
Thank you Chris. I will definitely give it a try.
-Andrea
chris at orderonenetworks.com wrote:
> I was a bit quick to post that code :) This version here seems better :)
>
> Don't forget the 'chmod a+x gne'
>
> gne
> ----------------------
> #!/usr/bin/ruby
> # Author: Chris Davies (chris at orderonenetworks.com)
>
> # determine the output type
> if (ARGV[0] == nil || (ARGV[0] != "nh" && ARGV[0] != "oh")) then
> puts " Good Node Extractor v1.0"
> puts " usage: gne nh/oh"
> puts " "
> puts " This utility extracts the nodes that imaged successfully"
> puts " from the last experiment that was run. The parameter"
> puts " specifies the output type"
> puts " "
> puts " nh - node handler format for use with imageNodes4"
> puts " oh - orbit handler format"
> puts " "
> exit
> end
>
> # get the most recent log file
> command = "ls -c -1 /tmp/comm*.log"
>
> firstfile = IO.popen(command,"r").readlines[0]
>
> if (firstfile == nil) then
> puts "Error: log file not found."
> exit
> end
>
> # grep the file for nodes that imaged
> command = "grep 'INFO.*Wrote' " + firstfile
> if (ARGV[0] == "nh") then
> puts "defTopology('my:topo') { |t|"
> end
>
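> IO.popen(command,"r").each { |line|
>   # keep the text between 'n_' and '>', i.e. the node coordinates
>   node = line.split('n_')[1]
>   # skip lines that matched the grep but carry no node name
>   next if node == nil
>   node = node.split('>')[0]
>   next if node == nil
>
>   first = node.split('_')[0]
>   second = node.split('_')[1]
>   next if first == nil || second == nil
>
>   if (ARGV[0] == "nh")
>     puts " t.addNode(" + first + "," + second + ");"
>   else
>     puts "["+first+","+second+"]"
>   end
> }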
>
> if (ARGV[0] == "nh")
> puts "}"
> end
>
>
>
>
>
>
>> Ivan and Andrea,
>>
>> I'd prefer to have access to all the nodes - since the quantity counts :)
>>
>> I've come up with a little ruby script that may help. It basically parses
>> the output from the last experiment and generates a list of nodes that
>> imaged successfully. The list may be formatted in either 'nodehandler'
>> format (for use with imageNodes4) or 'orbithandler' format.
>>
>> This basically allows you to image all the nodes, then stop the process
>> when it images enough of them. Run the script to make the list of nodes
>> that passed and then use that for the rest of your experiment.
>>
>> The code is new, so any bugs, please let me know.
>>
>> Thanks,
>> Chris
>>
>>
>> sample output:
>> ------------------------------
>>
>> > gne nh
>>
>> defTopology('my:topo') { |t|
>> t.addNode(1,10);
>> t.addNode(8,4);
>> t.addNode(9,1);
>> }
>>
>>
>> script (save as gne) (make sure to type 'chmod a+x gne' so it can run):
>> -----------------------------------------
>> #!/usr/bin/ruby
>> # Author: Chris Davies (chris at orderonenetworks.com)
>>
>> # determine the output type
>> if (ARGV[0] == nil || (ARGV[0] != "nh" && ARGV[0] != "oh")) then
>> puts " Good Node Extractor v1.0"
>> puts " usage: gne nh/oh"
>> puts " "
>> puts " This utility extracts the nodes that imaged successfully from the"
>> puts " last experiment that was run. The parameter specifies the output type"
>> puts " nh - node handler format for use with imageNodes4"
>> puts " oh - orbit handler format"
>> puts " "
>> exit
>> end
>>
>> # get the most recent log file
>> command = "ls -c -1 -r /tmp/*.log"
>>
>> firstfile = IO.popen(command,"r").readlines[0]
>>
>> if (firstfile == nil) then
>> puts "Error: log file not found."
>> exit
>> end
>>
>> # grep the file for nodes that imaged
>> command = "grep Wrote " + firstfile
>>
>> if (ARGV[0] == "nh") then
>> puts "defTopology('my:topo') { |t|"
>> end
>>
>> IO.popen(command,"r").each { |line|
>> #puts line
>> line = line.split('msg: <n_')[1]
>> line = line.split('>')[0]
>>
>> first = line.split('_')[0];
>> second = line.split('_')[1];
>> if (ARGV[0] == "nh")
>> puts " t.addNode(" + first + "," + second + ");"
>> else
>> puts "["+first+","+second+"]"
>> end
>>
>> }
>>
>> if (ARGV[0] == "nh")
>> puts "}"
>> end
>>
>>
>>
>>
>>
>>
>>
>>> Hi Andrea,
>>>
>>> If everybody agrees, we can try that. The problem is that people who
>>> want to have as many nodes as possible even if they are not 100%
>>> guaranteed to come up lose big time (once we declare nodes as
>>> "administratively down" you can't access them at all). What do others
>>> think about it?
>>>
>>> Ivan.
>>>
>>> -----Original Message-----
>>> From: owner-orbit-user at winlab.rutgers.edu
>>> [mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Andrea G Forte
>>> Sent: Thursday, February 15, 2007 12:17 PM
>>> To: orbit-user at winlab.rutgers.edu
>>> Subject: Re: ORBIT-USER: most of the grid does not work!
>>>
>>> Ivan,
>>>
>>> thank you. This might be very helpful but perhaps I need to understand
>>> it better. What is the difference between a node being off and being
>>> unavailable? Currently the grid shows only 4 nodes as unavailable but
>>> from my experiments there is a very large number of nodes that do not
>>> turn on and others that turn on but do not complete the imaging process.
>>>
>>> In my opinion it would be very helpful to mark all of these nodes as
>>> unavailable and just turn them off. In this way we would be able to
>>> image the good nodes with minimum effort and without having to start the
>>> imaging process again and again because of nodes getting stuck.
>>> In other words, nodes that get stuck or do not turn on cause only
>>> problems and should be disconnected.
>>>
>>> -Andrea
>>>
>>>
>>> Ivan Seskar wrote:
>>>
>>>>
>>>>> From: owner-orbit-user at winlab.rutgers.edu
>>>>> [mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Mesut Ali Ergin
>>>>> Sent: Wednesday, February 14, 2007 10:38 PM
>>>>> To: orbit-user at winlab.rutgers.edu
>>>>> Subject: Re: ORBIT-USER: most of the grid does not work!
>>>>>
>>>> ...
>>>>
>>>> Just to add to this discussion: we are having problems with node power
>>>> supplies (the ones with the red dots on the status page actually have
>>>> dead power supplies). Unfortunately, the first symptoms of failing PSs
>>>> are CM lockups and nodes stuck in on or off state; it looks like we will
>>>> have to replace all of them which is not a trivial thing to do. We are
>>>> trying to find ways of using an interim software solution that will
>>>> (hopefully) prolong the life of power supplies as well as enable us to
>>>> do incremental replacement (rather than force us to shut down the grid
>>>> and replace all power supplies at once).
>>>>
>>>> Ivan.
>>>>
>>>> PS: Even better page for big grid status is
>>>> http://www.orbit-lab.org/wiki/Status/Grid - it has a webcam feed as well
>>>> :-). (status pages do not auto-refresh so you will have to do it
>>>> manually - after all they are not really finished yet as you will
>>>> discover if you try to select individual nodes).
>>>>
>>>>
>>>
>>>
>>
>>
>
>
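
For reference, the parsing step of the quoted gne script can be sketched
as a few small functions. This is only an illustration, not Chris's code:
the helper names (extract_node, format_nodes, newest_log) are
hypothetical, the sample log lines are made up to match what the script
greps for ("Wrote" and a node name of the form <n_X_Y>), and a regular
expression stands in for the script's chained split calls. The real log
layout may differ.

```ruby
# Hypothetical helpers; the log line layout is assumed from the quoted script.

# Pull [x, y] out of a line containing a node name such as "<n_1_10>".
# Returns nil when the line carries no node name.
def extract_node(line)
  match = line.match(/n_(\d+)_(\d+)>/)
  match ? [match[1].to_i, match[2].to_i] : nil
end

# Render the node list in "nh" (node handler) or "oh" (orbit handler) form.
def format_nodes(nodes, mode)
  if mode == "nh"
    body = nodes.map { |x, y| "  t.addNode(#{x},#{y});" }
    (["defTopology('my:topo') { |t|"] + body + ["}"]).join("\n")
  else
    nodes.map { |x, y| "[#{x},#{y}]" }.join("\n")
  end
end

# Pick the most recently changed file matching a glob, or nil if none match.
# This mirrors taking the first line of "ls -c -1" without a shell call.
def newest_log(pattern)
  Dir.glob(pattern).max_by { |path| File.ctime(path) }
end

# Made-up sample lines, for illustration only.
sample = [
  "INFO msg: <n_1_10> Wrote image",
  "INFO msg: <n_8_4> Wrote image",
  "DEBUG unrelated line",
]
nodes = sample.map { |l| extract_node(l) }.compact
puts format_nodes(nodes, "nh")
```

Returning nil for lines without a node name, and dropping those with
compact, keeps one odd log line from aborting the whole run.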