ORBIT-USER: most of the grid does not work!

chris at orderonenetworks.com
Fri Feb 16 14:44:53 EST 2007


I was a bit quick to post that code :) This version here seems better :)

Don't forget to run 'chmod a+x gne' so the script can be executed.

gne
----------------------
#!/usr/bin/ruby
# Author: Chris Davies (chris at orderonenetworks.com)

mode = ARGV[0]

# determine the output type
unless mode == "nh" || mode == "oh"
    puts " Good Node Extractor v1.0"
    puts "  usage: gne nh/oh"
    puts " "
    puts "   This utility extracts the nodes that imaged successfully"
    puts "   from the last experiment that was run. The parameter"
    puts "   specifies the output type:"
    puts " "
    puts "      nh - node handler format for use with imageNodes4"
    puts "      oh - orbit handler format"
    puts " "
    exit 1
end

# get the most recent log file (newest change time, as with 'ls -c')
firstfile = Dir.glob("/tmp/comm*.log").max_by { |f| File.ctime(f) }

if firstfile.nil?
    puts "Error: log file not found."
    exit 1
end

if mode == "nh"
    puts "defTopology('my:topo') { |t|"
end

# scan the log for nodes that imaged; node names look like "n_<x>_<y>>"
File.foreach(firstfile) do |line|
    line = line.scrub                     # tolerate non-UTF-8 bytes in the log
    next unless line =~ /INFO.*Wrote/     # same filter as grep 'INFO.*Wrote'
    node = line.match(/n_(\d+)_(\d+)>/)
    next if node.nil?                     # skip matching lines without a node name
    first, second = node[1], node[2]
    if mode == "nh"
        puts "    t.addNode(" + first + "," + second + ");"
    else
        puts "[" + first + "," + second + "]"
    end
end

if mode == "nh"
    puts "}"
end
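The parsing step in the script can be pulled out into small functions so it can be checked against a log line without running an experiment. This is a minimal sketch, assuming node names appear as 'n_<x>_<y>>' on the same line as 'INFO ... Wrote' (the same assumption the script's filters make). The helper names parse_good_node and format_nodes and the sample log lines are hypothetical, not taken from a real ORBIT log.

```ruby
# Hypothetical helpers; the log line layout is an assumption, not a real ORBIT log.
NODE_RE = /n_(\d+)_(\d+)>/

# Return [x, y] for a line reporting a finished image, or nil otherwise.
def parse_good_node(line)
  return nil unless line =~ /INFO.*Wrote/
  m = NODE_RE.match(line)
  m && [m[1].to_i, m[2].to_i]
end

# Render node pairs in the 'nh' (defTopology) or 'oh' ([x,y] per line) format.
def format_nodes(nodes, mode)
  if mode == "nh"
    lines = ["defTopology('my:topo') { |t|"]
    nodes.each { |x, y| lines << "    t.addNode(#{x},#{y});" }
    lines << "}"
  else
    lines = nodes.map { |x, y| "[#{x},#{y}]" }
  end
  lines.join("\n")
end

# Made-up sample lines, for illustration only.
sample = [
  "INFO  ... msg: <n_1_10> Wrote image",
  "DEBUG ... msg: <n_2_3> still waiting",
  "INFO  ... msg: <n_8_4> Wrote image",
]
nodes = sample.map { |l| parse_good_node(l) }.compact
puts format_nodes(nodes, "nh")
```

Keeping parsing and formatting separate means a change in the log format only touches parse_good_node.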





> Ivan and Andrea,
>
> I'd prefer to have access to all the nodes - since the quantity counts :)
>
> I've come up with a little ruby script that may help. It basically parses
> the output from the last experiment and generates a list of nodes that
> imaged sucessfully. The list may be formatted in either 'nodehandler'
> format (for use with imageNodes4) or 'orbithandler' format.
>
> This basically allows you to image all the nodes, then stop the process
> when it images enough of them. Run the script to make the list of nodes
> that passed and then use that for the rest of your experiment.
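The workflow above (image everything, stop once enough nodes have come up) needs some way to tell when "enough" has been reached. A hypothetical sketch, assuming the same log location (/tmp/comm*.log, as in the newer version of gne) and the same line format the script relies on; the TARGET value and the good_node_count name are made up for illustration.

```ruby
require "set"

# Count distinct nodes that report a finished image. A node that logs
# several "Wrote" lines is counted once, because pairs go into a Set.
# Assumes the line format the gne script relies on (hypothetical).
def good_node_count(log_lines)
  good = Set.new
  log_lines.each do |line|
    next unless line =~ /INFO.*Wrote/
    m = line.match(/n_(\d+)_(\d+)>/)
    good << [m[1].to_i, m[2].to_i] if m
  end
  good.size
end

TARGET = 50 # hypothetical: the number of nodes your experiment needs

# newest log file, picked the same way as in gne
latest = Dir.glob("/tmp/comm*.log").max_by { |f| File.ctime(f) }
log = latest ? File.readlines(latest).map(&:scrub) : []
count = good_node_count(log)

if count >= TARGET
  puts "#{count} nodes imaged: enough to stop imaging and run 'gne'"
else
  puts "#{count} of #{TARGET} nodes imaged so far"
end
```

Running this periodically while imaging is in progress would show when it is safe to interrupt and extract the node list.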
>
> The code is new, so any bugs, please let me know.
>
> Thanks,
> Chris
>
>
> sample output:
> ------------------------------
>> gne nh
>
> defTopology('my:topo') { |t|
>     t.addNode(1,10);
>     t.addNode(8,4);
>     t.addNode(9,1);
> }
>
>
> script (save as gne) (make sure to type 'chmod a+x gne' so it can run):
> -----------------------------------------
> #!/usr/bin/ruby
> # Author: Chris Davies (chris at orderonenetworks.com)
>
> # determine the output type
> if (ARGV[0] == nil || (ARGV[0] != "nh" &&  ARGV[0] != "oh")) then
>     puts " Good Node Extractor v1.0"
>     puts "  usage: gne nh/oh"
>     puts " "
>     puts "   This utility extracts the nodes that imaged successfully from the"
>     puts "   last experiment that was run. The parameter specifies the output type"
>     puts "      nh - node handler format for use with imageNodes4"
>     puts "      oh - orbit handler format"
>     puts " "
>     exit
> end
>
> # get the most recent log file
> command = "ls -c -1 -r /tmp/*.log"
>
> firstfile = IO.popen(command,"r").readlines[0]
>
> if (firstfile == nil) then
>     puts "Error: log file not found."
>     exit
> end
>
> # grep the file for nodes that imaged
> command = "grep Wrote " + firstfile
>
> if (ARGV[0] == "nh") then
>     puts "defTopology('my:topo') { |t|"
> end
>
> IO.popen(command,"r").each { |line|
> #puts line
>     line = line.split('msg: <n_')[1]
>     line = line.split('>')[0]
>
>     first = line.split('_')[0];
>     second = line.split('_')[1];
>     if (ARGV[0] == "nh")
>        puts "    t.addNode(" + first + "," + second + ");"
>     else
>         puts "["+first+","+second+"]"
>     end
>
> }
>
> if (ARGV[0] == "nh")
>     puts "}"
> end
>
>
>
>
>
>
>> Hi Andrea,
>>
>> If everybody agrees, we can try that. The problem is that people who
>> want to have as many nodes as possible even if they are not 100%
>> guaranteed to come up lose big time (once we declare nodes as
>> "administratively down" you can't access them at all). What do others
>> think about it?
>>
>> Ivan.
>>
>> -----Original Message-----
>> From: owner-orbit-user at winlab.rutgers.edu
>> [mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Andrea G Forte
>> Sent: Thursday, February 15, 2007 12:17 PM
>> To: orbit-user at winlab.rutgers.edu
>> Subject: Re: ORBIT-USER: most of the grid does not work!
>>
>> Ivan,
>>
>> Thank you. This might be very helpful, but perhaps I need to understand
>> it better. What is the difference between a node being off and being
>> unavailable? Currently the grid shows only 4 nodes as unavailable, but
>> in my experiments a very large number of nodes do not turn on, and
>> others turn on but do not complete the imaging process.
>>
>> In my opinion it would be very helpful to mark all of these nodes as
>> unavailable and just turn them off. In this way we would be able to
>> image the good nodes with minimum effort and without having to start the
>> imaging process again and again because of nodes getting stuck.
>> In other words, nodes that get stuck or do not turn on cause only
>> problems and should be disconnected.
>>
>> -Andrea
>>
>>
>> Ivan Seskar wrote:
>>>
>>>
>>>
>>>> From: owner-orbit-user at winlab.rutgers.edu
>>>> [mailto:owner-orbit-user at winlab.rutgers.edu] On Behalf Of Mesut Ali Ergin
>>>> Sent: Wednesday, February 14, 2007 10:38 PM
>>>> To: orbit-user at winlab.rutgers.edu
>>>> Subject: Re: ORBIT-USER: most of the grid does not work!
>>>>
>>>
>>> ...
>>>
>>> Just to add to this discussion: we are having problems with node power
>>> supplies (the ones with the red dots on the status page actually have
>>> dead power supplies). Unfortunately, the first symptoms of failing PSs
>>> are CM lockups and nodes stuck in the on or off state; it looks like we will
>>> have to replace all of them, which is not a trivial thing to do. We are
>>> trying to find an interim software solution that will
>>> (hopefully) prolong the life of power supplies as well as enable us to
>>> do incremental replacement (rather than force us to shut down the grid
>>> and replace all power supplies at once).
>>>
>>> Ivan.
>>>
>>> PS: Even better page for big grid status is
>>> http://www.orbit-lab.org/wiki/Status/Grid - it has a webcam feed as well
>>> :-). (Status pages do not auto-refresh, so you will have to do it
>>> manually - after all they are not really finished yet as you will
>>> discover if you try to select individual nodes).
>>>