Version 18 (modified by cmdavies, 18 years ago)

Orbit Handler

orbitHandler is an alternative framework for controlling the ORBIT lab. It consists of a series of console commands that operate on subsets of nodes. The major features:

  • Precisely specify large, intricate node sets
  • Send specific commands to each node in the set
  • Distributed FTP equivalent
  • Entirely command line driven
  • No global grid operations
  • Easy to create repeatable experiments

Node Sets

Node sets are contained in a file. This file name is the first parameter for every function. For example:

joeuser@console.sb1:~$ ./orbitImage nodeset baseline.ndz

An example node set file looks like this:

# A node set file
[1..16,1..15]
[18,18..19]
[19..20,1..2]
-[10,2]
-[2..3,5..3]

Each line specifies a range of nodes. The lines are processed in order and the results are merged. A node range prefixed with '-' is removed from the current set. This can be used to exclude non-functional nodes or a particular class of nodes (e.g., those using Intel wifi cards).
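The merge-and-remove rules can be pictured with a small shell sketch. The function below, expand_nodeset, is a hypothetical helper and not part of orbitHandler. It assumes each range has the form [x,y], where x and y are either a single value or a..b, that '#' starts a comment line, and that a reversed range such as 5..3 covers the same values as 3..5.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of orbitHandler): expand a node set file
# into one "x y" pair per line, applying the lines in order.
expand_nodeset() {
  awk '
    # Lower and upper bound of "a..b"; a single value "a" is both bounds.
    function first(s,   p)    { split(s, p, /[.][.]/); return p[1] + 0 }
    function last(s,    p, n) { n = split(s, p, /[.][.]/); return p[n] + 0 }

    /^[ \t]*(#|$)/ { next }                  # skip comments and blank lines
    {
      gsub(/[ \t\r]/, "")
      remove = sub(/^-/, "")                 # leading "-" means "remove"
      sub(/^\[/, "")
      sub(/\]$/, "")
      split($0, xy, ",")
      x0 = first(xy[1]); x1 = last(xy[1])
      y0 = first(xy[2]); y1 = last(xy[2])
      # Assumption: a reversed range (5..3) means the same as 3..5.
      if (x0 > x1) { t = x0; x0 = x1; x1 = t }
      if (y0 > y1) { t = y0; y0 = y1; y1 = t }
      for (x = x0; x <= x1; x++) {
        for (y = y0; y <= y1; y++) {
          if (remove) {
            delete set[x " " y]
          } else {
            set[x " " y] = 1
          }
        }
      }
    }
    END { for (k in set) print k }
  ' "$1" | sort -n -k1,1 -k2,2
}
```

For example, `expand_nodeset nodeset | wc -l` would report how many nodes a node set file selects under these assumptions.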

Node Commands Overview

The following commands are supported:

  • orbitImage - Images a set of nodes
  • orbitPower - Turns on or off a set of nodes
  • orbitPutFile - Copies a file from the console to a set of nodes
  • orbitGetFileMerged - Merges a file from each node into one file on the console
  • orbitCmd - Runs a user definable command that affects nodes in the set
  • orbitCmdSeq - The same as 'orbitCmd' except commands are run sequentially
  • orbitWait - Waits for a set of processes to finish
  • orbitRun - Executes a command on each node in the set
  • orbitRunWait - Executes a command on each node and blocks until completion
  • orbitKillAll - Kills all matching processes on each node in the set
  • orbitKillOne - Kills the first matching process on each node in the set

Repeatable experiments can be created by building a script file that contains a series of these commands. If you are new to Linux, remember to set execute permission on the script file after creating it:

joeuser@console.sb1:~$ chmod +x scriptFile

Sample Experiment Script

You can debug your experiment in the sandboxes by using a node set file that contains only the two sandbox nodes. When you want to run it on the main grid, simply update the node set file.

The following is an example of a script that can be used to run a repeatable experiment:

# script to run the experiment

# image the nodes
./orbitImage nodeset baseline.ndz

# power on the experiment nodes; this blocks until every node in the set has turned on
./orbitPower nodeset on

# Configure my wireless interfaces
./orbitRunWait nodeset "iwconfig ath0 essid 'oontest'"
./orbitRunWait nodeset "iwconfig ath0 channel 1"
./orbitRunWait nodeset "iwconfig ath0 mode 'ad-hoc'"
./orbitRunWait nodeset "iwconfig ath0 rate '11M'"
./orbitRunWait nodeset "ifconfig ath0 192.168.%x.%y"

# Copy the latest version of my application to each node
./orbitPutFile nodeset myApp /root

# Run my app on each experiment node, but don’t block
./orbitRun nodeset "./myApp -i ath0"

# delay while it does its thing
sleep 60s

# Some random command to ping every node from the console
./orbitCmd nodeset "ping node%x-%y"

# Kill off all the instances of my app running on the nodes
./orbitKillAll nodeset "./myApp -i ath0"

# copy and merge the logs from each node into one file on the console
# this merged file can then be processed by your data analysis tools
./orbitGetFileMerged nodeset ./route_logs.csv
./orbitGetFileMerged nodeset ./error_rates.csv

# shutdown the nodes in the experiment
./orbitPower nodeset off

For most experiments, I'd highly recommend removing the './orbitImage' and './orbitPower' commands while testing and debugging. This speeds up each test/debug cycle considerably.
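One way to keep a single script for both full runs and quick debug runs is to guard the slow steps behind a switch. In the sketch below, QUICK, DRY_RUN, NODESET, step, and setup_nodes are hypothetical names introduced for illustration; only the two orbit command lines come from this page.

```shell
#!/usr/bin/env bash
NODESET=${NODESET:-nodeset}   # node set file, as in the sample script

# Print each command, then run it unless DRY_RUN=1.
step() {
  echo "+ $*"
  if [ "${DRY_RUN:-0}" != 1 ]; then "$@"; fi
}

# Image and power on the nodes, unless QUICK=1 asks to skip both.
setup_nodes() {
  if [ "${QUICK:-0}" = 1 ]; then return 0; fi
  step ./orbitImage "$NODESET" baseline.ndz
  step ./orbitPower "$NODESET" on
}
```

With these definitions, the experiment script would call `setup_nodes` in place of the two commands, and running it as `QUICK=1 ./scriptFile` would skip imaging and power-on.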

Node Commands in Detail

Command: orbitImage

Uses images created with 'saveNode', which is part of the original NodeHandler framework. Only one set of nodes may be imaged at a time. The usage for the command is:

joeuser@console.sb1:~$ ./orbitImage nodeset baseline.ndz

Command: orbitPower

Allows the user to turn a set of nodes on the grid on or off without affecting nodes outside the specified node set. This command blocks and provides status updates until every node in the set has turned on or off. The usage for the command is:

Turn on a set of nodes

joeuser@console.sb1:~$ ./orbitPower nodeset on

Turn off a set of nodes

joeuser@console.sb1:~$ ./orbitPower nodeset off

Command: orbitPutFile

Acts like a distributed FTP function. It copies a file from the console to every node in the set. The usage for this command is:

Copy a file to the home directory of the node

joeuser@console.sb1:~$ ./orbitPutFile nodeset myFile

Copy a file to a particular directory of the node

joeuser@console.sb1:~$ ./orbitPutFile nodeset libApp.so /usr/lib

Command: orbitGetFileMerged

This command is primarily used to gather log files and merge them into a single file on the console for processing. For example, if your application generates a file called 'logfile.csv' on each node, you would use the following command to aggregate all those files into a single file on the console for further analysis.

joeuser@console.sb1:~$ ./orbitGetFileMerged nodeset logfile.csv mergedLogFile.csv
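This page does not describe the layout of the merged file. As one plausible picture only, the sketch below concatenates per-node copies of a CSV file and prefixes each row with the node name so that rows remain attributable. The function merge_logs and file names of the form node3-7.csv are hypothetical and are not part of orbitHandler.

```shell
#!/usr/bin/env bash
# Hypothetical illustration (not orbitGetFileMerged itself):
# merge_logs OUTPUT FILE...  concatenates the FILEs into OUTPUT, prefixing
# every row with the source file name minus its .csv extension.
merge_logs() {
  local out=$1 f
  shift
  : > "$out"                                 # start with an empty output file
  for f in "$@"; do
    NODE=$(basename "$f" .csv) awk '{ print ENVIRON["NODE"] "," $0 }' "$f" >> "$out"
  done
}
```

A data analysis tool could then group rows by the first column to compare nodes.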

Command: orbitCmd and orbitCmdSeq

These commands are identical, except that 'orbitCmdSeq' executes the user command sequentially, one node at a time, while 'orbitCmd' executes it for all nodes simultaneously. Each command replaces '%x' and '%y' with the x and y coordinates of each node in the node set. The usage looks like this:

Ping every node in the node set

joeuser@console.sb1:~$ ./orbitCmd nodeset "ping node%x-%y"
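The substitution and the sequential/simultaneous distinction can be sketched in plain shell. The helpers below (expand_template, run_sequential, run_simultaneous) are hypothetical and only mimic the described behavior; they read "x y" coordinate pairs from standard input rather than a node set file.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of orbitHandler).

# expand_template TEMPLATE X Y: replace every %x and %y with coordinates.
expand_template() {
  printf '%s\n' "$1" | sed -e "s/%x/$2/g" -e "s/%y/$3/g"
}

# Read "x y" pairs from stdin and run the expanded command one at a time.
run_sequential() {
  local x y
  while read -r x y; do
    sh -c "$(expand_template "$1" "$x" "$y")" </dev/null
  done
}

# Same, but start every command in the background, then wait for all.
run_simultaneous() {
  local x y
  while read -r x y; do
    sh -c "$(expand_template "$1" "$x" "$y")" </dev/null &
  done
  wait
}
```

For instance, `expand_template "ping node%x-%y" 3 7` prints `ping node3-7`.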

Command: orbitWait

Waits for all processes that match the specified pattern to finish. Continuing the previous example, this waits until the ping commands have finished.

joeuser@console.sb1:~$ ./orbitWait nodeset "ping node%x-%y"

Command: orbitRun and orbitRunWait

These commands execute a command directly on the node itself. The only difference is that 'orbitRunWait' blocks until the command has completed.

Configure the IP address of the node

joeuser@console.sb1:~$ ./orbitRunWait nodeset "ifconfig ath0 192.168.%x.%y"

Command: orbitKillAll and orbitKillOne

These commands terminate one or more processes on every node in the node set. 'orbitKillAll' removes all matching processes. 'orbitKillOne' removes only the first matching process. They can be used to finish an experiment before using 'orbitGetFileMerged' to collect the results.

Stop the test application

joeuser@console.sb1:~$ ./orbitKillAll nodeset "./myApp"
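The difference between "all matching" and "first matching" can be illustrated on a process listing. The helpers below are hypothetical and not part of orbitHandler; they assume that a process matches when its command line contains the pattern as a plain substring, which this page does not confirm.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of orbitHandler). Both read lines of the
# form "PID COMMAND..." on stdin, e.g. from: ps -e -o pid= -o args=

# Print the PID of every process whose command line contains PATTERN.
pids_all() {
  PATTERN=$1 awk '{
    pid = $1
    cmd = $0
    sub(/^[ \t]*[0-9]+[ \t]+/, "", cmd)      # drop the PID column
    if (index(cmd, ENVIRON["PATTERN"]) > 0) print pid
  }'
}

# Print only the first matching PID.
pids_first() {
  pids_all "$1" | head -n 1
}
```

On a node, `ps -e -o pid= -o args= | pids_first "./myApp"` would select a single process ID, which could then be passed to `kill`.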
