Experiment Control (OMF)

The Experiment Controller service orchestrates all experiments. The user controls it through the omf command; it sends messages to the requisite services to initiate an experiment and line up everything required to boot the nodes, run the experiment scripts, and finally restore the testbed to its original state after the experiment is complete.


user@console:omf
Run a command on the testbed(s)
Usage: omf-5.4  [COMMAND] [ARGUMENT]...
  Available COMMANDs:
    help   Print this help message or a specify command usage
    exec   Execute an experiment script
    load   Load a disk image on a given set of nodes
    save   Save a disk image from a given node into a file
    tell   Switch a given set of nodes ON/OFF or reboot them
    stat   Returns the status of a given set of nodes
  To get more help on individual commands: 'omf-5.4 help [COMMAND]'
  Examples:
            omf-5.4  help exec   Return usage/help for the 'exec' command
            omf-5.4  help load   Return usage/help for the 'load' command

omf load

The load command installs a disk image onto the hard disk of one or more nodes.

Usage: omf help load

Install a given disk image on the nodes in a testbed
Usage:
      omf-5.4 load [-h] [-i IMAGE_PATH] [-o TIMEOUT] [-t TOPOLOGY] [-c AGGREGATE]
 
      With: 
      -h, --help                print this help message
      -c, --config AGGREGATE    use testbed AGGREGATE
      -t, --topology TOPOLOGY   a valid topology file or description (defaults to 'system:topo:all')
                                (if a file 'TOPOLOGY' doesn't exist, interpret it as a comma-separated list of nodes)
      -i, --image IMAGE         disk image to load
                                (default is 'baseline.ndz', the latest stable baseline image)
      -o, --timeout TIMEOUT     a duration (in sec.) after which imageNodes should stop waiting for
                                nodes that have not finished their image installation
                                (default is 800 sec, i.e. 13min 20sec)
          --outpath PATH        Path where the resulting Topologies should be saved
                                (default is '/tmp')
          --outprefix PREFIX    Prefix to use for naming the resulting Topologies
                                (default is your experiment ID)
 
      Some Examples: 
                    omf-5.4 load
                    omf-5.4 load -t system:topo:all -i baseline-2.4.ndz
                    omf-5.4 load -t omf.nicta.node1 -i wireless-2.6.ndz
                    omf-5.4 load -t omf.nicta.node1,omf.nicta.node2 -i baseline.ndz -o 400
                    omf-5.4 load -t system:topo:circle -i my_Own_Image.ndz
                    omf-5.4 load -t my_Own_Topology -i baseline-2.2.ndz -t 600 -c grid
                    omf-5.4 load -t my_Own_Topology --outpath ./ --outprefix my_Own_Prefix

The two most important arguments are TOPOLOGY, which describes the set of nodes to image, and IMAGE, which names the image to load onto those nodes. If the imaging process does not finish within the default timeout period (800 sec), the period can be increased with the -o flag (e.g. -o 1600). A typical command to load both nodes of a sandbox with the baseline image looks like:

Example: omf load -i baseline.ndz -t system:topo:all

username@console.sb7:~$ omf load -t all -i baseline.ndz

 INFO NodeHandler: OMF Experiment Controller 5.4 (git c005675)
 INFO NodeHandler: Slice ID: pxe_slice 
 INFO NodeHandler: Experiment ID: pxe_slice-2013-01-16t14.56.02-05.00
 INFO NodeHandler: Message authentication is disabled
 INFO Experiment: load system:exp:stdlib
 INFO property.resetDelay: resetDelay = 230 (Fixnum)
 INFO property.resetTries: resetTries = 1 (Fixnum)
 INFO Experiment: load system:exp:eventlib
 INFO Experiment: load system:exp:imageNode
 INFO property.nodes: nodes = "system:topo:all" (String)
 INFO property.image: image = "baseline.ndz" (String)
 INFO property.domain: domain = "sb7.orbit-lab.org" (String)
 INFO property.outpath: outpath = "/tmp" (String)
 INFO property.outprefix: outprefix = "pxe_slice-2013-01-16t14.56.02-05.00" (String)
 INFO property.timeout: timeout = 800 (Fixnum)                                                                                          
 INFO property.resize: resize = nil (NilClass)
 INFO Topology: Loading topology 'system:topo:all'.
 INFO Experiment: Resetting resources
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [0 sec.]
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [10 sec.]
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [20 sec.]
 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [30 sec.]
 INFO ALL_UP: Event triggered. Starting the associated tasks. 
 INFO exp: Progress(0/0/2): 0/0/0 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 760 sec.
 INFO exp: Progress(0/0/2): 10/10/10 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 750 sec.
 INFO exp: Progress(0/0/2): 10/15/20 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 740 sec.
 INFO exp: Progress(0/0/2): 20/25/30 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 730 sec.
 INFO exp: Progress(0/0/2): 30/35/40 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 720 sec.
 INFO exp: Progress(0/0/2): 40/40/40 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 710 sec.
 INFO exp: Progress(0/0/2): 40/45/50 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 700 sec.
 INFO exp: Progress(0/0/2): 50/55/60 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 690 sec.
 INFO exp: Progress(0/0/2): 60/65/70 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 680 sec.
 INFO exp: Progress(0/0/2): 60/65/70 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 670 sec.
 INFO exp: Progress(0/0/2): 70/75/80 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 660 sec.
 INFO exp: Progress(0/0/2): 90/90/90 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 650 sec.
 INFO exp: Progress(1/0/2): 90/95/100 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 640 sec.
 INFO exp: Progress(2/0/2): 100/100/100 min()/avg/max (30) - Timeout: 630 sec.
 INFO exp:  ----------------------------- 
 INFO exp:  Imaging Process Done 
 INFO exp:  2 nodes successfully imaged - Topology saved in '/tmp/pxe_slice-2013-01-16t14.56.02-05.00-topo-success.rb'
 INFO exp:  ----------------------------- 
 INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
 INFO NodeHandler: 
 INFO NodeHandler: Shutting down experiment, please wait...
 INFO NodeHandler: 
 INFO NodeHandler: Shutdown flag is set - Turning Off the resources
 INFO run: Experiment pxe_slice-2013-01-16t14.56.02-05.00 finished after 3:13
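
The outcome of an imaging run can be read off the log without scanning it by eye. The following is a minimal Python sketch, assuming the log lines keep the format shown in the transcript above; the function name parse_load_log and the returned keys are hypothetical and not part of OMF. It reads the three progress numbers as min/avg/max, following the label printed on each progress line.

```python
import re

# Final summary line, as printed in the transcript above.
SUMMARY_RE = re.compile(
    r"INFO exp:\s+(\d+) nodes? successfully imaged - Topology saved in '([^']+)'"
)
# Periodic progress line: three counters, then min/avg/max progress and the
# remaining timeout. The counters are not interpreted here.
PROGRESS_RE = re.compile(
    r"INFO exp: Progress\(\d+/\d+/\d+\): (\d+)/(\d+)/(\d+) .* - Timeout: (\d+) sec\."
)


def parse_load_log(text):
    """Extract the last progress report and the final summary from a load log."""
    result = {"imaged": None, "topology_path": None,
              "progress": None, "timeout_left": None}
    for line in text.splitlines():
        m = PROGRESS_RE.search(line)
        if m:
            # Keep only the most recent progress report.
            result["progress"] = tuple(int(m.group(i)) for i in (1, 2, 3))
            result["timeout_left"] = int(m.group(4))
            continue
        m = SUMMARY_RE.search(line)
        if m:
            result["imaged"] = int(m.group(1))
            result["topology_path"] = m.group(2)
    return result


# Lines copied from the transcript above.
SAMPLE = """\
 INFO exp: Progress(1/0/2): 90/95/100 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 640 sec.
 INFO exp: Progress(2/0/2): 100/100/100 min()/avg/max (30) - Timeout: 630 sec.
 INFO exp:  Imaging Process Done
 INFO exp:  2 nodes successfully imaged - Topology saved in '/tmp/pxe_slice-2013-01-16t14.56.02-05.00-topo-success.rb'
"""

result = parse_load_log(SAMPLE)
print(result["imaged"], result["topology_path"])
```

If no summary line is found (for example because the run timed out), imaged and topology_path stay None, which a calling script can treat as a failed or incomplete imaging run.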

omf save

Usage: omf help save

omf-5.4 help save
Save a disk image from a given node into an archive file
Usage:
      omf-5.4 save -n NODE [-h] [-c AGGREGATE]
 
      With: 
      -h, --help       print this help message
      -n, --node NODE  a valid description of a single node
                       (no default here, you have to enter a node!)
 
      Some Examples: 
                    omf-5.4 save -n omf.nicta.node1
                    omf-5.4 save -n omf.nicta.node3 -c grid

omf tell

The tell command controls the power state of the nodes: it switches them on or off, reboots them, or resets them.

Usage: omf help tell

user@console:omf help tell
Switch ON/OFF and reboot the nodes in a testbed
Usage:
      omf tell [-h] -t TOPOLOGY -a ACTION [-c AGGREGATE]
 
      With: 
      -h, --help           print this help message
 
      -a, --action ACTION  specify an action
      ACTION:
      on              turn node(s) ON
      offs            turn node(s) OFF (soft)
      offh            turn node(s) OFF (hard)
      reboot          reboots node(s) (soft)
      reset           resets node(s) (hard)
 
      -h, --help                print this help message
      -t, --topology TOPOLOGY   a valid topology file or description (MANDATORY)
      -c, --config AGGREGATE    use testbed AGGREGATE
 
      Some Examples: 
                    omf tell -a reset -t node1-1.grid.orbit-lab.org
                    omf tell -a on -t system:topo:all -c grid
                    omf tell -a reboot -t node1-1
                    omf tell -a offh -t [1..2,1..5]
                    omf tell -a offh -t system:topo:all
                    omf tell -a reset -t system:topo:imaged

The available actions are: on (power on), offh (hard power off, equivalent to pulling out the power cord), offs (software shutdown), reboot (software reboot), and reset (hardware reset).
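
When tell is driven from a script, it can help to validate the action before anything is sent to the testbed. The following is a minimal Python sketch, assuming only the flags listed in the help text above (-a, -t, -c); the helper build_tell_command and the TELL_ACTIONS table are hypothetical names, not part of OMF.

```python
# Actions accepted by `omf tell`, as listed in the help text above.
TELL_ACTIONS = {
    "on": "turn node(s) on",
    "offs": "soft power off (software shutdown)",
    "offh": "hard power off (like pulling the power cord)",
    "reboot": "software reboot",
    "reset": "hardware reset",
}


def build_tell_command(action, topology, aggregate=None, binary="omf"):
    """Return the argument list for an `omf tell` invocation."""
    if action not in TELL_ACTIONS:
        raise ValueError(
            f"unknown action {action!r}; expected one of {sorted(TELL_ACTIONS)}"
        )
    if not topology:
        # The help text marks -t as mandatory for tell.
        raise ValueError("a topology is mandatory for `omf tell`")
    argv = [binary, "tell", "-a", action, "-t", topology]
    if aggregate is not None:
        argv += ["-c", aggregate]
    return argv


print(" ".join(build_tell_command("offh", "node1-1")))
# prints "omf tell -a offh -t node1-1"
```

On a console where the omf command is installed, the returned list could be handed to subprocess.run; building it as a list rather than a single string avoids shell quoting problems with topologies such as [1..2,1..5].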

Example: turn off node1-1 on the outdoor domain

user@console.outdoor:~# omf tell -a offh -t node1-1

 INFO NodeHandler: OMF Experiment Controller 5.4 (git 3fb37b9)
 INFO NodeHandler: Reading configuration file /etc/omf-expctl-5.4/services.yaml
 INFO NodeHandler: Add domain http - http://internal1.orbit-lab.org:5054/
 INFO NodeHandler: Add domain http - http://repository1.orbit-lab.org:5054/
 INFO NodeHandler: Slice ID: default_slice (default)
 INFO NodeHandler: Experiment ID: default_slice-2014-09-30t00.24.28.504-04.00
 INFO NodeHandler: Message authentication is disabled
 INFO Experiment: load system:exp:stdlib
 INFO property.resetDelay: resetDelay = 230 (Fixnum)
 INFO property.resetTries: resetTries = 1 (Fixnum)
 INFO Experiment: load system:exp:eventlib
 INFO Experiment: load system:exp:winlib
 INFO Experiment: load system:exp:tell
 INFO property.nodes: nodes = "node1-1" (String)
 INFO property.command: command = "offh" (String)

Talking to the CMC service, please wait
-----------------------------------------------
 Node: node1-1.outdoor.orbit-lab.org   	 Reply: OK
-----------------------------------------------

 INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
 INFO NodeHandler: 
 INFO NodeHandler: Shutting down experiment, please wait...
 INFO NodeHandler: 
 INFO run: Experiment default_slice-2014-09-30t00.24.28.504-04.00 finished after 0:10

omf stat

The stat command displays the power status of a node or of all nodes in a domain.

Usage: omf help stat

username@consoles.outdoor:omf-5.4 help stat
Returns the status of the nodes in a testbed
Usage:
      omf-5.4 stat [-h] [-s] [-t TOPOLOGY] [-c AGGREGATE]
 
      With: 
      -h, --help                print this help message
      -s, --summary             print a summary of the node status for the testbed
      -c, --config AGGREGATE    use testbed AGGREGATE
      -t, --topology TOPOLOGY   a valid topology file or description (defaults to 'system:topo:all')
 
      Some Examples: 
                    omf-5.4 stat
                    omf-5.4 stat -s
                    omf-5.4 stat -t omf.nicta.node1,omf.nicta.node2 -c sb1
                    omf-5.4 stat -t system:topo:all -c grid

Individual nodes are identified in the output of the stat command by their fully qualified domain name (FQDN). The FQDN establishes both their "coordinates" and the "domain" to which they belong. Nodes in different domains typically can NOT see each other. A node can be in one of three states:

POWEROFF: Node is available for use but turned off
POWERON: Node is available and is on
NOT REGISTERED: Node is not available for use
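
The way an FQDN encodes coordinates and domain can be sketched in a few lines of Python. This assumes the naming scheme node<X>-<Y>.<domain> seen in the transcripts on this page (for example node1-8.outdoor.orbit-lab.org); the function names are hypothetical.

```python
import re

# Assumed naming scheme, inferred from the transcripts on this page:
# node<X>-<Y>.<domain>
NODE_FQDN_RE = re.compile(r"^node(\d+)-(\d+)\.(.+)$")


def split_node_fqdn(fqdn):
    """Split a node FQDN into its (x, y) coordinates and its domain."""
    m = NODE_FQDN_RE.match(fqdn)
    if m is None:
        raise ValueError(f"not a node FQDN: {fqdn!r}")
    return (int(m.group(1)), int(m.group(2))), m.group(3)


def same_domain(fqdn_a, fqdn_b):
    """True if both nodes belong to the same domain."""
    return split_node_fqdn(fqdn_a)[1] == split_node_fqdn(fqdn_b)[1]


coords, domain = split_node_fqdn("node1-8.outdoor.orbit-lab.org")
print(coords, domain)
# prints "(1, 8) outdoor.orbit-lab.org"
```

A check such as same_domain is a quick way to catch a topology that mixes nodes from domains that cannot see each other.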

Example: omf stat on the outdoor domain

user@console.outdoor:~# omf stat

 INFO NodeHandler: OMF Experiment Controller 5.4 (git 6d34264)
 INFO NodeHandler: Slice ID: default_slice (default)
 INFO NodeHandler: Experiment ID: default_slice-2012-10-14t14.42.15-04.00
 INFO NodeHandler: Message authentication is disabled
 INFO Experiment: load system:exp:stdlib
 INFO property.resetDelay: value = 210 (Fixnum)
 INFO property.resetTries: value = 1 (Fixnum)
 INFO Experiment: load system:exp:eventlib
 INFO Experiment: load system:exp:stat
 INFO Topology: Loading topology ''.
 INFO property.nodes: value = "system:topo:all" (String)
 INFO property.summary: value = false (FalseClass)
 INFO Topology: Loading topology 'system:topo:all'.
 Talking to the CMC service, please wait
-----------------------------------------------
 Domain: outdoor.orbit-lab.org
 Node: node3-6.outdoor.orbit-lab.org   	 State: NOT REGISTERED
 Node: node3-3.outdoor.orbit-lab.org   	 State: POWEROFF
 Node: node2-10.outdoor.orbit-lab.org    State: POWEROFF
 Node: node1-10.outdoor.orbit-lab.org    State: POWEROFF
 Node: node1-8.outdoor.orbit-lab.org   	 State: POWERON
 Node: node1-6.outdoor.orbit-lab.org   	 State: POWERON
 Node: node3-2.outdoor.orbit-lab.org   	 State: POWEROFF
 Node: node3-1.outdoor.orbit-lab.org   	 State: POWEROFF
 Node: node1-3.outdoor.orbit-lab.org   	 State: POWERON
 Node: node3-5.outdoor.orbit-lab.org   	 State: POWEROFF
 Node: node2-5.outdoor.orbit-lab.org   	 State: NOT REGISTERED
 Node: node1-2.outdoor.orbit-lab.org   	 State: POWERON
-----------------------------------------------
 INFO Experiment: Switching ON resources which are OFF
 INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
 INFO NodeHandler: 
 INFO NodeHandler: Shutting down experiment, please wait...
 INFO NodeHandler: 
 INFO run: Experiment default_slice-2012-10-14t14.42.15-04.00 finished after 0:6
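
For larger domains it can be convenient to tally the states instead of reading the list. The following is a minimal Python sketch, assuming the "Node: ... State: ..." line format shown in the transcript above; parse_stat_output is a hypothetical helper, not part of OMF.

```python
import re
from collections import Counter

# One "Node: <fqdn>  State: <state>" line, as printed by omf stat above.
NODE_STATE_RE = re.compile(r"Node:\s+(\S+)\s+State:\s+(.+?)\s*$")
# The three states described on this page.
KNOWN_STATES = {"POWEROFF", "POWERON", "NOT REGISTERED"}


def parse_stat_output(text):
    """Map each node FQDN in the stat output to its reported state."""
    states = {}
    for line in text.splitlines():
        m = NODE_STATE_RE.search(line)
        if m:
            states[m.group(1)] = m.group(2)
    return states


# Lines copied from the transcript above (tabs replaced by spaces).
SAMPLE = """\
 Domain: outdoor.orbit-lab.org
 Node: node3-6.outdoor.orbit-lab.org      State: NOT REGISTERED
 Node: node3-3.outdoor.orbit-lab.org      State: POWEROFF
 Node: node1-8.outdoor.orbit-lab.org      State: POWERON
 Node: node1-6.outdoor.orbit-lab.org      State: POWERON
"""

states = parse_stat_output(SAMPLE)
summary = Counter(states.values())
unknown = set(states.values()) - KNOWN_STATES  # states not described here
print(summary["POWERON"], summary["POWEROFF"], summary["NOT REGISTERED"])
# prints "2 1 1"
```

Any state that ends up in unknown is one this page does not describe and should be checked by hand.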

Last modified on 06/10/16 04:08:55