
Version 87 (modified by seskar, 10 years ago)

How to get started

First, you will need an ORBIT account. Please check the usage policy to see whether you are eligible. Typically, to get an account you register for one and have it approved by the PI in charge of the project or institution you wish to join. If your institution is not listed, the appropriate PI can register for an institutional account.

Six Steps

A typical experiment requires the following six steps:

  1. Create reservation:
    Before you can access the testbed, you need to make a reservation and have it approved by the reservation service. First-time users are strongly encouraged to reserve time on a sandbox instead of the main grid and to start with the built-in "Hello World" experiment. The rest of this tutorial assumes that the user has reserved the sb1 domain.
  2. Log in to the reserved domain: "ssh username@sb1.orbit-lab.org"

    Log in to the reserved domain

    During your approved time slot, you will be able to ssh into the console of the respective domain. A console is a dedicated machine that allows access to all resources in that domain.

    For example, to access sandbox 1:

    yourhost>ssh username@console.sb1.orbit-lab.org
    
    Using username "username".
    Authenticating with public key "xxxxxxxxx"
    Welcome to Ubuntu 12.04.1 LTS (GNU/Linux 3.2.0-36-generic x86_64)
    
     * Documentation:  https://help.ubuntu.com/
    
      System information as of Mon Jan 28 20:25:50 EST 2013
    
      System load:  0.0               Processes:           93
      Usage of /:   2.7% of 69.43GB   Users logged in:     0
      Memory usage: 6%                IP address for eth0: 10.50.18.10
      Swap usage:   0%                IP address for eth1: 10.18.0.10
    
      Graph this data and manage this system at https://landscape.canonical.com/
    
    9 packages can be updated.
    0 updates are security updates.
    
    |-----------------------------------------------------------------|
    |                 *** For authorized use only ***                 |
    | This system is for the use of authorized users only.  All users |
    | are expected to comply with the "Acceptable Use Policy" availa- |
    | ble at http://www.orbit-lab.org/AUP.html                        |
    | Individuals using this computer system, are subject to having   |
    | all of their activities on this system monitored and recorded   |
    | by system personnel.                                            |
    |                                                                 |
    | Anyone using this system expressly consents to such monitoring  |
    | and is advised that if such monitoring reveals possible         |
    | evidence of criminal activity, system personnel may provide the |
    | evidence of such monitoring to law enforcement officials.       |
    |                                                                 |
    | Email question, comments or problems to help@orbit-lab.org      |
    |-----------------------------------------------------------------|
    
    username@console.sb1:~$ 
    

After you receive the confirmation email, you can access the reserved domain by using ssh to connect to the console of that domain.

  3. Load an image on the nodes: "omf load -i baseline.ndz -t all"

    Load an Image

    1. Before we begin using the nodes, it's a good idea to check their status first. This is done with the omf stat command.

      omf stat

      This omf command is used to display the power status of the node/domain.

      Usage: omf stat

      username@consoles.outdoor:omf stat
      Returns the status of the nodes in a testbed
      Usage:
            omf-5.4 stat [-h] [-s] [-t TOPOLOGY] [-c AGGREGATE]
       
            With: 
            -h, --help                print this help message
            -s, --summary             print a summary of the node status for the testbed
            -c, --config AGGREGATE    use testbed AGGREGATE
            -t, --topology TOPOLOGY   a valid topology file or description (defaults to 'system:topo:all')
       
            Some Examples: 
                          omf-5.4 stat
                          omf-5.4 stat -s
                          omf-5.4 stat -t omf.nicta.node1,omf.nicta.node2 -c sb1
                          omf-5.4 stat -t system:topo:all -c grid
      

      Individual nodes are identified in the output of the stat command by their fully qualified domain name (FQDN). This establishes their "coordinates" and the "domain" to which they belong. Nodes in different domains typically cannot see each other. A node can be in one of three states:

      POWEROFF: the node is available for use but turned off
      POWERON: the node is available and turned on
      NOT REGISTERED: the node is not available for use

      Example: omf stat on the outdoor domain

      user@console.outdoor:~# omf stat -t all
      
       INFO NodeHandler: OMF Experiment Controller 5.4 (git 6d34264)
       INFO NodeHandler: Slice ID: default_slice (default)
       INFO NodeHandler: Experiment ID: default_slice-2012-10-14t14.42.15-04.00
       INFO NodeHandler: Message authentication is disabled
       INFO Experiment: load system:exp:stdlib
       INFO property.resetDelay: value = 210 (Fixnum)
       INFO property.resetTries: value = 1 (Fixnum)
       INFO Experiment: load system:exp:eventlib
       INFO Experiment: load system:exp:stat
       INFO Topology: Loading topology ''.
       INFO property.nodes: value = "system:topo:all" (String)
       INFO property.summary: value = false (FalseClass)
       INFO Topology: Loading topology 'system:topo:all'.
       Talking to the CMC service, please wait
      -----------------------------------------------
       Domain: outdoor.orbit-lab.org
       Node: node3-6.outdoor.orbit-lab.org   	 State: NOT REGISTERED
       Node: node3-3.outdoor.orbit-lab.org   	 State: POWEROFF
       Node: node2-10.outdoor.orbit-lab.org    State: POWEROFF
       Node: node1-10.outdoor.orbit-lab.org    State: POWEROFF
       Node: node1-8.outdoor.orbit-lab.org   	 State: POWERON
       Node: node1-6.outdoor.orbit-lab.org   	 State: POWERON
       Node: node3-2.outdoor.orbit-lab.org   	 State: POWEROFF
       Node: node3-1.outdoor.orbit-lab.org   	 State: POWEROFF
       Node: node1-3.outdoor.orbit-lab.org   	 State: POWERON
       Node: node3-5.outdoor.orbit-lab.org   	 State: POWEROFF
       Node: node2-5.outdoor.orbit-lab.org   	 State: NOT REGISTERED
       Node: node1-2.outdoor.orbit-lab.org   	 State: POWERON
      -----------------------------------------------
       INFO Experiment: Switching ON resources which are OFF
       INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
       INFO NodeHandler: 
       INFO NodeHandler: Shutting down experiment, please wait...
       INFO NodeHandler: 
       INFO run: Experiment default_slice-2012-10-14t14.42.15-04.00 finished after 0:6
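
The state listing above can be filtered with a short script. The following is a minimal sketch, assuming the "Node: <FQDN> State: <STATE>" line format shown in the example output and assuming that those lines arrive on standard output; the function name nodes_in_state is hypothetical and not part of OMF.

```shell
# Print the FQDN of every node whose state matches $1.
# Reads `omf stat` output on stdin; assumes lines of the form
#   Node: node1-8.outdoor.orbit-lab.org   State: POWERON
# nodes_in_state is a hypothetical helper, not an OMF command.
nodes_in_state() {
  awk -v want="$1" '
    $1 == "Node:" {
      # Everything after "State:" is the state; it may contain a space
      # (for example NOT REGISTERED).
      state = $0
      sub(/^.*State:[ \t]*/, "", state)
      sub(/[ \t]+$/, "", state)
      if (state == want) print $2
    }'
}
```

For example, "omf stat -t all | nodes_in_state POWERON" would list the nodes that are currently switched on, one per line.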
      

    2. It is recommended that the node be in the POWEROFF state prior to any experiment process. If the node is in the POWERON state you can use the omf tell command to get the node into the off state.

      omf tell

      OMF command to control the power state/reset the nodes.

      Usage: omf tell

      user@console:omf tell
      Switch ON/OFF and reboot the nodes in a testbed
      Usage:
            omf tell [-h] -t TOPOLOGY -a ACTION [-c AGGREGATE]
       
            With: 
            -h, --help           print this help message
       
            -a, --action ACTION  specify an action
            ACTION:
            on              turn node(s) ON
            offs            turn node(s) OFF (soft)
            offh            turn node(s) OFF (hard)
            reboot          reboots node(s) (soft)
            reset           resets node(s) (hard)
       
            -h, --help                print this help message
            -t, --topology TOPOLOGY   a valid topology file or description (MANDATORY)
            -c, --config AGGREGATE    use testbed AGGREGATE
       
            Some Examples: 
                          omf tell -a reset -t node1-1.grid.orbit-lab.org
                          omf tell -a on -t system:topo:all -c grid
                          omf tell -a reboot -t node1-1
                          omf tell -a offh -t [1..2,1..5]
                          omf tell -a offh -t system:topo:all
                          omf tell -a reset -t system:topo:imaged
      

      The available actions are: on, offh (equivalent to pulling out the power cord), offs (software shutdown), reboot (software reboot) and reset (hardware reset).

      Example: turn off node1-1 on the outdoor domain

      user@console.outdoor:~# omf tell -a offh -t node1-1
      
       INFO NodeHandler: OMF Experiment Controller 5.4 (git 3fb37b9)
       INFO NodeHandler: Reading configuration file /etc/omf-expctl-5.4/services.yaml
       INFO NodeHandler: Add domain http - http://internal1.orbit-lab.org:5054/
       INFO NodeHandler: Add domain http - http://repository1.orbit-lab.org:5054/
       INFO NodeHandler: Slice ID: default_slice (default)
       INFO NodeHandler: Experiment ID: default_slice-2014-09-30t00.24.28.504-04.00
       INFO NodeHandler: Message authentication is disabled
       INFO Experiment: load system:exp:stdlib
       INFO property.resetDelay: resetDelay = 230 (Fixnum)
       INFO property.resetTries: resetTries = 1 (Fixnum)
       INFO Experiment: load system:exp:eventlib
       INFO Experiment: load system:exp:winlib
       INFO Experiment: load system:exp:tell
       INFO property.nodes: nodes = "node1-1" (String)
       INFO property.command: command = "offh" (String)
      
      Talking to the CMC service, please wait
      -----------------------------------------------
       Node: node1-1.outdoor.orbit-lab.org   	 Reply: OK
      -----------------------------------------------
      
       INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
       INFO NodeHandler: 
       INFO NodeHandler: Shutting down experiment, please wait...
       INFO NodeHandler: 
       INFO run: Experiment default_slice-2014-09-30t00.24.28.504-04.00 finished after 0:10
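
When several nodes need the same action, the per-node commands can be generated rather than typed. The sketch below is one possible approach, assuming node names are supplied one per line; tell_commands is a hypothetical helper that only prints commands in the "omf tell -a ACTION -t NODE" form shown above and runs nothing itself.

```shell
# Print one `omf tell` command per node name read from stdin.
# $1 is the action; only the five actions listed in the help text are accepted.
# tell_commands is a hypothetical helper; it prints commands and runs nothing.
tell_commands() {
  case "$1" in
    on|offs|offh|reboot|reset) ;;
    *) echo "unknown action: $1" >&2; return 1 ;;
  esac
  # The "|| [ -n ... ]" part also handles a last line without a newline.
  while IFS= read -r node || [ -n "$node" ]; do
    # Skip empty lines so they do not produce a command with no target.
    [ -n "$node" ] && printf 'omf tell -a %s -t %s\n' "$1" "$node"
  done
  return 0
}
```

Printing the commands first makes it possible to review them before anything is powered off.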
      

    3. Once the node set is in the POWEROFF state, load an image with the omf load command.

      omf load

      The load command is used to put an image onto the hard disk of the node.

      Usage: omf load

      Install a given disk image on the nodes in a testbed
      Usage:
            omf-5.4 load [-h] [-i IMAGE_PATH] [-o TIMEOUT] [-t TOPOLOGY] [-c AGGREGATE]
       
            With: 
            -h, --help                print this help message
            -c, --config AGGREGATE    use testbed AGGREGATE
            -t, --topology TOPOLOGY   a valid topology file or description (defaults to 'system:topo:all')
                                      (if a file 'TOPOLOGY' doesn't exist, interpret it as a comma-separated list of nodes)
            -i, --image IMAGE         disk image to load
                                      (default is 'baseline.ndz', the latest stable baseline image)
            -o, --timeout TIMEOUT     a duration (in sec.) after which imageNodes should stop waiting for
                                      nodes that have not finished their image installation
                                      (default is 800 sec, i.e. 13min 20sec)
                --outpath PATH        Path where the resulting Topologies should be saved
                                      (default is '/tmp')
                --outprefix PREFIX    Prefix to use for naming the resulting Topologies
                                      (default is your experiment ID)
       
            Some Examples: 
                          omf-5.4 load
                          omf-5.4 load -t system:topo:all -i baseline-2.4.ndz
                          omf-5.4 load -t omf.nicta.node1 -i wireless-2.6.ndz
                          omf-5.4 load -t omf.nicta.node1,omf.nicta.node2 -i baseline.ndz -o 400
                          omf-5.4 load -t system:topo:circle -i my_Own_Image.ndz
                          omf-5.4 load -t my_Own_Topology -i baseline-2.2.ndz -t 600 -c grid
                          omf-5.4 load -t my_Own_Topology --outpath ./ --outprefix my_Own_Prefix
      

      Two important arguments are TOPOLOGY, which describes the set of nodes to image, and IMAGE, which specifies the name of the image to load onto those nodes. If the imaging process does not finish within the default timeout period, that period can be increased with the -o flag (e.g. -o 1600). A typical command to load a node with the baseline image looks like this:

      Example: omf load -i baseline.ndz -t node1-1

      username@console.sb3:~$ omf load -i baseline.ndz -t node1-1
      
      DEBUG FQDN:console.sb3.orbit-lab.org:
       INFO NodeHandler: OMF Experiment Controller 5.4 (git 861d645)
       INFO NodeHandler: Reading configuration file /etc/omf-expctl-5.4/services.yaml
       INFO NodeHandler: Add domain http - http://internal1.orbit-lab.org:5054/
       INFO NodeHandler: Add domain http - http://repository1.orbit-lab.org:5054/
       INFO NodeHandler: Add domain http - http://external1.orbit-lab.org:5054/
       INFO NodeHandler: Slice ID: pxe_slice
       INFO NodeHandler: Experiment ID: pxe_slice-2018-08-08t13.41.37.814-04.00
       INFO NodeHandler: Message authentication is disabled
       INFO Experiment: load system:exp:stdlib
       INFO property.resetDelay: resetDelay = 230 (Fixnum)
       INFO property.resetTries: resetTries = 1 (Fixnum)
       INFO Experiment: load system:exp:eventlib
       INFO Experiment: load system:exp:winlib
       INFO Experiment: load system:exp:imageNode
       INFO property.nodes: nodes = "node1-1" (String)
       INFO property.image: image = "baseline.ndz" (String)
       INFO property.domain: domain = "sb3.orbit-lab.org" (String)
       INFO property.outpath: outpath = "/tmp" (String)
       INFO property.outprefix: outprefix = "pxe_slice-2018-08-08t13.41.37.814-04.00" (String)
       INFO property.timeout: timeout = 800 (Fixnum)
       INFO property.resize: resize = nil (NilClass)
       INFO Topology: Loaded topology 'system:topo:registered'.
       INFO property.resetDelay: resetDelay = 100 (Fixnum)
       INFO Experiment: Resetting resources
       INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb3.orbit-lab.org) [0 sec.]
       INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb3.orbit-lab.org) [10 sec.]
       INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb3.orbit-lab.org) [20 sec.]
       INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb3.orbit-lab.org) [30 sec.]
       INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb3.orbit-lab.org) [40 sec.]
       INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb3.orbit-lab.org) [50 sec.]
       INFO exp: Progress(0/0/1): 0/0/0 min(node1-1.sb3.orbit-lab.org)/avg/max (59) - Timeout: 790 sec.
       INFO ALL_UP: Event triggered. Starting the associated tasks.
       INFO BRING_UP: Event triggered. Starting the associated tasks.
       INFO Experiment: Bringing up resources
       INFO exp: Progress(0/0/1): 50/50/50 min(node1-1.sb3.orbit-lab.org)/avg/max (59) - Timeout: 780 sec.
       INFO exp: Progress(0/0/1): 80/80/80 min(node1-1.sb3.orbit-lab.org)/avg/max (59) - Timeout: 770 sec.
       INFO exp: Progress(1/0/1): 100/100/100 min()/avg/max (59) - Timeout: 760 sec.
       INFO exp:  -----------------------------
       INFO exp:  Imaging Process Done
       INFO exp:  1 node successfully imaged - Topology saved in '/tmp/pxe_slice-2018-08-08t13.41.37.814-04.00-topo-success.rb'
       INFO exp:  -----------------------------
       INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
       INFO NodeHandler:
       INFO NodeHandler: Shutting down experiment, please wait...
       INFO NodeHandler:
       INFO NodeHandler: Shutdown flag is set - Turning Off the resources
       INFO run: Experiment pxe_slice-2018-08-08t13.41.37.814-04.00 finished after 1:44
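
The outcome of an imaging run can be checked from a saved copy of this output. The following is a minimal sketch, assuming the success line has the form shown above ("1 node successfully imaged - ..."); the plural wording "nodes" for larger runs is an assumption, and imaged_count is a hypothetical helper, not an OMF command.

```shell
# Print the number of successfully imaged nodes reported in `omf load`
# output read from stdin, or 0 if no success line is present.
# Assumes the line format shown above; the plural "nodes" is an assumption.
# imaged_count is a hypothetical helper.
imaged_count() {
  awk '
    /nodes? successfully imaged/ {
      # The count is the number right before the word "node" or "nodes".
      for (i = 2; i <= NF; i++)
        if ($i ~ /^nodes?$/ && $(i-1) ~ /^[0-9]+$/) { count = $(i-1); exit }
    }
    END { print count + 0 }'
}
```

Comparing the printed number with the number of nodes requested shows whether every node finished within the timeout.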
      
      

      If the node is in the NOT REGISTERED state, you may need to wait for it to return to the POWEROFF state (it sometimes takes a few moments for the services to sync up). If the node takes more than 60 seconds to leave the NOT REGISTERED state, please report it to an administrator.
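
The 60-second wait described above can be expressed as a polling loop. This is a sketch under stated assumptions: STAT_CMD is a hypothetical variable holding the status command, wait_registered is a hypothetical helper, and the "Node: ... State: ..." lines are assumed to appear on standard output.

```shell
# Wait until node $1 leaves the NOT REGISTERED state, giving up after
# $2 seconds (default 60, the limit mentioned above), polling every $3 seconds.
# STAT_CMD is a hypothetical variable naming the status command to run;
# wait_registered is a hypothetical helper, not part of OMF.
STAT_CMD=${STAT_CMD:-"omf stat -t all"}

wait_registered() {
  node=$1 limit=${2:-60} interval=${3:-10} elapsed=0
  while :; do
    # Keep only this node's status line; accept a short name or an FQDN.
    line=$($STAT_CMD 2>/dev/null |
      awk -v n="$node" '$1 == "Node:" && ($2 == n || index($2, n ".") == 1)')
    case "$line" in
      ""|*"NOT REGISTERED"*) ;;   # missing or still unregistered: keep waiting
      *) return 0 ;;              # POWEROFF or POWERON: node has recovered
    esac
    [ "$elapsed" -ge "$limit" ] && return 1
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
}
```

A non-zero return value after the limit is the point at which the node should be reported to an administrator.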

Prior to executing the "Hello World" experiment, users need to install an image on the hard disks of the nodes. This tutorial uses baseline.ndz. This image is built on top of Ubuntu 12.04 and is preconfigured with the modules and startup scripts needed to take advantage of the rest of the ORBIT software and hardware. The imaging process turns the nodes off again once imaging completes.

  4. Turn the nodes on: "omf tell -a on -t all"

At this point the nodes' disks hold the baseline image, and the nodes need to be turned back on before proceeding. Give the nodes a couple of minutes to power on and boot, then check their status with omf stat.
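
The status check after power-on can be reduced to a single yes/no test. The sketch below assumes the same "Node: <FQDN> State: <STATE>" lines that omf stat prints in this tutorial; all_nodes_on is a hypothetical helper, not an OMF command.

```shell
# Succeed only if `omf stat` output on stdin lists at least one node and
# every listed node is in the POWERON state.
# all_nodes_on is a hypothetical helper; it assumes the
# "Node: <FQDN> State: <STATE>" lines shown in this tutorial.
all_nodes_on() {
  awk '
    $1 == "Node:" {
      total++
      if ($NF == "POWERON" && $(NF-1) == "State:") on++
    }
    END { exit !(total > 0 && on == total) }'
}
```

For example, "omf stat -t all | all_nodes_on && echo ready" would print "ready" only once every node in the domain reports POWERON.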

  5. Execute experiment with omf: "omf exec test:exp:tutorial:hello-world-wireless -- --res1 node1-1.sb1.orbit-lab.org --res2 node1-2.sb1.orbit-lab.org"

    omf exec

    Usage: omf exec

    user@console: omf exec
    
    OMF Experiment Controller 5.4 (git 9ac2ff9)
    
    Execute an experiment script
    
    Usage:  exec [OPTIONS] ExperimentName [-- EXP_OPTIONS]
    
    	ExperimentName is the filename of the experiment script
    	[EXP_OPTIONS] are any options defined in the experiment script
    	[OPTIONS] are any of the following:
    
        -a, --allow-missing              Continue experiment even if some nodes did not check in
        -C, --configfile FILE            File containing local configuration parameters
        -c, --config NAME                Configuration section from the config file ('default' if omitted)
        -d, --debug                      Operate in debug mode
        -i, --interactive                Run the experiment controller in interactive mode
        -l, --libraries LIST             Comma separated list of libraries to load (defaults to [system:exp:stdlib,system:exp:eventlib])
            --log FILE                   File containing logging configuration information
        -m, --message MESSAGE            Message to add to experiment trace
        -n, --just-print                 Print the commands that would be executed, but do not execute them
        -p, --print URI                  Print to the console the content of the experiment resource URI
        -o, --output-result FILE         File to write final state information to
        -e, --experiment-id EXPID        Set the ID for this experiment, instead of the default standard ID
        -O, --output-app                 Display STDOUT & STDERR output from the executed applications
        -r, --reset                      If set, then reset (reboot) the nodes before the experiment
        -S, --slice NAME                 Name of the Slice where this EC should operate
        -s, --shutdown                   If set, then shut down resources at the end of an experiment
        -t, --tags TAGS                  Comma separated list of tags to add to experiment trace
            --oml-uri URI                The URI to the OML server for this experiment
        -x, --extra-libs LIST            Comma separated list of libraries to load in addition to [system:exp:stdlib,system:exp:eventlib]
            --slave-mode EXPID           Run in slave mode in disconnected experiment, EXPID is the exp. ID
            --slave-mode-resource NAME   When in slave mode, NAME is the HRN of the resource for this EC
        -h, --help                       Show this message
        -v, --version                    Show the version
    

Execute the "Hello World" experiment. This is a simple wireless experiment that establishes a WiFi link between two nodes and transfers data for 60 seconds. Be sure to specify the fully qualified domain names (FQDNs) of the two nodes involved. The output below assumes the domain "sb1.orbit-lab.org".

username@console.sb1:~$ omf exec test:exp:tutorial:hello-world-wireless -- --res1 node1-1.sb1.orbit-lab.org --res2 node1-2.sb1.orbit-lab.org
 INFO NodeHandler: OMF Experiment Controller 5.4 (git c005675)
 INFO NodeHandler: Slice ID: default_slice (default)
 INFO NodeHandler: Experiment ID: default_slice-2013-01-29t01.03.19-05.00
 INFO NodeHandler: Message authentication is disabled
 INFO Experiment: load system:exp:stdlib
 INFO property.resetDelay: resetDelay = 230 (Fixnum)
 INFO property.resetTries: resetTries = 1 (Fixnum)
 INFO Experiment: load system:exp:eventlib
 INFO Experiment: load test:exp:tutorial:hello-world-wireless
 INFO property.duration: duration = 60 (Fixnum)
 INFO property.graph: graph = false (FalseClass)
 INFO Topology: Loading topology 'system:topo:imaged'.
 INFO Topology: Loading topology '/tmp/pxe_slice-2013-01-26t22.21.22-05.00-topo-success'.
 INFO ALL_UP_AND_INSTALLED: Event triggered. Starting the associated tasks.
 INFO exp: This is my first OMF experiment
 INFO exp: Request from Experiment Script: Wait for 15s....
 INFO node1-1.sb1.orbit-lab.org: Device 'net/w0' reported Not-Associated
 INFO node1-2.sb1.orbit-lab.org: Device 'net/w0' reported Not-Associated
 INFO node1-1.sb1.orbit-lab.org: Device 'net/w0' reported 12:B2:78:8E:8D:4F
 INFO node1-2.sb1.orbit-lab.org: Device 'net/w0' reported 12:B2:78:8E:8D:4F
 INFO exp: All my Applications are started now...
 INFO exp: Request from Experiment Script: Wait for 60s....
 INFO exp: All my Applications are stopped now.
 INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
 INFO NodeHandler: 
 INFO NodeHandler: Shutting down experiment, please wait...
 INFO NodeHandler: 
 INFO run: Experiment default_slice-2013-01-29t01.03.19-05.00 finished after 1:23

username@console.sb1:~$
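
Because the experiment expects fully qualified node names, a small helper can expand short names before the command is run. This is only a sketch: DOMAIN, fqdn and hello_world_cmd are hypothetical names, and the script prints the command line rather than executing it.

```shell
#!/bin/sh
# Sketch: build the hello-world command line from short node names.
# DOMAIN, fqdn() and hello_world_cmd() are illustrative, not part of OMF.
DOMAIN="sb1.orbit-lab.org"

fqdn() {
    # Append the domain unless the name is already fully qualified.
    case "$1" in
        *."$DOMAIN") echo "$1" ;;
        *)           echo "$1.$DOMAIN" ;;
    esac
}

hello_world_cmd() {
    # $1 and $2 are the two nodes, given as short names or FQDNs.
    echo "omf exec test:exp:tutorial:hello-world-wireless --" \
         "--res1 $(fqdn "$1") --res2 $(fqdn "$2")"
}

hello_world_cmd node1-1 node1-2
```

The printed line can be reviewed and then pasted at the console prompt.
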
  1. Analyze the results

Use various tools to analyze the results.
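
Whatever tools are used, it helps to know which run the results belong to. The experiment controller prints an experiment ID at startup (see the "Experiment ID:" line in the output above). The sketch below extracts that ID from saved console output. The helper name experiment_id is hypothetical, and the sample lines are copied from the output shown earlier.

```shell
#!/bin/sh
# Sketch: pull the experiment ID out of saved `omf exec` console output.
# experiment_id() is an illustrative helper, not part of OMF.
experiment_id() {
    # Reads console output on stdin and prints the first experiment ID found,
    # e.g. from " INFO NodeHandler: Experiment ID: default_slice-2013-...".
    sed -n 's/^.*Experiment ID: *\([^ ]*\).*$/\1/p' | head -n 1
}

# Demo on lines taken from the sample output in this tutorial.
experiment_id <<'EOF'
 INFO NodeHandler: Slice ID: default_slice (default)
 INFO NodeHandler: Experiment ID: default_slice-2013-01-29t01.03.19-05.00
 INFO run: Experiment default_slice-2013-01-29t01.03.19-05.00 finished after 1:23
EOF
```

In practice the console output would first be captured to a file, for example by piping the omf exec command through "2>&1 | tee hello.log", and then fed to the helper with "experiment_id < hello.log".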

  1. (optionally) Save the image

    Save Image with omf: "omf save -n node1-1.sb1.orbit-lab.org"

    How to save a disk image from one node of a Testbed

    Once you have the image prepared the way you want it, run the following on the node:

    ssh root@node1-1.sb1.orbit-lab.org
    root@node1-1.sb1.orbit-lab.org: ./prepare.sh
    

    This will remove udev rules (to prevent renaming of interfaces) and dump log files to reduce the size of the image. It will also shut down the node.

    Once the node has been shut down, to save the existing disk image on node (1,1) of the 'sb1' testbed, use the command:

     omf save -n node1-1.sb1.orbit-lab.org 
     # will save the current disk image on node [1,1] of the 'sb1' testbed
    
    

    The output of this image saving process will look like the following:

    INFO NodeHandler: OMF Experiment Controller 5.4 (git c005675)
    INFO NodeHandler: Slice ID: pxe_slice 
    INFO NodeHandler: Experiment ID: pxe_slice-2013-02-06t14.14.46-05.00
    INFO NodeHandler: Message authentication is disabled
    INFO Experiment: load system:exp:stdlib
    INFO property.resetDelay: resetDelay = 230 (Fixnum)
    INFO property.resetTries: resetTries = 1 (Fixnum)
    INFO Experiment: load system:exp:eventlib
    INFO Experiment: load system:exp:saveNode
    INFO property.node: node = "node1-1.sb1.orbit-lab.org" (String)
    INFO property.pxe: pxe = "1.1.6" (String)
    INFO property.domain: domain = "grid.orbit-lab.org" (String)
    INFO property.started: started = "false" (String)
    INFO property.image: image = nil (NilClass)
    INFO property.resize: resize = nil (NilClass)
    WARN exp: Saving only works for ext2/ext3 partitions and MBR (msdos) partition tables. Saving any other filesystem or partition table type will produce a 0 byte image.
    INFO Topology: Loading topology 'node1-1.sb1.orbit-lab.org'.
    INFO Experiment: Resetting resources
    INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb1.orbit-lab.org) [0 sec.]
    .
    .
    .
    INFO stdlib: Waiting for nodes (Up/Down/Total): 0/1/1 - (still down: node1-1.sb1.orbit-lab.org) [80 sec.]
    INFO ALL_UP: Event triggered. Starting the associated tasks.
    INFO node1-1.sb1.orbit-lab.org:  
    INFO node1-1.sb1.orbit-lab.org: - Saving image of '/dev/sda' on node 'node1-1.sb1.orbit-lab.org'
    INFO node1-1.sb1.orbit-lab.org:   to the file 'bob-node-node1-1.sb1.orbit-lab.org-2013-02-06-14-16-23.ndz' on host '10.10.0.42'
    INFO node1-1.sb1.orbit-lab.org:  
    INFO property.started: started = "true" (String)
    INFO exp:  
    INFO exp: - Saving process started at: Wed Feb 06 14:16:27 -0500 2013
    INFO exp:   (this may take a while depending on the size of your image)
    INFO Experiment: DONE!
    INFO ExecApp: Application 'commServer' finished
    INFO run: Experiment sb1_2008_07_20_23_38_04 finished after 9:19
    done.
    
    

    At the end of the saving process, you will have a disk image file named "bob-node-node1-1.sb1.orbit-lab.org-2013-02-06-14-16-23.ndz" in the directory "/mnt/images" on the machine with the host name "repository1". This information is provided in the output displayed above.

    You can then:

    • install this disk image on a set of nodes, using the instructions described in this tutorial. In this example, to install the newly created disk image on node (1,2):
        omf load -t node1-2.sb1.orbit-lab.org -i bob-node-node1-1.sb1.orbit-lab.org-2013-02-06-14-16-23.ndz
        # will install the disk image on node [1,2] of the 'sb1' testbed
      
      
    • make a backup of your image: log into the "repository" machine and copy your image to your backup storage.
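
The image file name needed for the load step can be recovered from the save output instead of being retyped. The sketch below assumes the save output was captured to a file or pipe. saved_image and load_cmd are hypothetical helper names, and the script only prints the resulting "omf load" command.

```shell
#!/bin/sh
# Sketch: recover the saved image name from `omf save` console output and
# print the matching `omf load` command. saved_image() and load_cmd() are
# illustrative helpers, not part of OMF.
saved_image() {
    # Reads save output on stdin; matches the "to the file '....ndz'" line.
    sed -n "s/.*to the file '\([^']*\.ndz\)'.*/\1/p" | head -n 1
}

load_cmd() {
    # $1 = target node (FQDN), $2 = image file name
    echo "omf load -t $1 -i $2"
}

# Demo on a line taken from the sample save output in this tutorial.
image="$(saved_image <<'EOF'
INFO node1-1.sb1.orbit-lab.org: - Saving image of '/dev/sda' on node 'node1-1.sb1.orbit-lab.org'
INFO node1-1.sb1.orbit-lab.org:   to the file 'bob-node-node1-1.sb1.orbit-lab.org-2013-02-06-14-16-23.ndz' on host '10.10.0.42'
EOF
)"
load_cmd node1-2.sb1.orbit-lab.org "$image"
```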


    Learning More

    The above disk image saving process is implemented as a special ORBIT experiment. As such, its execution will result in a log file, as with any other ORBIT experiment. This log file should look like this.

    Each image created by the above saving process is a full hard-disk image, which can be arbitrarily large (200 MB or more). As storage on the "frisbee" machine is limited, please be considerate in the number of images you save/use, and move any unused images to your own archival storage.

    The generic omf command used above is the entry point for various ORBIT functions; for example, the "save" sub-command saves a disk image from a node. To see a list of all available omf commands, type "omf help".

    Finally, the complete list of options for the save function is given by "omf help save":

    omf save --help
    Save a disk image from a given node into an archive file
    Usage:
          omf save -n NODE [-h] [-c AGGREGATE]
     
          With: 
          -h, --help          print this help message
          -n, --node NODE     a valid description of a single node (MANDATORY)
                              (no default here, you have to enter a node!)
          -r, --resize SIZE   Resizes the first partition to SIZE GB or to maximum size if SIZE=0 or
                              leave x percent of free space if SIZE=x%
     
          Some Examples: 
                        omf save -n node5-3.grid.orbit-lab.org
                        omf save -n node1-1.sb2.orbit-lab.org
     
    
    If you modified the baseline image and/or added software to it, you will want to save it to the repository before the end of your time slot!

Where to go from here

If you are still unsure what ORBIT is, please read the FAQ and check the other tutorials.
