Changes between Version 56 and Version 57 of Documentation/CGettingStarted


Timestamp:
Jan 16, 2013, 8:06:41 PM (11 years ago)
Author:
ssugrim
Comment:

  • Documentation/CGettingStarted

    v56 v57  
    1 [wiki:WikiStart Orbit] > HowToGetStarted
     1[wiki:WikiStart Orbit] > GettingStarted
    22
    33= How to get started =
     
    2727||Sandbox 8 || 2 || console.sb8.orbit-lab.org || None ||
    2828||Sandbox 9 || 11 || console.sb9.orbit-lab.org || Netfpga + Openflow ||
     29||Outdoor || Variable || console.outdoor.orbit-lab.org || Variable ||
    2930
    3031
     
    3637When you have successfully logged in, you can start an experiment using the [wiki:/Software/bAM#AggregateManagers Orbit Management Framework (OMF)]. First time users are '''highly''' encouraged to reserve time on a sandbox instead of the main grid, and start with the built-in [wiki:/Tutorials/HelloWorld Hello World] experiment.
    3738 1. Before we begin using the nodes, it's a good idea to check their status first. This is done with the omf stat command. This will typically produce a result like:
     39    {{{
     40    user@console.outdoor:~$ omf stat
    3841
    39    [[Image(newhowto1.jpg)]]
     42    INFO NodeHandler: OMF Experiment Controller 5.4 (git c005675)
     43    INFO NodeHandler: Slice ID: default_slice (default)
     44    INFO NodeHandler: Experiment ID: default_slice-2013-01-16t14.42.48-05.00
     45    INFO NodeHandler: Message authentication is disabled
     46    INFO Experiment: load system:exp:stdlib
     47    INFO property.resetDelay: resetDelay = 230 (Fixnum)
     48    INFO property.resetTries: resetTries = 1 (Fixnum)
     49    INFO Experiment: load system:exp:eventlib
     50    INFO Experiment: load system:exp:stat
     51    INFO Topology: Loading topology ''.
     52    INFO property.nodes: nodes = "system:topo:all" (String)
     53    INFO property.summary: summary = false (FalseClass)
     54    INFO Topology: Loading topology 'system:topo:all'.
     55    Talking to the CMC service, please wait
     56    -----------------------------------------------
     57    Domain: outdoor.orbit-lab.org
     58    Node: node1-1.outdoor.orbit-lab.org          State: POWEROFF
     59    Node: node1-10.outdoor.orbit-lab.org         State: POWEROFF
     60    Node: node1-2.outdoor.orbit-lab.org          State: POWEROFF
     61    Node: node1-3.outdoor.orbit-lab.org          State: POWEROFF
     62    Node: node1-4.outdoor.orbit-lab.org          State: POWEROFF
     63    Node: node1-5.outdoor.orbit-lab.org          State: POWEROFF
     64    Node: node1-6.outdoor.orbit-lab.org          State: POWEROFF
     65    Node: node1-7.outdoor.orbit-lab.org          State: POWEROFF
     66    Node: node1-8.outdoor.orbit-lab.org          State: POWEROFF
     67    Node: node1-9.outdoor.orbit-lab.org          State: POWEROFF
     68    Node: node2-10.outdoor.orbit-lab.org         State: POWEROFF
     69    Node: node2-2.outdoor.orbit-lab.org          State: POWEROFF
     70    Node: node2-3.outdoor.orbit-lab.org          State: NOT REGISTERED
     71    Node: node3-1.outdoor.orbit-lab.org          State: POWEROFF
     72    Node: node3-2.outdoor.orbit-lab.org          State: POWEROFF
     73    Node: node3-3.outdoor.orbit-lab.org          State: POWEROFF
     74    Node: node3-4.outdoor.orbit-lab.org          State: POWEROFF
     75    Node: node3-5.outdoor.orbit-lab.org          State: POWERON
     76    Node: node3-7.outdoor.orbit-lab.org          State: POWEROFF
     77    Node: node3-8.outdoor.orbit-lab.org          State: POWEROFF
     78    Node: node4-1.outdoor.orbit-lab.org          State: POWEROFF
     79    Node: node4-10.outdoor.orbit-lab.org         State: POWEROFF
     80    Node: node4-11.outdoor.orbit-lab.org         State: POWEROFF
     81    -----------------------------------------------
     82    INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
     83    INFO NodeHandler:
     84    INFO NodeHandler: Shutting down experiment, please wait...
     85    INFO NodeHandler:
     86    INFO run: Experiment default_slice-2013-01-16t14.42.48-05.00 finished after 0:6
     87    }}}
     88    Individual nodes are identified by their fully qualified domain name (FQDN). This establishes their "coordinates" and the "domain" to which they belong. Nodes in
     89    different domains can NOT see each other.
    4090
    41    Individual nodes are identified by their fully qualified domain name (FQDN). This establishes their "coordinates" and the "domain" to which they belong. Nodes in different domains can NOT see each other.
    42  2. The node can be in 1 of 3 states:
     91 2. A node can be in one of three states:
    4392
    44    || POWEROFF || Node is Available for use but turned off ||
    45    || POWERON || Node is Available and is on ||
    46    || NODE NOT AVAILABLE || Node is not Available for use||
     93    || POWEROFF       || Node is available for use but turned off ||
     94    || POWERON        || Node is available and turned on ||
     95    || NOT REGISTERED || Node is not available for use ||[[BR]]
    4796
    4897 3. It is recommended that the node be in the POWEROFF state prior to any experiment process. If the node is in the POWERON state you can use the omf tell command
    49    to get the node into the off state.
    50    {{{
    51    username@console.sb1:~$ omf tell -a offh -t TOPOLOGY
    52    }}}
    53    The ''TOPOLOGY'' can take on many forms, the simplest being a comma separated list of FQDN's. There are special predefined topologies like: all, system:topo:circle, ... For more details see [wiki:/Software/bAM#AggregateManagers OMF documentation]
    54    If the node is in the NODE NOT AVAILABLE state, you may need to wait for it to recover the POWEROFF state (it some times requires a few moments for the service to sync up). If
    55    the node never comes out of the NODE NOT AVAILABLE state please contact an administrator.
     98    to get the node into the off state.
     99    {{{
     100    username@console.domain:~$ omf tell -a offh -t TOPOLOGY
     101    }}}
     102    The ''TOPOLOGY'' can take many forms, the simplest being a comma-separated list of FQDNs. There are also special predefined topologies such as: all, system:topo:circle, ...
     103    For more details see the [wiki:/Software/bAM#AggregateManagers OMF documentation].
     104    If the node is in the NOT REGISTERED state, you may need to wait for it to return to the POWEROFF state (it sometimes takes a few moments for the services to sync up). If
     105    the node never leaves the NOT REGISTERED state, please contact an administrator.
    56106
    57  4. Prior to the experiment, users need to install an image on the hard disks of the nodes. If you have not created a custom image use the default starting image: '''baseline.ndz'''. This image is built on top of ubuntu 12.04, and is pre-configured with the proper modules and start up scripts to take advantage of the rest of the Orbit services.  Loading an image is done with the omf load command
    58    {{{
    59    username@console.sb1:~$ omf load -t TOPOLOGY -i IMAGENAME
    60    }}}
    61    Where ''TOPOLOGY'' is the set of nodes you wish to image , and !IMAGENAME is the name of the image you with to load. The most common sandbox starting image command would look like
    62    {{{
    63    username@console.sb1:~$ omf load -t all -i baseline.ndz
    64    }}}
    65    which will load all the nodes of sandbox 1 (totaling 1) with the [wiki:Documentation/SupportedImages baseline] image.
    66  5. The process start should look like:
     107 4. Prior to the experiment, users need to install an image on the hard disks of the nodes. If you have not created a custom image, use the default starting image:
     108    '''baseline.ndz'''. This image is built on top of '''Ubuntu 12.04''' and is pre-configured with the proper modules and startup scripts to take advantage of the rest of
     109    the Orbit services / hardware.  Loading an image is done with the [wiki:/Software/bAM#AggregateManagers omf load command].
     110    {{{
     111    username@console.domain:~$ omf load -t TOPOLOGY -i IMAGENAME
     112    }}}
     113    where ''TOPOLOGY'' is the set of nodes you wish to image, and !IMAGENAME is the name of the image you wish to load. The most common sandbox starting image command
     114    looks like
     115    {{{
     116    username@console.domain:~$ omf load -t all -i baseline.ndz
     117    }}}
     118    which will load all the nodes of sandbox 1 (totaling 1) with the [wiki:Documentation/SupportedImages baseline] image. An example run on sandbox 7 looks like:
     119    {{{
     120user@console.sb7:~$ omf load -t all -i baseline.ndz
    67121
    68    [[Image(newhowto2.jpg)]]
    69  6. A key line to look for is ''INFO whenAll: *: 'status[@value='UP']' fires''. This line indicates that all the nodes have come up and imaging has begun:
    70    
    71    [[Image(newhowto3.jpg)]]
     122 INFO NodeHandler: OMF Experiment Controller 5.4 (git c005675)
     123 INFO NodeHandler: Slice ID: pxe_slice
     124 INFO NodeHandler: Experiment ID: pxe_slice-2013-01-16t14.56.02-05.00
     125 INFO NodeHandler: Message authentication is disabled
     126 INFO Experiment: load system:exp:stdlib
     127 INFO property.resetDelay: resetDelay = 230 (Fixnum)
     128 INFO property.resetTries: resetTries = 1 (Fixnum)
     129 INFO Experiment: load system:exp:eventlib
     130 INFO Experiment: load system:exp:imageNode
     131 INFO property.nodes: nodes = "system:topo:all" (String)
     132 INFO property.image: image = "baseline.ndz" (String)
     133 INFO property.domain: domain = "sb7.orbit-lab.org" (String)
     134 INFO property.outpath: outpath = "/tmp" (String)
     135 INFO property.outprefix: outprefix = "pxe_slice-2013-01-16t14.56.02-05.00" (String)
     136 INFO property.timeout: timeout = 800 (Fixnum)                                                                                         
     137 INFO property.resize: resize = nil (NilClass)
     138 INFO Topology: Loading topology 'system:topo:all'.
     139 INFO Experiment: Resetting resources
     140 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [0 sec.]
     141 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [10 sec.]
     142 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [20 sec.]
     143 INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [30 sec.]
     144 INFO ALL_UP: Event triggered. Starting the associated tasks.
     145 INFO exp: Progress(0/0/2): 0/0/0 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 760 sec.
     146 INFO exp: Progress(0/0/2): 10/10/10 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 750 sec.
     147 INFO exp: Progress(0/0/2): 10/15/20 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 740 sec.
     148 INFO exp: Progress(0/0/2): 20/25/30 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 730 sec.
     149 INFO exp: Progress(0/0/2): 30/35/40 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 720 sec.
     150 INFO exp: Progress(0/0/2): 40/40/40 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 710 sec.
     151 INFO exp: Progress(0/0/2): 40/45/50 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 700 sec.
     152 INFO exp: Progress(0/0/2): 50/55/60 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 690 sec.
     153 INFO exp: Progress(0/0/2): 60/65/70 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 680 sec.
     154 INFO exp: Progress(0/0/2): 60/65/70 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 670 sec.
     155 INFO exp: Progress(0/0/2): 70/75/80 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 660 sec.
     156 INFO exp: Progress(0/0/2): 90/90/90 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 650 sec.
     157 INFO exp: Progress(1/0/2): 90/95/100 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 640 sec.
     158 INFO exp: Progress(2/0/2): 100/100/100 min()/avg/max (30) - Timeout: 630 sec.
     159 INFO exp:  -----------------------------
     160 INFO exp:  Imaging Process Done
     161 INFO exp:  2 nodes successfully imaged - Topology saved in '/tmp/pxe_slice-2013-01-16t14.56.02-05.00-topo-success.rb'
     162 INFO exp:  -----------------------------
     163 INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
     164 INFO NodeHandler:
     165 INFO NodeHandler: Shutting down experiment, please wait...
     166 INFO NodeHandler:
     167 INFO NodeHandler: Shutdown flag is set - Turning Off the resources
     168 INFO run: Experiment pxe_slice-2013-01-16t14.56.02-05.00 finished after 3:13
     169    }}}
    72170
    73  7. The final result should look like:
    74 
    75    [[Image(newhowto4.jpg)]]
    76 
    77  8. At this point the nodes are imaged with the ''basline'' image and need to be turned back on before proceeding.
    78    {{{
    79    username@console.sb2:~$ omf-5.2 tell on all
    80    }}}
    81    Give the nodes a couple of minutes to turn on. To check the status of the node:
    82    {{{
    83    username@console.sb2:~$ omf-5.2 stat
    84    }}}
    85 
    86  9. By default the driver modules for the wireless interfaces are disabled. It is up to the experimenter to decide which interface to use. For this tutorial experiment the ath_pci module will be used. So before running the tutorial experiment ssh into each node (ie. node1-1 & node 1-2) and load the driver modules.
    87    {{{
    88    username@console.sb2:~$ ssh root@node1-1
    89    }}}
    90    From the node load the driver module:
    91    {{{
    92    root@node1-1:~# modprobe ath_pci
    93    }}}
    94    Verify that the module has been loaded into the kernel
    95    {{{
    96    root@node1-1:~# lsmod
    97    }}}
    98    
    99    [[Image(newhowto_lsmod.jpg)]]
    100 
    101    Now ssh into ''node1-2'' and do the same.
    102  10. To run a tutorial experiment that involves one UDP traffic sender and one receiver, run the following command at the console.
    103    {{{
    104    username@console.sb2:~$ omf-5.2 exec --tutorial -- --tutorialName tutorial-1a
    105    }}}
    106    Make note of the unique experiment ID as shown in the experiment output below. This ID can be used later to view the results from a database (sqlite3) file.
    107 
    108    [[Image(newhowto5.jpg)]]
    109    
    110 
    111  11. Both, sender and receiver, report measurements to a database, using the OML measurement framework. The file is saved as a sqlite3 file; the file name for the experiment is shown in the last line of the tutorial's output and saved in the console under /var/lib/oml2
    112 
    113 To dump the database file for this experiment:
    114    {{{
    115     username@console.sb2:~$ sqlite3 /var/lib/oml2/sb8.orbit-lab.org_2011_07_12_16_00_33.sq3 ".dump"
    116    }}}
    117 
    118 
    119 
    120 
    121 The experiment can be started with:
    122 {{{
    123 user#> nodehandler -t
    124 }}}
    125 
    126 [[Image(howto4.PNG)]]
    127 
    128  * This experiment will send UDP datagrams of 1024 bytes from node 1-1 to node 1-2 at 300 kbps CBR traffic.
    129  * Both, sender and receiver, report measurements to a database, using our [wiki:Documentation/OML OML] measurement framework.
    130  * As shown below, the experiment controller will power on the nodes involved in the experiment and will issue experiment commands to each node.
    131  * Each experiment has a unique experiment ID as shown in the figure, that can be used later to view the results from the database
    132 [[Image(howto5.PNG)]]
    133 
    134 Alternatively, a specific script can be run as follows:
    135 
    136 The experiment can be started with:
    137 {{{
    138 user#> nodehandler <full-path/script-name>
    139 }}}
    140 
    141 For e.g., if my script is called orbit-test.rb and it resides in /home/joenull/Ruby-Scripts/ (ORBIT home directory), I would execute it as follows:
    142 {{{
    143 user#>pwd
    144 /home/joenull
    145 user#>nodehandler ~/Ruby-Scripts/orbit-test
    146 }}}
    147 
    148 Note that I leave out the ".rb" at the end. This will execute the scripts and turn the nodes OFF at the end of the experiment. If you want to leave them ON after the experiment, use the "-k" flag. For e.g.
    149 {{{
    150 user#>pwd
    151 /home/joenull
    152 user#>nodehandler -k ~/Ruby-Scripts/orbit-test
    153 }}}
    154 
    155 The experimenter can also move to where the script resides and execute it (without giving the full path) since nodehandler will look for the script in the current directory.
    156 
    157 More information on writing experiment scripts can be found in the [wiki:Tutorial Tutorial].
    158 
    159 == Analyzing Results ==
    160 
    161 Orbit provides a sophisticated framework to efficiently collect measurements at runtime into a database. This database is accessible to the experimenter during the experiment from the console. At the end of an experiment, the database is copied to an external machine and is accessible without a reservation. More information can be found [wiki:Tutorial/AnalyzeResults here].
     171 5. The imaging process turns the nodes back off once imaging completes. At this point the nodes' disks hold the ''baseline'' image
     172    and the nodes need to be turned back on before proceeding.
     173    {{{
     174    username@console.domain:~$ omf tell -a on -t all
     175    }}}
     176    Give the nodes a couple of minutes to turn on / boot, then check their status with omf stat.
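
The status check in steps 1 to 3 can also be scripted. The following is a minimal sketch, not part of OMF: it assumes the `omf stat` output has been captured as text and that node lines follow the `Node: <FQDN>   State: <STATE>` layout shown in step 1. The helper names `group_nodes_by_state` and `topology_for` are hypothetical. It groups nodes by state and prints the `omf tell -a offh` command from step 3 for any node still in POWERON.

```python
import re
from collections import defaultdict

# Matches status lines such as:
#   Node: node3-5.outdoor.orbit-lab.org          State: POWERON
# The layout is taken from the sample output in step 1 (an assumption
# about the format, not a documented OMF interface).
NODE_LINE = re.compile(r"Node:\s+(\S+)\s+State:\s+(.+?)\s*$")


def group_nodes_by_state(stat_output):
    """Return a dict mapping each state string to a list of node FQDNs."""
    groups = defaultdict(list)
    for line in stat_output.splitlines():
        match = NODE_LINE.search(line)
        if match:
            fqdn, state = match.groups()
            groups[state].append(fqdn)
    return dict(groups)


def topology_for(nodes):
    """Join FQDNs into the comma-separated list form of TOPOLOGY."""
    return ",".join(nodes)


# A shortened copy of the step 1 output, used here as sample input.
sample = """\
Domain: outdoor.orbit-lab.org
Node: node2-2.outdoor.orbit-lab.org          State: POWEROFF
Node: node2-3.outdoor.orbit-lab.org          State: NOT REGISTERED
Node: node3-5.outdoor.orbit-lab.org          State: POWERON
"""

groups = group_nodes_by_state(sample)
powered_on = groups.get("POWERON", [])
if powered_on:
    # Print (do not run) the command that would power these nodes off.
    print("omf tell -a offh -t " + topology_for(powered_on))
```

Nodes reported as NOT REGISTERED land in their own group, so a script can wait and retry before contacting an administrator.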
    162177
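
Whether an `omf load` run finished can be read from its log, as in the sandbox 7 example in step 4. The following is a minimal sketch, assuming the log has been saved as text and that the `Progress(...)`, `Imaging Process Done`, and `nodes successfully imaged` lines keep the wording shown in that example. The function name `summarize_load_log` and the summary keys are hypothetical.

```python
import re

# Patterns are derived from the sample log in step 4; they are assumptions
# about the log wording, not a documented OMF format.
DONE_LINE = re.compile(
    r"(\d+) nodes successfully imaged - Topology saved in '([^']+)'"
)
# The three percentages after the colon are labelled min/avg/max in the log.
PROGRESS_LINE = re.compile(r"Progress\(\d+/\d+/\d+\): (\d+)/(\d+)/(\d+)")


def summarize_load_log(log_text):
    """Extract completion status and the last progress report from a log."""
    summary = {
        "done": False,          # saw the "Imaging Process Done" banner
        "imaged": None,         # number of nodes reported as imaged
        "topology_file": None,  # path of the saved success topology
        "last_progress": None,  # (min, avg, max) percent from the last report
    }
    for line in log_text.splitlines():
        if "Imaging Process Done" in line:
            summary["done"] = True
        done = DONE_LINE.search(line)
        if done:
            summary["imaged"] = int(done.group(1))
            summary["topology_file"] = done.group(2)
        progress = PROGRESS_LINE.search(line)
        if progress:
            summary["last_progress"] = tuple(int(x) for x in progress.groups())
    return summary


# Tail of the sandbox 7 log from step 4, used here as sample input.
sample_log = """\
INFO exp: Progress(1/0/2): 90/95/100 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 640 sec.
INFO exp: Progress(2/0/2): 100/100/100 min()/avg/max (30) - Timeout: 630 sec.
INFO exp:  -----------------------------
INFO exp:  Imaging Process Done
INFO exp:  2 nodes successfully imaged - Topology saved in '/tmp/pxe_slice-2013-01-16t14.56.02-05.00-topo-success.rb'
INFO exp:  -----------------------------
"""

summary = summarize_load_log(sample_log)
print(summary)
```

If `done` is false or `imaged` is lower than the number of nodes in the topology, the imaging run should be inspected before powering the nodes back on.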
    163178= Where to go from here =