Changes between Version 56 and Version 57 of Documentation/CGettingStarted

Timestamp:: Jan 16, 2013, 8:06:41 PM (11 years ago)
Author:: ssugrim
Comment:: —

Documentation/CGettingStarted

-              v56
+              v57
 [wiki:WikiStart Orbit] > HowToGetStarted
+[wiki:WikiStart Orbit] > GettingStarted
 = How to get started =
 …
 ||Sandbox 8 || 2 || console.sb8.orbit-lab.org || None ||
 ||Sandbox 9 || 11 || console.sb9.orbit-lab.org || Netfpga + Openflow ||
+||Outdoor || Variable || console.outdoor.orbit-lab.org || Variable ||
 …
 When you have successfully logged in, you can start an experiment using the [wiki:/Software/bAM#AggregateManagers Orbit Management Framework (OMF)]. First time users are '''highly''' encouraged to reserve time on a sandbox instead of the main grid, and start with the built-in [wiki:/Tutorials/HelloWorld Hello World] experiment.
 . Before we begin using the nodes, it's a good idea to check their status first. This is done with the omf stat command. This will typically produce a result like:
+    {{{
+    user@console.outdoor:~$ omf stat
+   [[Image(newhowto1.jpg)]]
+    INFO NodeHandler: OMF Experiment Controller 5.4 (git c005675)
+    INFO NodeHandler: Slice ID: default_slice (default)
+    INFO NodeHandler: Experiment ID: default_slice-2013-01-16t14.42.48-05.00
+    INFO NodeHandler: Message authentication is disabled
+    INFO Experiment: load system:exp:stdlib
+    INFO property.resetDelay: resetDelay = 230 (Fixnum)
+    INFO property.resetTries: resetTries = 1 (Fixnum)
+    INFO Experiment: load system:exp:eventlib
+    INFO Experiment: load system:exp:stat
+    INFO Topology: Loading topology ''.
+    INFO property.nodes: nodes = "system:topo:all" (String)
+    INFO property.summary: summary = false (FalseClass)
+    INFO Topology: Loading topology 'system:topo:all'.
+    Talking to the CMC service, please wait
+    -----------------------------------------------
+    Domain: outdoor.orbit-lab.org
+    Node: node1-1.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node1-10.outdoor.orbit-lab.org         State: POWEROFF
+    Node: node1-2.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node1-3.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node1-4.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node1-5.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node1-6.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node1-7.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node1-8.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node1-9.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node2-10.outdoor.orbit-lab.org         State: POWEROFF
+    Node: node2-2.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node2-3.outdoor.orbit-lab.org          State: NOT REGISTERED
+    Node: node3-1.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node3-2.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node3-3.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node3-4.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node3-5.outdoor.orbit-lab.org          State: POWERON
+    Node: node3-7.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node3-8.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node4-1.outdoor.orbit-lab.org          State: POWEROFF
+    Node: node4-10.outdoor.orbit-lab.org         State: POWEROFF
+    Node: node4-11.outdoor.orbit-lab.org         State: POWEROFF
+    -----------------------------------------------
+    INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
+    INFO NodeHandler:
+    INFO NodeHandler: Shutting down experiment, please wait...
+    INFO NodeHandler:
+    INFO run: Experiment default_slice-2013-01-16t14.42.48-05.00 finished after 0:6
+    }}}
+    Individual nodes are identified by their fully qualified domain name (FQDN). This establishes their "coordinates" and the "domain" to which they belong. Nodes in
+    different domains can NOT see each other.
+   Individual nodes are identified by their fully qualified domain name (FQDN). This establishes their "coordinates" and the "domain" to which they belong. Nodes in different domains can NOT see each other.
+. The node can be in 1 of 3 states:
+. Node can be in 1 of 3 states:
    || POWEROFF || Node is Available for use but turned off ||
    || POWERON || Node is Available and is on ||
    || NODE NOT AVAILABLE || Node is not Available for use||
+    || POWEROFF       || Node is Available for use but turned off ||
+    || POWERON        || Node is Available and is on ||
+    || NOT REGISTERED || Node is not Available for use ||[[BR]]
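The status listing above can also be processed by a script. The following is a minimal sketch, assuming the `Node: ... State: ...` line layout shown in the sample output; the helper names `parse_stat` and `group_by_state` are hypothetical and not part of OMF.

```python
import re
from collections import defaultdict

# Matches lines such as
#   "Node: node1-1.outdoor.orbit-lab.org          State: POWEROFF"
# The layout is taken from the sample output above and is an assumption.
NODE_LINE = re.compile(r"Node:\s+(\S+)\s+State:\s+(.+?)\s*$")


def parse_stat(output):
    """Return {fqdn: state} for every 'Node:' line in the given text."""
    states = {}
    for line in output.splitlines():
        match = NODE_LINE.search(line)
        if match:
            states[match.group(1)] = match.group(2)
    return states


def group_by_state(states):
    """Return {state: [fqdn, ...]} so nodes in a given state are easy to list."""
    groups = defaultdict(list)
    for fqdn, state in states.items():
        groups[state].append(fqdn)
    return dict(groups)


# A few lines copied from the sample `omf stat` output above.
SAMPLE = """\
Domain: outdoor.orbit-lab.org
Node: node2-2.outdoor.orbit-lab.org          State: POWEROFF
Node: node2-3.outdoor.orbit-lab.org          State: NOT REGISTERED
Node: node3-5.outdoor.orbit-lab.org          State: POWERON
"""

if __name__ == "__main__":
    for state, nodes in sorted(group_by_state(parse_stat(SAMPLE)).items()):
        print(state, nodes)
```

Grouping by state makes it easy to spot nodes that are still POWERON or NOT REGISTERED before starting an experiment.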
 . It is recommended that the node be in the POWEROFF state prior to any experiment process. If the node is in the POWERON state you can use the omf tell command
+   to get the node into the off state.
+   {{{
+   username@console.sb1:~$ omf tell -a offh -t TOPOLOGY
+   }}}
+   The ''TOPOLOGY'' can take on many forms, the simplest being a comma-separated list of FQDNs. There are special predefined topologies like: all, system:topo:circle, ... For more details see the [wiki:/Software/bAM#AggregateManagers OMF documentation].
+   If the node is in the NODE NOT AVAILABLE state, you may need to wait for it to return to the POWEROFF state (it sometimes takes a few moments for the service to sync up). If
+   the node never comes out of the NODE NOT AVAILABLE state, please contact an administrator.
+    to get the node into the off state.
+    {{{
+    username@console.domain:~$ omf tell -a offh -t TOPOLOGY
+    }}}
+    The ''TOPOLOGY'' can take on many forms, the simplest being a comma-separated list of FQDNs. There are special predefined topologies like: all, system:topo:circle, ...
+    For more details see the [wiki:/Software/bAM#AggregateManagers OMF documentation].
+    If the node is in the NOT REGISTERED state, you may need to wait for it to return to the POWEROFF state (it sometimes takes a few moments for the services to sync up). If
+    the node never comes out of the NOT REGISTERED state, please contact an administrator.
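As an illustration of the simplest ''TOPOLOGY'' form, the sketch below joins a list of FQDNs with commas and assembles the `omf tell -a offh -t TOPOLOGY` command shown above as an argument list. The helper names are hypothetical, and the code only builds the command; it does not run it.

```python
def topology_from_nodes(fqdns):
    """Join node FQDNs into a comma-separated TOPOLOGY string."""
    cleaned = [name.strip() for name in fqdns if name.strip()]
    if not cleaned:
        raise ValueError("at least one node FQDN is required")
    return ",".join(cleaned)


def power_off_command(fqdns):
    """Build the argument list for: omf tell -a offh -t TOPOLOGY."""
    return ["omf", "tell", "-a", "offh", "-t", topology_from_nodes(fqdns)]


if __name__ == "__main__":
    # Node names taken from the sandbox 7 example on this page.
    nodes = ["node1-1.sb7.orbit-lab.org", "node1-2.sb7.orbit-lab.org"]
    print(" ".join(power_off_command(nodes)))
```

The argument list could be handed to `subprocess.run` on a console, but only during a valid reservation.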
+. Prior to the experiment, users need to install an image on the hard disks of the nodes. If you have not created a custom image use the default starting image: '''baseline.ndz'''. This image is built on top of ubuntu 12.04, and is pre-configured with the proper modules and start up scripts to take advantage of the rest of the Orbit services.  Loading an image is done with the omf load command
+   {{{
+   username@console.sb1:~$ omf load -t TOPOLOGY -i IMAGENAME
+   }}}
+   Where ''TOPOLOGY'' is the set of nodes you wish to image, and !IMAGENAME is the name of the image you wish to load. The most common sandbox starting image command would look like
+   {{{
+   username@console.sb1:~$ omf load -t all -i baseline.ndz
+   }}}
+   which will load all the nodes of sandbox 1 (totaling 1) with the [wiki:Documentation/SupportedImages baseline] image.
+. The process start should look like:
+. Prior to the experiment, users need to install an image on the hard disks of the nodes. If you have not created a custom image use the default starting image:
+    '''baseline.ndz'''. This image is built on top of '''Ubuntu 12.04''', and is pre-configured with the proper modules and start up scripts to take advantage of the rest of
+    the Orbit services / hardware.  Loading an image is done with the [wiki:/Software/bAM#AggregateManagers omf load command].
+    {{{
+    username@console.domain:~$ omf load -t TOPOLOGY -i IMAGENAME
+    }}}
+    Where ''TOPOLOGY'' is the set of nodes you wish to image, and !IMAGENAME is the name of the image you wish to load. The most common sandbox starting image command
+    would look like
+    {{{
+    username@console.domain:~$ omf load -t all -i baseline.ndz
+    }}}
+    which will load all the nodes of sandbox 1 (totaling 1) with the [wiki:Documentation/SupportedImages baseline] image. An example run on sandbox 7 looks like:
+    {{{
+user@console.sb7:~$ omf load -t all -i baseline.ndz
+   [[Image(newhowto2.jpg)]]
+. A key line to look for is ''INFO whenAll: *: 'status[@value='UP']' fires''. This line indicates that all the nodes have come up and imaging has begun:
+   [[Image(newhowto3.jpg)]]
+ INFO NodeHandler: OMF Experiment Controller 5.4 (git c005675)
+ INFO NodeHandler: Slice ID: pxe_slice
+ INFO NodeHandler: Experiment ID: pxe_slice-2013-01-16t14.56.02-05.00
+ INFO NodeHandler: Message authentication is disabled
+ INFO Experiment: load system:exp:stdlib
+ INFO property.resetDelay: resetDelay = 230 (Fixnum)
+ INFO property.resetTries: resetTries = 1 (Fixnum)
+ INFO Experiment: load system:exp:eventlib
+ INFO Experiment: load system:exp:imageNode
+ INFO property.nodes: nodes = "system:topo:all" (String)
+ INFO property.image: image = "baseline.ndz" (String)
+ INFO property.domain: domain = "sb7.orbit-lab.org" (String)
+ INFO property.outpath: outpath = "/tmp" (String)
+ INFO property.outprefix: outprefix = "pxe_slice-2013-01-16t14.56.02-05.00" (String)
+ INFO property.timeout: timeout = 800 (Fixnum)
+ INFO property.resize: resize = nil (NilClass)
+ INFO Topology: Loading topology 'system:topo:all'.
+ INFO Experiment: Resetting resources
+ INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [0 sec.]
+ INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [10 sec.]
+ INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [20 sec.]
+ INFO stdlib: Waiting for nodes (Up/Down/Total): 0/2/2 - (still down: node1-2.sb7.orbit-lab.org,node1-1.sb7.orbit-lab.org) [30 sec.]
+ INFO ALL_UP: Event triggered. Starting the associated tasks.
+ INFO exp: Progress(0/0/2): 0/0/0 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 760 sec.
+ INFO exp: Progress(0/0/2): 10/10/10 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 750 sec.
+ INFO exp: Progress(0/0/2): 10/15/20 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 740 sec.
+ INFO exp: Progress(0/0/2): 20/25/30 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 730 sec.
+ INFO exp: Progress(0/0/2): 30/35/40 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 720 sec.
+ INFO exp: Progress(0/0/2): 40/40/40 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 710 sec.
+ INFO exp: Progress(0/0/2): 40/45/50 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 700 sec.
+ INFO exp: Progress(0/0/2): 50/55/60 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 690 sec.
+ INFO exp: Progress(0/0/2): 60/65/70 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 680 sec.
+ INFO exp: Progress(0/0/2): 60/65/70 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 670 sec.
+ INFO exp: Progress(0/0/2): 70/75/80 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 660 sec.
+ INFO exp: Progress(0/0/2): 90/90/90 min(node1-2.sb7.orbit-lab.org)/avg/max (30) - Timeout: 650 sec.
+ INFO exp: Progress(1/0/2): 90/95/100 min(node1-1.sb7.orbit-lab.org)/avg/max (30) - Timeout: 640 sec.
+ INFO exp: Progress(2/0/2): 100/100/100 min()/avg/max (30) - Timeout: 630 sec.
+ INFO exp:  -----------------------------
+ INFO exp:  Imaging Process Done
+ INFO exp:  2 nodes successfully imaged - Topology saved in '/tmp/pxe_slice-2013-01-16t14.56.02-05.00-topo-success.rb'
+ INFO exp:  -----------------------------
+ INFO EXPERIMENT_DONE: Event triggered. Starting the associated tasks.
+ INFO NodeHandler:
+ INFO NodeHandler: Shutting down experiment, please wait...
+ INFO NodeHandler:
+ INFO NodeHandler: Shutdown flag is set - Turning Off the resources
+ INFO run: Experiment pxe_slice-2013-01-16t14.56.02-05.00 finished after 3:13
+    }}}
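When `omf load` output is captured to a file, the summary at the end can be checked by a script instead of by eye. This is a minimal sketch that assumes the wording of the "successfully imaged" line in the sample run above; the function name `imaging_summary` is hypothetical, and accepting a singular "node" is a guess rather than documented behaviour.

```python
import re

# Pattern modelled on the sample line:
#   "2 nodes successfully imaged - Topology saved in '/tmp/...-topo-success.rb'"
# The optional "s" (singular "node") is an assumption.
SUMMARY = re.compile(
    r"(\d+) nodes? successfully imaged - Topology saved in '([^']+)'"
)


def imaging_summary(log_text):
    """Return (imaged_count, topology_file), or None if no summary line is found."""
    match = SUMMARY.search(log_text)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Lines copied from the sample sandbox 7 run above.
SAMPLE_LOG = """\
 INFO exp:  Imaging Process Done
 INFO exp:  2 nodes successfully imaged - Topology saved in '/tmp/pxe_slice-2013-01-16t14.56.02-05.00-topo-success.rb'
"""

if __name__ == "__main__":
    print(imaging_summary(SAMPLE_LOG))
```

Comparing the returned count with the number of nodes requested is a quick way to notice a node that failed to image.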
+. The final result should look like:
+   [[Image(newhowto4.jpg)]]
+. At this point the nodes are imaged with the ''baseline'' image and need to be turned back on before proceeding.
+   {{{
+   username@console.sb2:~$ omf-5.2 tell on all
+   }}}
+   Give the nodes a couple of minutes to turn on. To check the status of the node:
+   {{{
+   username@console.sb2:~$ omf-5.2 stat
+   }}}
+. By default the driver modules for the wireless interfaces are disabled. It is up to the experimenter to decide which interface to use. For this tutorial experiment the ath_pci module will be used. Before running the tutorial experiment, ssh into each node (i.e., node1-1 and node1-2) and load the driver module.
+   {{{
+   username@console.sb2:~$ ssh root@node1-1
+   }}}
+   From the node load the driver module:
+   {{{
+   root@node1-1:~# modprobe ath_pci
+   }}}
+   Verify that the module has been loaded into the kernel
+   {{{
+   root@node1-1:~# lsmod
+   }}}
+   [[Image(newhowto_lsmod.jpg)]]
+   Now ssh into ''node1-2'' and do the same.
+. To run a tutorial experiment that involves one UDP traffic sender and one receiver, run the following command at the console.
+   {{{
+   username@console.sb2:~$ omf-5.2 exec --tutorial -- --tutorialName tutorial-1a
+   }}}
+   Make note of the unique experiment ID as shown in the experiment output below. This ID can be used later to view the results from a database (sqlite3) file.
+   [[Image(newhowto5.jpg)]]
+. Both sender and receiver report measurements to a database using the OML measurement framework. The database is saved as a sqlite3 file on the console under /var/lib/oml2; the file name for the experiment is shown in the last line of the tutorial's output.
+To dump the database file for this experiment:
+   {{{
+    username@console.sb2:~$ sqlite3 /var/lib/oml2/sb8.orbit-lab.org_2011_07_12_16_00_33.sq3 ".dump"
+   }}}
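As an alternative to a full `.dump`, the same file can be inspected with Python's built-in `sqlite3` module. The sketch below lists the tables in a database and counts their rows; it makes no assumption about the table names OML creates, and the helper names are hypothetical.

```python
import sqlite3
import sys


def list_tables(conn):
    """Return the names of all tables in the connected sqlite3 database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def row_count(conn, table):
    """Return the number of rows in `table` (name quoted as an identifier)."""
    quoted = '"' + table.replace('"', '""') + '"'
    return conn.execute("SELECT COUNT(*) FROM " + quoted).fetchone()[0]


if __name__ == "__main__":
    # Usage: python inspect_db.py /var/lib/oml2/<experiment-file>.sq3
    # (the script name is only an example)
    if len(sys.argv) > 1:
        connection = sqlite3.connect(sys.argv[1])
        try:
            for table_name in list_tables(connection):
                print(table_name, row_count(connection, table_name))
        finally:
            connection.close()
```

Listing the tables first is a reasonable starting point because the measurement tables differ from one experiment to another.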
+The experiment can be started with:
+{{{
+user#> nodehandler -t
+}}}
+[[Image(howto4.PNG)]]
+ * This experiment will send UDP datagrams of 1024 bytes from node 1-1 to node 1-2 as constant bit rate (CBR) traffic at 300 kbps.
+ * Both sender and receiver report measurements to a database using our [wiki:Documentation/OML OML] measurement framework.
+ * As shown below, the experiment controller will power on the nodes involved in the experiment and will issue experiment commands to each node.
+ * Each experiment has a unique experiment ID, as shown in the figure, which can be used later to view the results from the database.
+[[Image(howto5.PNG)]]
+Alternatively, a specific script can be run as follows:
+The experiment can be started with:
+{{{
+user#> nodehandler <full-path/script-name>
+}}}
+For example, if my script is called orbit-test.rb and it resides in /home/joenull/Ruby-Scripts/ (ORBIT home directory), I would execute it as follows:
+{{{
+user#>pwd
+/home/joenull
+user#>nodehandler ~/Ruby-Scripts/orbit-test
+}}}
+Note that I leave out the ".rb" at the end. This will execute the script and turn the nodes OFF at the end of the experiment. If you want to leave them ON after the experiment, use the "-k" flag. For example:
+{{{
+user#>pwd
+/home/joenull
+user#>nodehandler -k ~/Ruby-Scripts/orbit-test
+}}}
+The experimenter can also move to where the script resides and execute it (without giving the full path) since nodehandler will look for the script in the current directory.
+More information on writing experiment scripts can be found in the [wiki:Tutorial Tutorial].
+== Analyzing Results ==
+Orbit provides a sophisticated framework to efficiently collect measurements at runtime into a database. This database is accessible to the experimenter during the experiment from the console. At the end of an experiment, the database is copied to an external machine and is accessible without a reservation. More information can be found [wiki:Tutorial/AnalyzeResults here].
+. The imaging process will turn the nodes back off when it completes. At this point the nodes' disks are imaged with the ''baseline'' image
+    and need to be turned back on before proceeding.
+    {{{
+    username@console.domain:~$ omf tell -a on -t all
+    }}}
+    Give the nodes a couple of minutes to turn on / boot, then check their status with omf stat.
 = Where to go from here =