
Rebuilding the Node Agent from the ground up

Dependencies:

  • omfcommon/Mobject

gems:

  • cocaine - reference here
  • yaml?
  • timeout

Package dependencies

  • ethtool
  • lshw

Current Images

  • Stable:
    baseline-ubuntu-12-04-32bit-newResCtl-5-13-2015.ndz
    
  • Testing (wimax features):
    ssugrim-node-node1-1.sb4.orbit-lab.org-2015-05-16-23-05-33.ndz
    

TODOS:

  • Agent Commands should have a scan method that reports back to the controller. It should call the respective scan method of the driver.
  • For completeness we can move the scan method functionality down to the device.rb level. In that context it will raise an exception.
  • In the Ethernet context, it will return a string that says "Why are you trying to scan with ethernet?"
  • What other configure directives should wimax/LTE respond to besides network name?
  • Test Wimax features
  • Test other tutorials against the base function
  • Return the IP address in the result hash of start, for devices that auto-DHCP when started (e.g. wimax).
  • LTE support - requires pushing commands into serial to start a connection.

Design Notes:

File: kmodule.rb
Class: kmodule
Interface:

  • public methods:
    • new (private)
    • self.lsmod - output of lsmod
    • self.instance - gives you an instance of the object, with checks to make sure the module name is unique
    • self.set_interval - sleep interval between modprobe commands
    • mload(private) - loads the module if needed and registers a reference to the class that requested the load
    • munload - unloads the module if conditions are met, and de-registers the unload requester
    • loaded? - performs an actual check to see if the module is loaded
  • readable instance vars:
    • kmodName

Moving all the functionality of module handling into a separate class. There may be a need to have drivers load multiple modules. Additionally the module class needs to keep track of how many devices are using one module (as there may be many to one). The unload method should only unload if the last reference asks it to (all other references have de-registered).
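
The reference counting described above could look roughly like the sketch below. This is only an illustration: KmoduleSketch, the requester symbols and the @loaded flag are made-up stand-ins, mload is left public here for brevity, and the real class would shell out to modprobe and lsmod instead of flipping a flag.

```ruby
# Sketch only: models the many-to-one bookkeeping, not the real modprobe calls.
class KmoduleSketch
  @instances = {}

  class << self
    # One object per module name, so every driver shares the same reference list.
    def instance(name)
      @instances[name] ||= new(name)
    end
  end

  attr_reader :kmodName

  def initialize(name)
    @kmodName = name
    @refs = []       # who asked for this module to be loaded
    @loaded = false  # stand-in for a real lsmod check
  end
  private_class_method :new

  # Load (if needed) and register the requester.
  def mload(requester)
    @loaded = true   # real code would run: modprobe <kmodName>
    @refs << requester unless @refs.include?(requester)
    self
  end

  # De-register the requester; unload only when nobody else holds a reference.
  def munload(requester)
    @refs.delete(requester)
    return false unless @refs.empty?
    @loaded = false  # real code would run: modprobe -r <kmodName>
    true
  end

  def loaded?
    @loaded
  end
end
```

With two requesters registered, the first munload only de-registers; the second one actually unloads.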

File: nodeagent.rb
Class: nodeagent
Interface:

  • public methods:
    • hwDiscover - uses lspci,lsusb, and lshw to discover what hardware is installed. It also checks what modules are preloaded and creates objects to manage them.
    • instance
    • run

The hwdiscovery mode of the nodeagent code is being redone. Instead of the original grep of lspci lines, we now have a hash of dev_ids. Each dev_id in the hash maps to a node agent device driver that will be loaded to handle that specific device. That driver will load the module for its respective device type.
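
As a rough illustration of the dev_id hash idea (not the actual table from nodeagent.rb): the IDs below are example vendor:device pairs that would need to be checked against real `lspci -n` output, and DRIVER_FOR_DEV_ID and drivers_for are hypothetical names.

```ruby
# Hypothetical lookup table: dev_id => driver class name. Values are illustrative.
DRIVER_FOR_DEV_ID = {
  "8086:10d3" => :RcDrv_e1000e,
  "10ec:8168" => :Rcdrv_r8169,
  "168c:001c" => :Rcdrv_ath5k
}.freeze

# Pull vendor:device pairs out of `lspci -n` style text and keep the ones
# that have a driver mapped to them. Unknown devices are skipped.
def drivers_for(lspci_output)
  ids = lspci_output.scan(/\b([0-9a-f]{4}:[0-9a-f]{4})\b/).flatten
  ids.map { |id| DRIVER_FOR_DEV_ID[id] }.compact
end

sample = [
  "00:19.0 0200: 8086:10d3 (rev 04)",
  "02:00.0 0280: 168c:001c (rev 01)",
  "03:00.0 0604: 1234:5678"
].join("\n")

found = drivers_for(sample)
puts found.inspect
```

The point of the hash is that supporting a new card becomes a one-line table entry instead of another grep pattern.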

File: agentCommands.rb
Module: agentCommands
Interface:

  • public methods:
    • CONFIGURE
    • BRINGUP

The BRINGUP command is called after all the initial configs are sent, but before the experiment is started. This instructs the agent to attempt to apply all the configs and bring up / connect interfaces as necessary.

File: device.rb
Class: Device
Interface:

  • public methods:
    • initialize(logName = nil) - can be used to set logicalName
    • activate - load all modules (via the kmodule.load) and do other basic setup to make a device ready to be configured
    • deactivate - deconfigure device and remove module
    • configure - Nothing to configure at this context, raise exception
    • hasMod?
    • add_kmodName - adds a kernel module name to list of modules for this device
    • start - Nothing to start at this context, raise exception
    • stop - Nothing to stop at this context, raise exception
    • apply_conf(prop, value) - No config to apply at this context, raise exception
  • readable instance vars:
    • isActive - Desired state Var
    • isStarted - Desired state Var
    • kmodNames
    • media
  • accessible instance vars:
    • logicalName ← can be set via constructor or by direct assignment
    • deviceName ← Maybe this should be just readable
    • configHash ← Maybe this should be just readable

The device class is the prototype for all devices. It should not be instantiated; instead it should be extended by child classes, which will have the proper context to actually start the device and apply configs.

File: ethernet.rb
Class: EthernetDevice < Device
Interface:

  • public methods:
    • initialize()
    • claim_ctrl(call_ref)
    • set_altKmodName(kmodName) - used to map devname to module/logical name if "driver=" string in lshw does not match original in @kmodNames
    • self.set_control_if(iface="eth1") - Set global variable @@controlif which holds the device name of the control if
    • self.get_control_if() - read @@controlif
    • control_if? - is this device object the control if?
    • activate(logicalNumber=nil) - logical number is the numeric part of the logical name; if possible we bind to the device name with the same logical number
    • deactivate()
    • deviceName=(d) - Some additional steps/checks may be required to set device name
    • start() - Override the parent start method to establish what start means in the Ethernet context.
    • stop()
    • up(devName=@deviceName) - optional devicename argument is for the case when multiple logical interfaces are handled by a single driver (e.g. wifi monitor interfaces).
    • down(devName=@deviceName)
    • set_ip(ip)
    • set_netmask(netmask)
    • set_mtu(mtu)
    • set_mac(mac)
    • set_arp(arp)
    • set_forwarding(forwarding)
    • set_gateway(gateway)
    • drop_config() - not used, can be used to recover interface (will reestablish address via dhcp)
    • restart_dhcp() - not used, part of drop_config
    • apply_conf(prop, value) - checks if prop is from this context. If it is, calls appropriate method, otherwise, calls parent's apply_conf with same args
    • configure(prop, value) - check current state (@isActive, @isUP, @isStarted) and apply config. Live if the device was started (may require starting interface).
    • get_eth_status(devName) - live check of the Ethernet state of devName, this is independent of @configHash which stores the "desired" state.
    • get_gateway
    • get_hw_mac(devName) - retrieve the original hardware mac (read from ethtool -p)
    • get_up?(devName) - Live check if devName is up (independent of @isUP the desired state)
  • readable instance vars:
    • isUp - Desired state Var

This class is responsible for all of the ifconfig/route handling code (all things ethernet). The same cocaine lib will handle shell commands with similar exception handling. The interface has been extended to implement setters for all the requisite interface state (ip, netmask, etc.). The usage scenario is expected to flow as follows:

  1. include file and instantiate object (will probably instantiate one of the hardware specific children instead of EthernetDevice class proper).
  2. Set Logical Name (via constructor or assignment)
  3. Activate via object.activate
  4. configure requisite parameters via multiple calls to object.configure("prop","value"), this will store them
  5. apply configs and bring up interfaces via object.start
  6. device should now be ready to be used for traffic experiments

File: wifi.rb
Class: WifiDevice < EthernetDevice
Interface:

  • public methods:
    • initialize(logName = nil)
    • activate(logicalNumber = nil)
    • get_ranges() - determine what the allowable ranges are for a given card. (collected but not used)
    • get_current_mode(devName) - live mode check (e.g. managed, ibss), independent of the mode stored in @configHash.
    • get_wifi_status(devName, att = @stat_retry) - live status check.
    • get_connected?(att = 1, refresh = 5) - live connection check.
    • connect_adhoc(ssid, freq) - attempt to connect in ibss mode - will raise exceptions if connection fails
    • disconnect_adhoc() - attempt ibss leave
    • connect_simple(ssid, freq=nil) - attempt to connect to an open managed AP (no WEP/WPA)
    • disconnect_simple() - disconnect from managed AP
    • start()
    • stop()
    • build_monitor() - create a monitor interface for the given device
    • del_monitor() - take down monitor interface
    • set_freq(freq = @configHashfreq)
    • scan() - scan channel sets to learn hearable AP's / adhoc clients
    • set_mode(mode) - switch between IBSS (adhoc) and managed mode
    • set_essid(essid)
    • set_channel(chan) - logical mapping between channel numbers and frequency (really just a wrapper around setting a specific frequency)
    • set_rate(rateStr) - takes a string argument of allowable rates (can't be a single item list)
    • set_type(type) - Wrapper around rate str that translates a/b/g to rate strings
    • apply_conf(prop, value) - same concept as the Ethernet. If prop is in the wifi context set it, otherwise pass it down
    • configure(prop, value) - Similar to the Ethernet case. Some settings (e.g. frequency) can be set live; if they can, they are applied immediately. Others (e.g. mode) require a reconnect if the interface is already started; this method handles that problem.
    • deactivate()
  • accessible instance vars:
    • action_interval
    • stat_retry
    • connSleep
  • readable instance vars:
    • media
    • freq_range
    • bitrate_range
    • monIf
    • isConnected - Desired state Var

File: wimax.rb
Class: WimaxDevice < EthernetDevice
Interface:

  • public methods:
    • initialize(logName = nil)
    • activate(logicalNumber = nil) - in addition to loading the module this checks to make sure wimaxd was started.
    • start()
    • stop()
    • connect()
    • disconnect()
    • scan() - "wimaxcu scan"
    • get_connected?() - uses "wimaxcu info stats" as a check of connectivity. If the interface is connected the stats call will return some values, if it is disconnected it will terminate in error (which will raise a handled exception).
    • info() - returns a string that is the result of "wimaxcu info"
    • configure(prop, value) - same function as in wifi
    • get_mac() - uses the get_eth_status method to get the current mac
    • current_ip() - uses the get_eth_status method to get the ip for the current device
  • readable instance vars:
    • media
    • isConnected - Desired state Var

File: rcDrv_e1000e.rb
Class: RcDrv_e1000e < EthernetDevice
Interface:

  • public methods:
    • initialize(logName = nil) - sets kmod name(s)

e1000e kernel module specific driver. A child of ethernet. Overrides anything that is specific to e1000e, but mostly leaves the parent class intact. Also sets the value of @kmodName.

File: rcdrv_r8169.rb
Class: Rcdrv_r8169 < EthernetDevice
Interface:

  • public methods:
    • initialize(logName = nil) - sets kmod name(s)

File: rcdrv_skge.rb
Class: Rcdrv_skge < EthernetDevice
Interface:

  • public methods:
    • initialize(logName = nil) - sets kmod name(s)

File: rcdrv_ath5k.rb
Class: Rcdrv_ath5k < WifiDevice
Interface:

  • public methods:
    • initialize(logName = nil) - sets kmod name(s)

File: rcdrv_ath9k.rb
Class: Rcdrv_ath9k < WifiDevice
Interface:

  • public methods:
    • initialize(logName = nil) - sets kmod name(s)

File: rcdrv_ipw2200.rb
Class: Rcdrv_ipw2200 < WifiDevice
Interface:

  • public methods:
    • initialize(logName = nil) - sets kmod name(s)

File: rcdrv_wl.rb
Class: Rcdrv_wl < WifiDevice
Interface:

  • public methods:
    • initialize(logName = nil) - sets kmod name(s)

File: rcdrv_iwlwifi.rb
Class: Rcdrv_iwlwifi < WifiDevice
Interface:

  • public methods:
    • initialize(logName = nil)
    • activate(logicalNumber = nil) - additionally checks if iwldvm module was loaded.
    • deactivate() - will remove the iwldvm module before calling super if needed.

File: rcdrv_i2400m_usb.rb
Class: Rcdrv_i2400m_usb < WimaxDevice
Interface:

  • public methods:
    • initialize(logName = nil)
    • activate(logicalNumber = nil) - additionally checks if iwldvm module was loaded.
    • deactivate() - will remove the iwldvm module before calling super if needed.

The usb wimax driver. Same concept as all the others, mostly just specifies the kernel module.

Driver Inheritance Chain

Updated nodeagent UML for wimax


Observations:


9/1/2014

  • active? is defined as "Ready to be configured". If a device is set to active, it is ready to accept config commands.
  • Activation should check if the module is loaded before loading it.
  • No device will know its device name at creation (the default is nil). All dev_driver classes will be instantiated at startup, however none of them will be activated (with the exception of eth1). When a device needs to be used, it will have to be activated. Once activated (that is, once the module is loaded) the agent will give names to the respective objects now that they have been created. This is essentially how the logical names are bound to the dev names (a logic managed by the agent, not the dev).

9/22/2014

  • It may be necessary to have multiple driver modules for a single device. The recurrent case is the iwldvm module. It gets loaded if we load iwlwifi, but it depends on iwlwifi. Thus once you load iwlwifi, you cannot unload it without first unloading iwldvm. Because iwlwifi and athXk both depend on cfg80211, when the iwldvm module is unloaded with modprobe -r iwldvm, iwlwifi is not unloaded.

9/23/2014

  • The node agent itself should check what modules are already loaded and create instances of the module class for them. It will push in a reference to itself, so that the reference buffer can never go empty.
  • There will need to be a structure that maps pci-id ↔ modulename ↔ deviceClass. This structure will need to be known to node agent.

10/30/2014

I've run into a chicken and egg problem where I need to know the device name before I can make the object but can't know the device name until after the object is made. I think I'm going to resolve this by making the logical names mutable.


11/11/2014

Now trying to integrate the driver and agent code. We'll basically be modifying the behavior of the agentCommands.configure method. This used to construct command strings that would be run with minimal error checking. Now it will map the request strings to a specific method call. This method should have its own error handling and raise exceptions when something doesn't work properly. These exceptions can be caught at the top level to prevent crashes.

In http://www.orbit-lab.org/browser/omf/omf-expctl/ruby/omf-expctl/node/nodeSetPath.rb the set of possible properties that can be set is listed in an array. The original agent command code is here: http://www.orbit-lab.org/browser/omf/omf-resctl/ruby/omf-resctl/omf_agent/agentCommands.rb


12/15/2014

We ran into a race condition of sorts where commands may come in after the connect call was issued. We've decided that the "start_application" directive is the decision point at which we've been given all the parameters we expect to get. To this end the node agent maintains a parameter model of the connection to be made. As parameters come in, the node agent updates these parameters from their defaults. When the "start_application" directive is received, the agent will initiate a connect command and set the applicationStarted flag to true. Once this flag is set, any parameter changes will be applied immediately. Parameters must be split into two types: those that can be applied immediately (like freq) and those that require a reconnect (like mode ibss). Once the applicationStarted flag is set, if a parameter change requires a reconnect, it is done immediately.
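
A minimal sketch of that parameter model, under stated assumptions: ConnectionModelSketch and its return symbols are invented for illustration, the only reconnect-type prop listed is mode, and the actual connect/apply work is replaced by log entries.

```ruby
# Sketch only: stores parameters until start_application, then applies live.
class ConnectionModelSketch
  # Props assumed to need a disconnect/connect cycle; everything else is
  # treated as live-appliable in this sketch.
  RECONNECT_PROPS = %w[mode].freeze

  attr_reader :params, :log

  def initialize(defaults = {})
    @params = defaults.dup
    @application_started = false
    @log = []
  end

  def configure(prop, value)
    @params[prop] = value
    return :stored unless @application_started

    if RECONNECT_PROPS.include?(prop)
      @log << "reconnect for #{prop}"
      :reconnected
    else
      @log << "apply #{prop}"
      :applied
    end
  end

  # The decision point: all expected parameters are assumed to be in by now.
  def start_application
    @log << "connect"
    @application_started = true
  end
end
```

Before start_application every configure call only updates the model; afterwards the same call either applies the change or forces a reconnect.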

As a side note the different mode connect methods should check if the device is already connected and if so do nothing (regardless of which mode it is connected in).


12/24/2014

The agent command's second argument is an object of type OmfPubSubMessage. The source of this class is located here. The path method produces the output we'll be switching on. The configure command calls the respective device_driver's configure command and expects a hash output. This configure command is where the connect state needs to be implemented and maintained.


1/16/2015

We've decided that the drivers will store config state. We've also said that there must be an explicit call to connect. In the ethernet context connect means bring the interface up. In the wifi context it will mean something different.

To that end I've implemented a config method in the ethernet class that stores what configs were pushed in. I've also overridden the setter for device name. The new devicename method checks if the kmod is loaded before allowing name assignment. If the kmod is loaded and name assignment succeeds, the connected state is checked (via ifconfig) and recorded, as well as the hardware mac.

When a call to configure is made the connection state is checked. If it is connected the configure call tries to apply the change immediately; if it is not connected the configure call merely updates the hash.

When a call to connect is made, the config hash is checked for values and if any exist the appropriate configure call is made. The set of possible props is stored in a frozen hash along with a pointer to the function that will handle that prop. Configure itself takes 2 string arguments: the first is the "prop name" to be configured, and the second is a value string, e.g. ("ip","192.168.1.1"). If a prop name hit occurs in the hash, the mapped function is called and the value is fed as an argument to that function.

A similar design is going to be worked into the WIFI class. The expected behavior of the configure function is that if a prop name is not found by the current class's version of the configure function, a call to super is made to see if the superclass configure function can handle it. Eventually, if you get all the way up to the Device superclass function, it raises an exception.

The prop names handled by Device are:

  • None - Raises Exception

The prop names handled by Ethernet are:

  • ip
  • mtu
  • mac
  • arp
  • forwarding
  • gateway

The prop names for WiFi are:

  • mode
  • rate
  • essid
  • channel
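
The prop lookup and the fall-through to super could be sketched as follows. The class names, the HANDLERS constants and the @applied hash are hypothetical, only a couple of props per class are wired up, and the setters just record the value instead of touching an interface.

```ruby
# Sketch only: each class owns a frozen prop => handler hash and defers
# anything it does not recognise to its parent.
class DeviceSketch
  attr_reader :applied

  def initialize
    @applied = {}
  end

  # Top of the chain: no props are handled in the Device context.
  def apply_conf(prop, _value)
    raise ArgumentError, "no handler for property '#{prop}'"
  end
end

class EthernetSketch < DeviceSketch
  HANDLERS = { "ip" => :set_ip, "mtu" => :set_mtu }.freeze

  def apply_conf(prop, value)
    handler = HANDLERS[prop]
    return super unless handler  # not an Ethernet prop: ask the parent
    send(handler, value)
  end

  def set_ip(ip)
    @applied["ip"] = ip
  end

  def set_mtu(mtu)
    @applied["mtu"] = mtu
  end
end

class WifiSketch < EthernetSketch
  HANDLERS = { "mode" => :set_mode, "essid" => :set_essid }.freeze

  def apply_conf(prop, value)
    handler = HANDLERS[prop]
    return super unless handler  # not a wifi prop: try Ethernet, then Device
    send(handler, value)
  end

  def set_mode(mode)
    @applied["mode"] = mode
  end

  def set_essid(essid)
    @applied["essid"] = essid
  end
end
```

Calling apply_conf("ip", "192.168.1.1") on a WifiSketch misses the wifi hash, hits the Ethernet one, and never reaches the raising Device method.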

2/6/2015

Node agent will have to enforce an order for what modules are inserted unless it is specified. It should maintain an ordered array of dev-ids.


2/9/2015

We've finally agreed that device names should be discovered when the device is activated. The only logical pairing is between logical name and module name. Unless explicitly specified the logical name to module_name mapping will follow a specific ordering. For Ethernet it is

[e1000e, r8169] with the special exception that e1 is never touched (and enabled by default).

For example, in a node with 3 Ethernet interfaces, 2 x e1000e interfaces and 1 x r8169, with e1 already bound to one of the e1000e interfaces, e0 should be the e1000e interface and e1 the r8169. This should be the case regardless of the devicename bindings.

For wifi the ordering is: [ath5k, ath9k, ipw2200, iwlwifi, wl]. So with a node that has an ath5k and an ipw2200, w0 maps to the ath5k, and w1 to the ipw2200.
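
The default ordering rule for wifi can be sketched like this. WIFI_ORDER mirrors the list above; logical_names is a hypothetical helper, and the Ethernet special case for the control interface is deliberately left out.

```ruby
# Default module ordering for wifi, as listed above.
WIFI_ORDER = %w[ath5k ath9k ipw2200 iwlwifi wl].freeze

# Assign prefix0, prefix1, ... to the modules present on a node, following
# the given ordering. Modules not in the ordering are ignored in this sketch.
def logical_names(prefix, order, present_modules)
  sorted = order.flat_map { |mod| present_modules.select { |p| p == mod } }
  names = {}
  sorted.each_with_index { |mod, i| names["#{prefix}#{i}"] = mod }
  names
end

mapping = logical_names("w", WIFI_ORDER, %w[ipw2200 ath5k])
puts mapping.inspect
```

The order in which the modules were discovered does not matter; only the position in WIFI_ORDER does.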

Additionally the behavior of the configure command needs to be changed. It should store commands if not activated or connected, and directly act on them only if activated and connected. The state of activation or device name population shouldn't matter to the configure command.

Additionally we should start using the Timeout facility of Ruby to ensure that the node agent doesn't hang or block. Reference here and here.


2/10/2015

We're going to make a new child of ethernet named wired. It will be activated in the reverse of ethernet. It will be told the logical name and the device name, and discover which kernel modules map to it (but this part may not be necessary).

The assumption is that all wifi devices have their modules blacklisted, but all wired devices are supposed to have their modules loaded on boot.


2/12/2015

Added a UML diagram of the driver class layout. Followed UML guidelines from here and here. Note: diamond = composition, and arrow = parent.

Wired class mostly implemented. A few notable large code changes:

  • Most of the checks on @devName were switched to checks against @isActive, since you can't (or shouldn't be able to) activate without knowing your devName. A check on activation is more proper since it means "ready to be configured" instead of "I know my name" (but those are almost equivalent conditions). The notable exception is of course the new wired class, where the name is always known but a call to activate (for other checks and population of the config hash) has not been made.
  • All calls to @@some_line.run have been wrapped in a Timeout::timeout(@cmdTimeout){} block. If @cmdTimeout seconds pass, the block raises a Timeout::Error exception. I then catch this exception and re-raise it with some context info (the command I was running). Ultimately this exception will be caught in Agent commands and used as info for a "command failed" message that is sent back to the console.
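
The timeout wrapping could look roughly like this. Timeout.timeout and Timeout::Error are from Ruby's standard library; CommandTimeout and run_with_timeout are hypothetical names, and the block stands in for the cocaine command run.

```ruby
require "timeout"

# Hypothetical error type carrying the context of the command that hung.
class CommandTimeout < StandardError; end

# Run the block with a time limit; on expiry, re-raise with a description of
# what was being run so the caller can report a useful failure message.
def run_with_timeout(description, seconds)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  raise CommandTimeout, "timed out after #{seconds}s while running: #{description}"
end

begin
  run_with_timeout("sleep 2", 0.1) { sleep 2 }
rescue CommandTimeout => e
  puts e.message
end
```

A block that finishes in time simply returns its value, so callers that never hit the limit see no difference.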

It should be noted that AgentCommands::configure should catch all exceptions and log them (or at least StandardError, something fairly wide). This way the nodeagent never dies. And since every command should now time out, the agent should never lock up forever.

This is the current flow of the HW discover method in the nodeagent.rb code:


2/19/2015

Two major discrepancies were discovered during the last experiment run.

  1. The return type of configure needs to be a hash with a specific set of keys
  2. Configure should be callable on inactivated devices.

The first is trivial: it simply collects the result of the cocaine call and wraps it into the hash that the calling function expects. The return structure can be seen in the previous version of the code, which is located here.

The second, however, is somewhat more complicated. The basic idea is that public facing action methods will make a call to activate. Public facing non-action methods don't need to worry about activation. All private action methods need to check for activation, but should not be able to explicitly call activate (that state control is the job of the public facing action methods).

Current public facing wifi methods:

  • Active:
    • scan
    • connected?
    • connect
    • activate
  • Passive:
    • configure

Current public facing Ethernet methods:

  • Active:
    • connected?
    • connect
    • activate
  • Passive:
    • configure

2/23/2015

There is a logical loop that could potentially happen. The issue is that settings have to be applied in order. If several settings are stored before the call to connect goes out, there may be a loop introduced by calling configure from connect. The solution is to build a new function, apply_config, called by both configure and connect. It will check the config and impose an ordering on the properties (stored in an ordered hash).

Configure should have the following logic:

  if connected:
    disconnect
    connect

Where connect does:

    apply_config
    connect

apply_config should maintain the order of the settings (maybe connect can / should do this).
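
A sketch of how configure and connect could share an ordered apply_config without calling each other in a loop. ConnectFlowSketch is a made-up class, the APPLY_ORDER list is an assumed ordering chosen only for illustration, and the real work is replaced by entries in @calls.

```ruby
# Sketch only: configure never calls itself via connect; both go through
# apply_config, which walks the stored settings in a fixed order.
class ConnectFlowSketch
  # Assumed ordering for illustration; the real one is up to the driver.
  APPLY_ORDER = %w[mode essid channel ip].freeze

  attr_reader :calls

  def initialize
    @config = {}
    @connected = false
    @calls = []
  end

  def configure(prop, value)
    @config[prop] = value
    return unless @connected  # not connected: just store the setting
    disconnect
    connect
  end

  def connect
    apply_config
    @calls << "connect"
    @connected = true
  end

  def disconnect
    @calls << "disconnect"
    @connected = false
  end

  private

  # Push every stored setting, in APPLY_ORDER, regardless of arrival order.
  def apply_config
    APPLY_ORDER.each do |prop|
      @calls << "set #{prop}=#{@config[prop]}" if @config.key?(prop)
    end
  end
end
```

Settings stored as ip then mode are still applied as mode then ip, and a configure on a connected device triggers exactly one disconnect/connect cycle.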


2/24/2015

This is the expected flow of the calls to configure or connect. Note that matching colors mean the functionality belongs to the same method.
Connect Configure Flow


3/6/2015

Ivan added a bring up command (similar to but independent of the configure command). To make this work on the res-ctlr side we needed to add a BRINGUP method to agent commands and add bring up to the list of allowed commands. The latter bit was done in: this code.


3/10/2015:

We've packaged up the current version of the tool. It seems to work in the native environment but once moved to the 14.04 image, I'm having some dependency problems.

Note this is the list of gems installed:

activesupport (4.2.0)
climate_control (0.0.3)
cocaine (0.5.4)
i18n (0.7.0)
json (1.8.2)
minitest (5.5.1)
oml4r (2.10.6)
thread_safe (0.3.4)
tzinfo (1.2.2)

Apart from oml4r, everything else is a dependency of cocaine 0.5.4.

There was an issue with the include path: the file /usr/sbin/omf-resctl-5.4 needed to be modified. I'll need to push this to the repo. This was the change:

GEM_PATH=/usr/share/omf-common-$VER/gems exec ruby1.9.1 -I$PDIR -I$PDIR/omf-resctl/omf_agent -I$PDIR/omf-resctl/omf_driver -I/usr/share/omf-common-$VER $PDIR/$APP $*

3/16/2015

Discovered a bug in the code where the differing meaning of connected between the parent (ethernet) and child (wifi) classes was clobbering the up/down capability. Have to refactor the interfaces to reflect this diagram:
Flow of node Agent startup


3/17/2015

  • rcdrv_ipw2200: For now we'll have to set the connect and configure methods of this device to raise an exception. It doesn't respond to most (but not all) iw commands (e.g. set ssid, connect). This will have to be done via the old iwconfig commands so we'll have to evaluate the methods and flow of the framework.
  • monitor mode is done via the calls on this page :
    https://wireless.wiki.kernel.org/en/users/documentation/iw?s[]=monitor#adding_interfaces_with_iw
    
    If the call to set mode is to go to monitor, we make sure the interface is down and disconnected; we then use the call to make the monitor interface and bring up the interface.
  • Try installing the current version of nodeagent onto the original baseline.ndz and see if it still works. (if so we can start with the development of the wimax tool).
  • Outstanding Modules to test are:
    • Broadcom STA
    • Intel AC

3/20/2015

The flow of the start command should follow this chart:
Start logic

The flow of the configure command should follow this chart:
The decision flow of the configure command


3/23/2015

We need to maintain 3 different state variables. They are defined as follows:

  • up: the interface has been brought up via ifconfig
  • connected: meaningful in the wifi context. A call to connect (adhoc or managed) has been made and was successful
  • started: A call to start was made and the started variable set.

4/1/2015

There are 4 state variables that are meaningful in a hierarchical context. The state variables are:

  • isActive - Is the module loaded and ready for configuration
  • isUp - Has the interface been brought up
  • isConnected - Has the interface been successfully connected (for interfaces that require an explicit connect call)
  • isStarted - Has the experiment started (and the stored configs been pushed onto the interface).

Each type of medium must track their own instances of these variables and they are only relevant in certain contexts:

  • Wired:
    • isUp
    • isActive
    • isStarted
  • Wifi:
    • isUp
    • isActive
    • isStarted
    • isConnected
  • Wimax:
    • isUp
    • isActive
    • isStarted
    • isConnected

It should be noted that each of the different states are only meaningful in different contexts:

  • isActive
    • Device Context
    • Depends on: Nothing
    • All devices should be activate-able (load module)
  • isStarted
    • Device Context
    • Depends on: isUP or isConnected
    • Depends on all other possible states (configure all parameters and ready for use)
  • isUp
    • Ethernet Context
    • Depends on: isActive
  • isConnected
    • Wifi or Wimax Context
    • Depends on: isUp

Also note that these are "desired" states not the actual state. Ideally they properly model the actual state.


5/7/2015

There is an unhandled case with respect to module loading. When a module is loaded, several dependent modules are loaded. Sometimes this includes modules that need to be removed before the module itself can be removed.

Each individual driver will have to know about this problem and handle it if needed. The specific case is iwldvm for iwlwifi. The logic here will be to load iwlwifi and then check if iwldvm is loaded. If it is, when a call to unload is made, we must unload iwldvm first.

This is handled by overriding the activate and deactivate methods. In activate, it calls the super() and then checks for the respective module. In deactivate it unloads the respective module and then calls super.
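
The override pattern could be sketched as below. WifiStub stands in for the real parent class, the lsmod lambda fakes the loaded-module list, and the log entries replace the actual modprobe calls; all of these names are assumptions for illustration.

```ruby
# Stand-in for the real parent class: it only records what it would do.
class WifiStub
  attr_reader :log

  def initialize
    @log = []
  end

  def activate
    @log << "load iwlwifi"
  end

  def deactivate
    @log << "unload iwlwifi"
  end
end

class IwlwifiSketch < WifiStub
  # lsmod is injected so the sketch runs without touching the kernel.
  def initialize(lsmod = -> { %w[iwlwifi iwldvm] })
    super()
    @lsmod = lsmod
    @has_dvm = false
  end

  # Load via the parent first, then note whether iwldvm came along.
  def activate
    super
    @has_dvm = @lsmod.call.include?("iwldvm")
  end

  # Remove the dependent module first, then let the parent unload iwlwifi.
  def deactivate
    @log << "unload iwldvm" if @has_dvm
    @has_dvm = false
    super
  end
end
```

If iwldvm never shows up after the load, deactivate falls straight through to the parent and behaves like any other wifi driver.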

There may still be a degeneracy where calling 'modprobe -r iwldvm' removes the iwlwifi module, which may cause the kmodule state model to become inconsistent.


A little background on iwldvm: it's apparently a firmware compatibility module that provides an interface to iwlwifi. According to the comments on this doc, one can run either iwldvm or IWLMVM. Both are compatibility layers but all the commands funnel through iwlwifi. That said, as far as command capability with iw goes, as long as one of those modules is present (or baked into the kernel), the iw command feature set should be available.

Last modified on May 18, 2015, 7:43:34 PM
