
Version 18 (modified by ssugrim, 10 years ago)

Rebuilding the Node Agent from the ground up

Dependencies:

  • omfcommon/Mobject

gems:


Design Notes:

Lib:kmodule.rb
Class:kmodule
Interface:

  • new (private)
  • instance - gives you an instance of the object, with checks to make sure the module name is unique
  • mload (private) - loads the module if needed and registers a reference to the class that requested the load
  • munload - unloads the module if conditions are met, and de-registers the unload requester
  • loaded? - performs an actual check to see if the module is loaded

Moving all the functionality of module handling into a separate class. There may be a need to have drivers load multiple modules. Additionally the module class needs to keep track of how many devices are using one module (as there may be many to one). The unload method should only unload if the last reference asks it to (all other references have de-registered).
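The reference-counting behavior described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `runner` lambda is a hypothetical stand-in for the real modprobe call, `instance` taking the requester as an argument is a guess at how the private `mload` gets triggered, and `loaded?` here only tracks a flag, whereas the real method is meant to perform an actual check.

```ruby
# Sketch of a per-module object with reference counting.
# Assumption: `runner` stands in for whatever actually runs modprobe.
class Kmodule
  @instances = {}

  class << self
    # Return the single Kmodule for this module name and register the
    # requester as a user of it (loading the module on first use).
    def instance(name, requester, runner: ->(*_cmd) { true })
      mod = (@instances[name] ||= new(name, runner))
      mod.send(:mload, requester)
      mod
    end
  end

  private_class_method :new

  attr_reader :name

  def initialize(name, runner)
    @name = name
    @runner = runner
    @users = []      # objects that currently rely on this module
    @loaded = false
  end

  # De-register the requester; unload only when no users remain.
  def munload(requester)
    @users.delete(requester)
    return false unless @users.empty? && @loaded

    @runner.call("modprobe", "-r", @name)
    @loaded = false
    true
  end

  # Flag-based here; the real check would inspect the running kernel.
  def loaded?
    @loaded
  end

  private

  # Load the module if needed, then record who asked for it.
  def mload(requester)
    unless @loaded
      @runner.call("modprobe", @name)
      @loaded = true
    end
    @users << requester unless @users.include?(requester)
  end
end
```

With two devices sharing one module, the first `munload` only drops a reference; the module is removed when the second device also releases it.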

Lib:nodeagent.rb:
Class:nodeagent.rb:
Interface

  • hwDiscover -
  • instance
  • run

The hwdiscovery mode of the nodeagent code is being redone. Instead of the original grep of lspci lines, we now have a hash of dev_ids. Each dev_id in the hash maps to a node agent device driver that will be loaded to handle that specific device. This driver will load the module for its respective device type.
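As a rough illustration of the dev_id hash, the sketch below maps vendor:device IDs found in `lspci -n` style output to driver class names. The IDs, the class names, and the `hw_discover` helper are placeholders chosen for the example, not values taken from the agent.

```ruby
# Hypothetical dev_id => driver-class-name table (placeholder values).
DEV_DRIVERS = {
  "8086:10d3" => "RcDrvE1000e",
  "10ec:8168" => "RcDrvR8169",
  "168c:002a" => "RcDrvAth9k"
}.freeze

# Pull the vendor:device ID out of each `lspci -n` style line and return
# the driver names for the IDs present in the table, in the order found.
def hw_discover(lspci_output, table = DEV_DRIVERS)
  lspci_output.each_line.map do |line|
    dev_id = line[/\b\h{4}:\h{4}\b/]
    table[dev_id] if dev_id
  end.compact
end
```

Unknown devices are skipped, so only hardware with a registered driver produces an entry.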

agentCommands.rb:
Interface

  • configure

device.rb
Interface

  • activate
  • deactivate
  • active?
  • getconfigCMD
  • configure
  • hasMod? *add_kmodName

We're implementing all the module handling code here. It uses the cocaine command line library as a wrapper. The cocaine lib raises an exception if the command fails. Exception handling will mitigate failed commands.
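The run-and-rescue pattern might look something like the sketch below. To keep it self-contained it uses Open3 from the Ruby standard library as a stand-in for cocaine; `CommandFailed`, `run_cmd`, and `try_cmd` are hypothetical names for illustration.

```ruby
require "open3"

# Raised when a command exits non-zero (hypothetical stand-in for the
# exception a command-line wrapper library would raise).
class CommandFailed < StandardError; end

# Run a command without a shell and return its combined stdout/stderr.
def run_cmd(*argv)
  output, status = Open3.capture2e(*argv)
  unless status.success?
    raise CommandFailed, "#{argv.join(' ')} exited #{status.exitstatus}: #{output.strip}"
  end
  output
end

# Report a failure to the caller instead of letting it crash the agent.
# Returns [true, output] on success and [false, message] on failure.
def try_cmd(*argv)
  [true, run_cmd(*argv)]
rescue CommandFailed, SystemCallError => e
  [false, e.message]
end
```

A caller could then use `ok, out = try_cmd("modprobe", "e1000e")` and decide how to react when `ok` is false.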

ethernet.rb < device.rb
Interface

  • initialize()
  • claim_ctrl(call_ref)
  • self.set_control_if(iface="eth1")
  • self.get_control_if()
  • control_if?
  • set_ip(ip)
  • get_ip()
  • up()
  • down()
  • set_netmask(netmask)
  • set_mtu(mtu)
  • set_mac(mac)
  • get_mac()
  • set_arp(arp)
  • set_forwarding(forwarding)
  • set_gateway(gateway)
  • drop_config()
  • restart_dhcp()
  • execConfigCmd(prop, value=nil)
  • getCmdForRoute(value)
  • getCmdForFilter(value)
  • deactivate()

This class is responsible for all of the ifconfig/route handling code (all things ethernet). The same cocaine lib will handle shell commands, with similar exception handling. The interface has been extended to implement getters and setters for all the requisite interface state (ip, netmask, etc.). We've left the original get config command structure alone, but have now also implemented an execConfigCmd which will actually execute these config commands.

wifi.rb < ethernet.rb
Interface

  • initialize()
  • check_time
  • connected?()
  • connect_simple(ssid)
  • connect_adhoc
  • disconnect_simple
  • disconnect_adhoc
  • disconnect()
  • checkStatus(retries = true)
  • scan()

rcDrv_e1000e.rb < ethernet.rb
Interface

  • new

e1000e kernel module specific driver. A child of ethernet. Overrides anything that is specific to e1000e, but mostly leaves the parent class intact. Also sets the value of @kmodName.

rcdrv_r8169.rb < ethernet.rb
Interface

  • new

rcdrv_skge.rb < ethernet.rb
Interface

  • new

rcdrv_ath5k.rb < wifi.rb
Interface

  • new

rcdrv_ath9k.rb < wifi.rb
Interface

  • new

rcdrv_ipw2200.rb < wifi.rb
Interface

  • new

rcdrv_iwlwifi.rb < wifi.rb
Interface

  • new

rcdrv_wl.rb < wifi.rb
Interface

  • new

Driver Inheritance Chain

Updated nodeagent UML for wimax


Observations:


9/1/2014

  • active? is defined as "Ready to be configured". If a device is set to active, it is ready to accept config commands.
  • Activation should check if the module is loaded before loading it.
  • No device will know its device name at creation (default is nil). All dev_driver classes will be instantiated at startup; however, none of them will be activated (with the exception of eth1). When a device needs to be used, it will have to be activated. Once activated (that is, once the module is loaded), the agent will give names to the respective objects now that they have been created. This is essentially how the logical names are bound to the dev names (a logic managed by the agent, not the dev).

9/22/2014

  • It may be necessary to have multiple driver modules for a single device. The recurrent case is the iwldvm module. It gets loaded when we load iwlwifi, and it depends on iwlwifi. Thus once iwlwifi is loaded, you cannot unload it without first unloading iwldvm. Because iwlwifi and athXk both depend on cfg80211, when the iwldvm module is unloaded with modprobe -r iwldvm, iwlwifi is not unloaded.

9/23/2014

  • The node agent itself should check which modules are already loaded and create instances of the module classes for them. It will push in a reference to itself, so that the reference buffer can never go empty.
  • There will need to be a structure that maps pci-id ↔ modulename ↔ deviceClass. This structure will need to be known to node agent.
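One possible shape for the pci-id ↔ modulename ↔ deviceClass structure, together with a start-up scan for modules that are already loaded, is sketched below. The table rows and helper names are illustrative assumptions; the scan expects text in the /proc/modules format, where the module name is the first field on each line.

```ruby
# One row per supported device; all values are illustrative placeholders.
DeviceEntry = Struct.new(:pci_id, :kmod, :device_class)

DEVICE_TABLE = [
  DeviceEntry.new("8086:10d3", "e1000e", "RcDrvE1000e"),
  DeviceEntry.new("168c:002a", "ath9k",  "RcDrvAth9k")
].freeze

# Look up the row for a PCI ID (nil when the device is unknown).
def entry_for_pci(pci_id)
  DEVICE_TABLE.find { |entry| entry.pci_id == pci_id }
end

# Names of loaded modules, given text in /proc/modules format.
def loaded_modules(proc_modules_text)
  proc_modules_text.each_line.map { |line| line.split.first }.compact
end

# Rows whose kernel module is already loaded when the agent starts.
def preloaded_entries(proc_modules_text)
  loaded = loaded_modules(proc_modules_text)
  DEVICE_TABLE.select { |entry| loaded.include?(entry.kmod) }
end
```

At start-up the agent could pass `File.read("/proc/modules")` to `preloaded_entries` and register itself as a user of each module it finds.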

10/30/2014

I've run into a chicken and egg problem where I need to know the device name before I can make the object but can't know the device name until after the object is made. I think I'm going to resolve this by making the logical names mutable.


11/11/2014 Now trying to integrate the driver and agent code. We'll basically be modifying the behavior of the agentCommands.configure method. This used to construct command strings that would be run with minimal error checking. Now it will map the request strings to a specific method call. This method should have its own error handling and raise exceptions when something doesn't work properly. These exceptions can be caught at the top level to prevent crashes.

In http://www.orbit-lab.org/browser/omf/omf-expctl/ruby/omf-expctl/node/nodeSetPath.rb the set of possible properties that can be set is listed in an array. The original agent command code is here: http://www.orbit-lab.org/browser/omf/omf-resctl/ruby/omf-resctl/omf_agent/agentCommands.rb


12/15/2014

We ran into a race condition of sorts where commands may come in after the connect call was issued. We've decided that the "start_application" directive is the decision point at which we've been given all the parameters we expect to get. To this end the node agent maintains a parameter model of the connection to be made. As parameters come in, the node agent updates these parameters from their defaults. When the "start_application" directive is received, the agent will initiate a connect command. At this point the applicationStarted flag will be set to true. Once this flag is set, any parameter changes will be applied immediately. Parameters must be split into two types: those that can be applied immediately (like freq) and those that require a reconnect (like mode ibss). Once the applicationStarted flag is set, if a parameter change requires a reconnect, the reconnect is done immediately.

As a side note the different mode connect methods should check if the device is already connected and if so do nothing (regardless of which mode it is connected in).
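The parameter model and the applicationStarted flag might be sketched like this. The class name, the default values, and the choice of which parameters force a reconnect are assumptions made for the example; instead of touching hardware, the sketch records what the agent would do in a log.

```ruby
# Sketch of the connection parameter model (names are hypothetical).
class ConnectionModel
  # Assumption: these parameters need a reconnect; all others apply live.
  RECONNECT_PARAMS = %w[mode essid].freeze
  # Assumption: illustrative defaults only.
  DEFAULTS = { "mode" => "managed", "essid" => nil, "freq" => nil }.freeze

  attr_reader :params, :log

  def initialize
    @params = DEFAULTS.dup
    @application_started = false
    @log = [] # what a real agent would execute, in order
  end

  def application_started?
    @application_started
  end

  # Before start_application: only update the model.
  # Afterwards: apply at once, reconnecting when the parameter needs it.
  def set(name, value)
    @params[name] = value
    return unless @application_started

    if RECONNECT_PARAMS.include?(name)
      @log << [:reconnect, @params.dup]
    else
      @log << [:apply, name, value]
    end
  end

  # The decision point: connect once with everything collected so far.
  def start_application
    return if @application_started

    @log << [:connect, @params.dup]
    @application_started = true
  end
end
```

Parameters that arrive before "start_application" therefore cost nothing until the single connect, which is what sidesteps the ordering problem.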


12/24/2014

The agent command's second argument is an object of type OmfPubSubMessage. The source of this class is located here. The path method produces the output we'll be switching on. The configure command calls the respective device_driver's configure command and expects a hash as output. This configure command is where the connect state needs to be implemented and maintained.


1/16/2015

We've decided that the drivers will store config state. We've also said that there must be an explicit call to connect. In the ethernet context, connect means bring the interface up. In the wifi context it will mean something different.

To that end I've implemented a config method in the ethernet class that stores what configs were pushed in. I've also overridden the setter for the device name. The new devicename method checks if the kmod is loaded before allowing name assignment. If the kmod is loaded and name assignment succeeds, the connected state is checked (via ifconfig) and recorded, as well as the hardware mac.

When a call to configure is made, the connection state is checked. If the device is connected, the configure call tries to apply the change immediately; if it is not connected, the configure call merely updates the hash.

When a call to connect is made, the config hash is checked for values, and if any exist the appropriate configure call is made. The set of possible props is stored in a frozen hash along with a pointer to the function that will handle that prop. Configure itself takes 2 string arguments: the first is the "prop name" to be configured, and the second is a value string, e.g. ("ip","192.168.1.1"). If a prop name hit occurs in the hash, the mapped function is called and the value is fed as an argument to that function.

A similar design is going to be worked into the WIFI class. The expected behavior of the configure function is that if a prop name is not found by the current class's configure function, a call to super is made to see if the superclass configure function can handle it. Eventually, if you get all the way up to the Device superclass function, it raises an exception.

The prop names handled by Device are:

  • None - Raises Exception

The prop names handled by Ethernet are:

  • ip
  • mtu
  • mac
  • arp
  • forwarding
  • gateway

The prop names for WiFi are:

  • mode
  • rate
  • essid
  • channel
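The configure chain described in this entry could be sketched as below. It is a simplified illustration, not the actual driver code: only a few props are wired up, `UnknownProperty` and `store_or_apply` are hypothetical names, and the setters record what they were asked to do rather than calling ifconfig or iwconfig.

```ruby
class UnknownProperty < StandardError; end

class Device
  # End of the chain: nobody below recognised the prop name.
  def configure(prop, _value)
    raise UnknownProperty, "no handler for property '#{prop}'"
  end
end

class Ethernet < Device
  # Frozen prop-name => handler table (subset, for illustration).
  PROPS = { "ip" => :set_ip, "mtu" => :set_mtu }.freeze

  attr_reader :config, :applied

  def initialize
    @config = {}     # everything pushed in via configure
    @applied = []    # stand-in for commands actually executed
    @connected = false
  end

  def configure(prop, value)
    handler = PROPS[prop]
    return super unless handler   # let the parent class try

    store_or_apply(prop, value, handler)
  end

  # Mark the device connected and replay the stored config.
  def connect
    @connected = true
    @config.dup.each { |prop, value| configure(prop, value) }
  end

  def set_ip(ip)
    @applied << [:ip, ip]
  end

  def set_mtu(mtu)
    @applied << [:mtu, mtu]
  end

  private

  # Always remember the value; act on it only when connected.
  def store_or_apply(prop, value, handler)
    @config[prop] = value
    send(handler, value) if @connected
  end
end

class Wifi < Ethernet
  PROPS = { "essid" => :set_essid, "channel" => :set_channel }.freeze

  def configure(prop, value)
    handler = PROPS[prop]
    return super unless handler

    store_or_apply(prop, value, handler)
  end

  def set_essid(essid)
    @applied << [:essid, essid]
  end

  def set_channel(channel)
    @applied << [:channel, channel]
  end
end
```

Because each class consults only its own table before calling super, a Wifi object handles "essid" itself, passes "ip" up to Ethernet, and lets anything unknown fall through to the exception in Device.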

2/6/2015

Unless an order is explicitly specified, the node agent will have to enforce one for module insertion. It should maintain an ordered array of dev-ids.


2/9/2015

We've finally agreed that device names should be discovered when the device is activated. The only logical pairing is between logical name and module name. Unless explicitly specified the logical name to module_name mapping will follow a specific ordering. For Ethernet it is

[e1000e, r8169] with the special exception that e1 is never touched (and enabled by default).

For example, take a node with 3 Ethernet interfaces (2 x e1000e and 1 x r8169), with e1 already bound to one of the e1000e interfaces. e0 should be the e1000e interface and e1 the r8169. This should be the case regardless of the devicename bindings.

For wifi the ordering is: [ath5k, ath9k, ipw2200, iwlwifi, wl]. So with a node that has an ath5k and an ipw2200, w0 maps to the ath5k and w1 to the ipw2200.
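The ordering rule can be illustrated with a small helper. The sketch assumes the caller passes one module name per device that still needs a logical name (so the interface that is never touched has already been left out); the `logical_names` helper and the e/w prefixes as arguments are illustrative choices.

```ruby
ETHERNET_ORDER = %w[e1000e r8169].freeze
WIFI_ORDER = %w[ath5k ath9k ipw2200 iwlwifi wl].freeze

# Assign logical names (prefix + index) by module priority.
# `present` holds one module name per device to be named; modules
# missing from `order` sort last, and ties keep their input order.
def logical_names(prefix, order, present)
  ranked = present.sort_by.with_index do |mod, i|
    [order.index(mod) || order.size, i]
  end
  ranked.each_with_index.map { |mod, i| ["#{prefix}#{i}", mod] }.to_h
end
```

For the wifi example above, `logical_names("w", WIFI_ORDER, %w[ipw2200 ath5k])` pairs w0 with ath5k and w1 with ipw2200, whatever order the devices were detected in.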

Additionally, the behavior of the configure command needs to be changed. It should store commands if the device is not activated or connected, and act on them directly only if it is activated and connected. The state of activation or device name population shouldn't matter to the configure command.

Additionally, we should start using the Timeout facility of Ruby to ensure that the node agent doesn't hang or block. Reference here and here.
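A minimal sketch of guarding a potentially blocking call with Ruby's standard `Timeout` module follows. The `with_deadline` helper and its fallback argument are hypothetical conveniences, not part of the agent.

```ruby
require "timeout"

# Run the block with a time limit. If it does not finish in `seconds`,
# Timeout.timeout raises Timeout::Error and we return `fallback` instead.
def with_deadline(seconds, fallback = nil)
  Timeout.timeout(seconds) { yield }
rescue Timeout::Error
  fallback
end
```

A scan or connect call could then be wrapped as `with_deadline(10, :timed_out) { ... }` so that a stuck command returns control to the agent.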


2/10/2015

We're going to make a new child of ethernet named wired. It will be activated in the reverse order of ethernet: it will be told the logical name and the device name, and will discover which kernel modules map to it (but this part may not be necessary).

The assumption is that all wifi devices have their modules blacklisted, but all wired devices are supposed to have their modules loaded on boot.
