[[TOC(Documentation/NodeHandler/Broadcast)]]
= Architecture Design =

{{{
Saswati Swami (sswami@eden.rutgers.edu)
}}}

== Introduction ==
The current NodeHandler code works satisfactorily on the small grid and the sandboxes, but it fails to work correctly on the big grid. On the full grid of 400 nodes, packet loss is a major problem, and it escalates sharply as the number of nodes increases. Specifically, when trying to image more than 150 nodes in a single attempt, the high packet loss prevents successful completion. To alleviate this problem, it has been decided to explore the use of broadcast instead of multicast.

== Major Design Requirements ==
'''R.1:'''
{{{
All communication from the NodeHandler to the NodeAgent will be through broadcast, and all
feedback from the NodeAgent to the NodeHandler will be sent through TCP. This is because:

- reliable feedback can be ensured,
- explicit control over the feedback message content is possible,
- integrating the feedback messages with the existing message processing code in the
  NodeHandler will be easier, e.g. sequence id correlation,
- existing messages sent from the NodeAgent to the NodeHandler can be modified to serve
  the dual purpose of providing feedback as well.
}}}
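
Purely as an illustration, a feedback message along these lines might look like the following Python sketch; the field names and the JSON encoding are assumptions, not the actual NodeAgent protocol:

{{{
#!python
import json

def make_feedback(node_id, seq_id, status, payload=""):
    # seq_id echoes the sequence id of the broadcast command so that the
    # NodeHandler can correlate this feedback with the command it sent.
    return json.dumps({"node": node_id,
                       "seq": seq_id,
                       "status": status,      # e.g. "OK" or "ERROR"
                       "payload": payload}).encode() + b"\n"
}}}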

'''R.2:'''
{{{
All communication will be handled in the communication layer, which will run as a separate
process. The present focus is on achieving reliable communication with minimal packet loss;
once this issue is resolved, the issues involved in converting this process into a loadable
library will be addressed.

This will require changes to the communication layer in both the NodeHandler and the NodeAgent.
}}}

'''R.3:'''
{{{
The communication layer will use two separate approaches, one for sending messages and the
other for receiving them. Messages sent from the NodeHandler to the NodeAgent will use
broadcast: a single message will be broadcast by the NodeHandler and received by all the
NodeAgents.

Messages received from the NodeAgents will use TCP. The NodeAgent communication layer
will be modified to send all messages to the NodeHandler using TCP.
}}}
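
A minimal sketch of the two paths, assuming a placeholder broadcast address and placeholder ports (not the actual testbed configuration):

{{{
#!python
import socket

BCAST_ADDR = ("10.255.255.255", 9006)   # assumed broadcast address/port
FEEDBACK_PORT = 9007                    # assumed TCP feedback port

def broadcast_command(cmd):
    # NodeHandler -> NodeAgents: one UDP datagram received by all agents.
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
    s.sendto(cmd, BCAST_ADDR)
    s.close()

def feedback_listener():
    # NodeAgents -> NodeHandler: each agent connects back over TCP.
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", FEEDBACK_PORT))
    srv.listen(128)
    while True:
        conn, addr = srv.accept()
        data = conn.recv(4096)   # simplified: one message per connection
        conn.close()
        yield addr[0], data
}}}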

'''R.4:'''
{{{
The messages sent from the NodeHandler to the NodeAgent consist of commands to be executed on
the NodeAgent. Since the communication layer will broadcast each message to all the nodes, the
NodeAgents will have filters to determine whether a message is to be accepted or rejected. The
current NodeAgent code has such filters, and these will be enhanced only if necessary.

After a message is sent, the communication server will wait for ACKs from the NodeAgents,
which will be received through the TCP socket. All message-ACK correlation for each node will
be done by the communication server. After a pre-defined interval, it will repeatedly resend
the command until it receives an ACK confirming receipt from all the intended nodes. Only
after all the NodeAgents have confirmed successful receipt of the command will the
communication server notify the NodeHandler to proceed with sending the next command.
}}}
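
The retransmission loop might look roughly like the sketch below, reusing the hypothetical broadcast_command() from the previous block; the retry interval and the ACK queue fed by the TCP listener are assumptions:

{{{
#!python
import queue, time

RESEND_INTERVAL = 2.0   # assumed pre-defined retry interval (seconds)

def send_reliably(cmd, seq_id, expected_nodes, ack_queue, broadcast_command):
    # Rebroadcast the command until every intended node has ACKed it.
    pending = set(expected_nodes)
    while pending:
        broadcast_command(cmd)
        deadline = time.time() + RESEND_INTERVAL
        while pending and time.time() < deadline:
            try:
                node, acked_seq = ack_queue.get(
                    timeout=max(0, deadline - time.time()))
            except queue.Empty:
                break
            if acked_seq == seq_id:    # correlate the ACK with this command
                pending.discard(node)
    # All intended nodes have ACKed; the NodeHandler may send the next command.
}}}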

'''R.5:'''
{{{
The communication layer will initially be a separate server running the reliable broadcast
protocol. It will also handle all TCP socket related functions. This separation of processes
will help in isolating, and subsequently resolving, all communication related issues. The IPC
mechanism between this server and the NodeHandler will be implemented using pipes. When the
NodeHandler wants to send a message to the NodeAgent, the message will be piped to the server,
which will then broadcast it. Likewise, when a message is received from a NodeAgent by this
server, it will pipe the message to the NodeHandler.

Later, this separate server can be combined with the NodeHandler as a loadable library if no
significant performance issues are found.
}}}
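
One way to realize this pipe-based IPC (a sketch only; the server binary name and the line-based framing are assumptions) is to spawn the server as a child process and use its stdin/stdout as the two pipes:

{{{
#!python
import subprocess

# Hypothetical communication server binary; framing is one message per line.
server = subprocess.Popen(["comm_server"],
                          stdin=subprocess.PIPE,
                          stdout=subprocess.PIPE)

def send_to_agents(msg):
    # NodeHandler -> server: the server broadcasts the message to the agents.
    server.stdin.write((msg + "\n").encode())
    server.stdin.flush()

def read_from_agents():
    # Server -> NodeHandler: feedback received from a NodeAgent.
    return server.stdout.readline().decode().rstrip("\n")
}}}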

'''R.6:'''
{{{
The communication server will not pipe the heartbeats from the NodeAgents to the NodeHandler.
Instead, it will keep track of these messages on a per-node basis, and on detecting a
breakdown in communication it will send a RETRY message to the NodeAgent. The NodeAgent will
treat this as a message from the NodeHandler.
}}}
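
A rough sketch of the per-node heartbeat bookkeeping this implies (the timeout value and the RETRY encoding are assumptions):

{{{
#!python
import time

HEARTBEAT_TIMEOUT = 10.0   # assumed: seconds without a heartbeat = breakdown
last_seen = {}             # node id -> time of last heartbeat

def on_heartbeat(node):
    # Heartbeats are consumed here and never piped to the NodeHandler.
    last_seen[node] = time.time()

def check_nodes(send_to_node):
    # On a detected breakdown, send a RETRY message; the NodeAgent treats
    # it as if it were an ordinary message from the NodeHandler.
    now = time.time()
    for node, seen in last_seen.items():
        if now - seen > HEARTBEAT_TIMEOUT:
            send_to_node(node, b"RETRY")
}}}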

'''R.7:'''
{{{
All issues relating to the scaling impact of the decision to use TCP will be thoroughly
investigated. TCP is a quick way for us to avoid dealing with reverse-path reliability for
now. Once we achieve proper scaling on the forward path, we will switch to UDP if necessary.
We might also implement some scheme to prioritize the messages.
}}}

== Overall Architecture ==

== Software Design ==

== See Also ==

[http://www.orbit-lab.org/wiki/Internal/DesignNotes]