ORBIT-USER: Node not registered for testbed?
Max Ott
max at semandex.net
Sun Aug 13 21:56:48 EDT 2006
Chris,
I'm not sure how testing has been done with the currently installed
versions of nodehandler and services.
In the little time I have I've started to re-write many of the
performance-limited parts and I have repeatedly worked with all 400
nodes without many problems.
As for OML, there are reported cases of problems and large amounts of
dropped measurements, but I have not been able to reproduce them.
There seem to be some issues with the reported sequence numbers, but
again, I haven't gotten to the bottom of that. It normally works for
me.
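To narrow down whether measurements are really being dropped, one rough approach is to look for holes in the sequence numbers received for a single stream. The sketch below is only an illustration: `sequence_report` is a made-up helper, not part of OML, and it assumes that sequence numbers are integers which increase by one per sample within a stream.

```python
def sequence_report(seq_numbers):
    """Summarise holes and repeats in one stream's sequence numbers.

    Assumption (not confirmed for OML): the sender numbers samples
    with consecutive integers, so any number missing between the
    smallest and largest value seen corresponds to a lost sample.
    """
    seqs = list(seq_numbers)
    seen = set(seqs)
    if not seen:
        return {"missing": [], "duplicates": 0}
    lo, hi = min(seen), max(seen)
    # Numbers inside the observed range that never arrived.
    missing = [n for n in range(lo, hi + 1) if n not in seen]
    # Samples that arrived more than once.
    duplicates = len(seqs) - len(seen)
    return {"missing": missing, "duplicates": duplicates}
```

Note that this cannot see samples lost before the first or after the last number received, so it gives a lower bound on the loss.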
We have made some improvements to OML in the last few weeks but haven't
tested them. If you have an experiment which stress-tests OML, please
let me know.
Thanks,
-max
On 8/14/06, chris at orderonenetworks.com <chris at orderonenetworks.com> wrote:
> Max,
>
> Thanks very much for the detailed reply. It was unexpected and welcome
> coming on a Sunday evening!
>
> I'm hoping to run a routing protocol scalability test making use of the
> entire grid (if possible). So the more nodes that are able to be part of
> it, the better.
>
> Will I need to do anything special to ensure that my OML logs (when I get
> them working) can be collected from that many nodes?
>
> Is there a FAQ anywhere talking about issues when trying to use the entire
> grid at once?
>
> Thank you again,
> Chris
>
>
>
>
> > Chris,
> >
> > You ran into a problem with the grid services.
> >
> > If you look at the design of the Orbit framework, you will see that
> > the interface between the user and the system is what we call 'grid
> > services'. They are a set of web services which all user activities
> > are building on.
> >
> > Specifically, there is a service called 'CMC' which allows a user to
> > control the node hardware: switch on/off power, get environment
> > measurements, such as voltage levels and temperature, and get access
> > to the serial console.
> >
> > The nodehandler is using this service to switch the nodes on or off.
> > The CMC service exposes a few methods for this - actually far too many
> > at this stage. The CMC service also maintains a database for what
> > nodes are in service and which aren't and that's what you are hitting.
> > It appears that node 4 at 1 is 'out-of-service'.
> >
> > console:~$ wget -O - -q http://cmc:5012/cmc/allStatus | grep "'n_4_1'"
> > <node name = 'n_4_1' x='4' y='1' state='NODE NOT AVAILABLE' />
> >
> > The problem is that the CMC service is unfortunately not implemented
> > consistently. When the nodehandler requests that an entire set of
> > nodes be switched on, the CMC service quietly ignores all the
> > 'out-of-service' nodes and reports success. As the nodehandler doesn't
> > know that some nodes are out-of-service it will later request those
> > nodes to be reset - assuming they went astray. Now the CMC service is
> > reporting an error which leads to what you have observed.
> >
> > Now, what should you, or we, do? Unfortunately, the CMC service is a
> > really important one to automate the operation of Orbit. We spent a
> > lot of engineering effort in getting it nicely integrated into the
> > hardware - and that part works really well. However, the software on
> > the server side never reached the same level of maturity. Lots of
> > history, water under the bridge.
> >
> > There are ways for the nodehandler to work around those. Let me see if
> > I can come up with something.
> >
> > In the meantime, I can only ask you to use the 'allStatus' command I
> > showed above to see if the nodes you need for your experiment are
> > really ready for use.
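The 'allStatus' check above can be scripted so that a whole node range is checked in one go before starting an experiment. The following is a minimal sketch, not part of the nodehandler: the URL is the one from the wget example, the regular expression is modelled only on the single sample line shown there, and the function names are invented for illustration. The only state value known from this thread is 'NODE NOT AVAILABLE'; any node with a different state is assumed to be usable, which may not hold in practice.

```python
import re
from urllib.request import urlopen

# URL from the wget example; presumably only reachable from the console.
ALL_STATUS_URL = "http://cmc:5012/cmc/allStatus"

# Assumption: every node element looks like the one sample line,
# with the name attribute first and a state attribute later on.
NODE_RE = re.compile(
    r"<node\s+name\s*=\s*'([^']*)'[^>]*?state\s*=\s*'([^']*)'", re.S
)

def fetch_all_status(url=ALL_STATUS_URL):
    """Download the raw allStatus reply as text."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")

def parse_all_status(text):
    """Return {node_name: state}; whitespace inside a state is collapsed."""
    return {name: " ".join(state.split())
            for name, state in NODE_RE.findall(text)}

def unavailable(status, xs, ys):
    """List requested nodes that are absent or 'NODE NOT AVAILABLE'."""
    bad = []
    for x in xs:
        for y in ys:
            name = f"n_{x}_{y}"
            state = status.get(name)
            if state is None or state == "NODE NOT AVAILABLE":
                bad.append(name)
    return bad
```

On the console, `unavailable(parse_all_status(fetch_all_status()), range(1, 21), range(1, 21))` would then list the nodes to leave out of a full-grid run.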
> >
> > Sorry for the inconvenience,
> >
> > -max
> >
> >
> >
> > On 8/14/06, chris at orderonenetworks.com <chris at orderonenetworks.com> wrote:
> >> Hello,
> >>
> >> I'm getting an odd message when trying to imageNodes on the main grid.
> >>
> >> FATAL run: ServiceException: ServiceException
> >> Node (4,1) Not Registered for Testbed:
> >> '#<CMC::Testbed:0xa7ab5fc0>'
> >>
> >> I've pasted the entire run below.
> >>
> >> If I try subsections of the grid, I get other nodes that give the same
> >> error.
> >>
> >> I haven't seen this error on the sandboxes. How do I register a node for
> >> the testbed?
> >>
> >> Thanks,
> >> Chris
> >>
> >> ----------------------------
> >> Imaging nodes: 1..20,1..20 with image baseline.ndz
> >> Using config /etc/nodehandler/grid.cfg
> >> /etc/nodehandler/grid.cfg:20: warning: Insecure world writable dir /tmp,
> >> mode 040777
> >> Using logfile /etc/nodehandler/nodehandler_log.xml
> >> INFO init: NodeHandler Version 3.6.4-1 (849)
> >> INFO init: Experiment ID: grid_2006_08_13_16_56_19
> >> INFO Experiment: load system:exp:stdlib
> >> INFO prop.resetDelay: resetDelay = 180:Fixnum
> >> INFO Experiment: load system:exp:imageNode
> >> INFO prop.nodes: nodes = [1..20, 1..20]:Array
> >> INFO prop.image: image = "baseline.ndz":String
> >> INFO stdlib: 400 out of 400 node(s) still down n_16_19,n_6_1,n_20_16
> >> INFO stdlib: 400 out of 400 node(s) still down n_16_19,n_6_1,n_20_16
> >> INFO stdlib: 400 out of 400 node(s) still down n_16_19,n_6_1,n_20_16
> >> /tmp/eee.169/lib/util/communication.rb:127: warning: Insecure world
> >> writable dir /tmp, mode 040777
> >> INFO stdlib: 400 out of 400 node(s) still down n_16_19,n_6_1,n_20_16
> >> INFO n_18_19: Checked in as /ip/10.10.18.19 booting off baseline:1.0.9
> >> WARN n_18_19: Expected image 'pxe:1.1.4', but node reported
> >> 'baseline:1.0.9'.
> >> INFO n_18_19: Resseting node
> >> FATAL run: ServiceException: ServiceException
> >> Node (4,1) Not Registered for Testbed:
> >> '#<CMC::Testbed:0xa7ab5fc0>'
> >> INFO run: Experiment grid_2006_08_13_16_56_19 finished after 0:42
> >> done.
> >>
> >>
> >>
> >
> >
> > --
> > Dr. Max Ott
> > Research Program Leader - Network and Pervasive Computing, NICTA Australia
> > Founder & CTO, Semandex Networks
> > Research Professor, WINLAB, Rutgers University
> >
>
>
>
--
Dr. Max Ott
Research Program Leader - Network and Pervasive Computing, NICTA Australia
Founder & CTO, Semandex Networks
Research Professor, WINLAB, Rutgers University