ORBIT-USER: Image Problems
Max Ott
max at semandex.net
Fri Jan 19 02:11:18 EST 2007
Chris,
The list of successful nodes can be extracted from the log file with
a grep and a cut, looking for DONE.OK. If you are adventurous, you
can run some XSLT code on the generated XML file :)
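The grep-and-cut approach could be sketched in Python roughly as
follows. This is only an illustration: it assumes that each log line
for a finished node contains both the marker DONE.OK and the node name
in the n_X_Y form (as in n_14_18), which may not match the actual log
layout. The function name successful_nodes is hypothetical.

```python
import re
import sys

# Assumption: node names look like n_14_18 and appear on the same line
# as the DONE.OK marker. Adjust the pattern if the real log differs.
NODE_RE = re.compile(r"\bn_\d+_\d+\b")


def successful_nodes(lines):
    """Return the sorted, de-duplicated node names found on DONE.OK lines."""
    nodes = set()
    for line in lines:
        if "DONE.OK" in line:
            nodes.update(NODE_RE.findall(line))
    return sorted(nodes)


# Usage: python successful_nodes.py <logfile>
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as log:
        print("\n".join(successful_nodes(log)))
```

The resulting list, one node per line, can then be fed into the rest
of an experiment.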
The number of nodes available should be much higher than what you see
(and we all see). The current shortfall is due to a combination of
factors. Most of them are being addressed, but some are out of our
reach due to budget constraints.
Finally, the problem of the command never finishing has shown up only
recently and is the result of frisbee not finishing. It would help me
a lot if you could telnet into the node shown in parentheses (n_14_18
in your recent case), download the /var/log/nodeAgent.log file and
send it to me.
I haven't figured out if frisbee goes deaf, the network link goes
down, or there is a disk write error which goes unreported.
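For anyone scripting this, the name of the stalled node can be pulled
out of the repeating progress line. The sketch below assumes the line
has the same shape as the one quoted from Chris's run further down, with
the node name in parentheses right after "min". The helper name
stalled_node is hypothetical and not part of the ORBIT tools.

```python
import re

# Assumption: the progress line looks like
#   INFO exp: Progress(308/1/311): 0/98/100 min(n_14_18)/avg/max (139.040551)
# and the token in parentheses after "min" is the node that is behind.
PROGRESS_RE = re.compile(r"Progress\([^)]*\):.*\bmin\((\w+)\)")


def stalled_node(line):
    """Return the node name after 'min(' in a progress line, or None."""
    match = PROGRESS_RE.search(line)
    return match.group(1) if match else None
```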
Please remember that imaging is most likely the most stressful
application for the support infrastructure, and that installing a
full operating system on 300+ nodes in about 15 minutes is a very
impressive number. That said, I agree that being kept waiting another
45 minutes for the last three nodes, which then never finish, is
frustrating.
Hopefully the numbers and reliability will increase. There is a lot of
work going on behind the scenes to improve stability and reliability
and we all hope it will bear fruit soon.
Thanks,
-max
On 1/19/07, chris at orderonenetworks.com <chris at orderonenetworks.com> wrote:
> Ivan,
>
> I ran the image test today and was only able to image 308 nodes. Do you
> anticipate these other nodes being able to be fixed? Or is this about the
> size that it will be?
>
> I'd like to be able to use as many nodes as possible, but I have no idea
> which nodes were able to image and which were not. Is there a way to get a
> list of all nodes that imaged successfully? If I had this list, I could
> then feed it into the rest of my experiment.
>
> As well, the imageNodes4 command still doesn't end if all the nodes don't
> image. I think I let it run for an hour or so, and it basically kept
> repeating this line:
>
> INFO exp: Progress(308/1/311): 0/98/100 min(n_14_18)/avg/max (139.040551)
>
> The experiment ID was 'grid_2007_01_18_17_37_25'.
>
> Thanks,
> Chris
>
>
>
--
Dr. Max Ott
Research Program Leader - Network and Pervasive Computing, NICTA Australia
Founder & CTO, Semandex Networks
Research Professor, WINLAB, Rutgers University
Senior Visiting Fellow, School of EE&T, UNSW