| | 301 | === 6/24/2010 === |
| | 302 | I forgot to log a bunch of changes: |
| | 303 | * Found a bug caused by reducing the wait time for the experiment to complete. Since the connect retry was rand(60)+(60), my log copies would copy over |
| | 304 | incomplete files, because I waited only 30 seconds before copying. I've redone the numbers: I now wait only rand(20) before attempting to reconnect, but try 3 times. |
| | 305 | I wait 90 seconds before trying to copy the log file, which should give me enough time to capture the retries. |
| | 306 | * There was a cascade of query failures because I was searching for the testbed id with the short domain name, instead of the FQDN, in the |
| | 307 | testbed table. This value was given to me by the gatherer; I've since modified the gatherer to use the FQDN. All the other queries depended on this |
| | 308 | number, so the error propagated down. |
| | 309 | * This error, however, exposed a specific flaw in how I handled empty query results. I indicate failed queries by returning nil instead of an array. |
| | 310 | The only reason I caught the error was that I tried to flatten the nil return. I've updated this to raise an exception if a nil query result occurs for |
| | 311 | any of the members of the Identify class. Not being able to identify the node should be a fatal error. This exception is unhandled, so it will |
| | 312 | propagate up to the main block, where it gets logged and terminates the script. |
| | 313 | * Also added a little bit of logging to the gatherer, but not much. I should really fix its error handling. |
| | 314 | * Noticed that sometimes I get an XML create object error. I'll have to figure out why that's happening. It's probably due to the gatherer not completing |
| | 315 | properly, but now I should be able to find the nodes where it happens. |
| | 316 | * Trying to stick to the logging/exception raising convention of setting the error text to: "class.method - error" |
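
A rough sketch of the nil-query handling and the "class.method - error" convention described in the notes above. This is not the actual script: the `IdentifyError` class, the `testbed_id` method name, the `"testbeds"` table name, the `query` callable, and the `main` wrapper are all hypothetical stand-ins. The only assumption carried over from the notes is that a failed query comes back as nil rather than an array.

```ruby
require 'logger'

# Hypothetical error type: raised when the node cannot be identified.
class IdentifyError < StandardError; end

class Identify
  # fqdn  - fully qualified domain name of the node
  # query - hypothetical callable (table, key) -> Array of rows, or nil on failure
  def initialize(fqdn, query)
    @fqdn = fqdn
    @query = query
  end

  # Looks up the testbed id by FQDN. A nil result is treated as fatal
  # instead of being passed along to callers that expect an Array.
  def testbed_id
    rows = @query.call("testbeds", @fqdn)
    if rows.nil?
      # Error text follows the "class.method - error" convention.
      raise IdentifyError, "Identify.testbed_id - nil query result for #{@fqdn}"
    end
    rows.flatten.first
  end
end

# Stand-in for the main block: nothing below it rescues the exception,
# so it arrives here, gets logged, and becomes a non-zero exit status.
def main(fqdn, query, log)
  id = Identify.new(fqdn, query).testbed_id
  log.info("main - testbed id #{id}")
  0
rescue IdentifyError => e
  log.fatal(e.message)
  1
end
```

A script would end with something like `exit(main(fqdn, query, Logger.new($stderr)))`. Raising at the point of the nil result keeps the failure next to its cause, instead of surfacing later as a `NoMethodError` from `flatten` in an unrelated query.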