| 42 | Ganglia is a web service that monitors system load and health. The Ganglia client runs on every production machine on the ORBIT network, from developer workstations to back-end servers, and collects system statistics. The collected data includes instantaneous CPU usage, memory usage, disk usage, temperature, network bandwidth utilization, uptime, and kernel statistics such as the number of processes. Each client sends this data over a multicast channel to the Ganglia server, which organizes the incoming data streams by ORBIT resource. The server stores these periodic samples and plots each metric against time over periods of a year or more. These long-term trends help gauge machine wear and estimate the time between failures. |