Tuesday, April 15, 2008

Make zenwin and zenwinmodeler ignore WMI errors

(This tip is also on the Zenoss wiki.)

At least in version 2.1.1, zenwin, zenwinmodeler, and zeneventlog have what is (IMO) a critical defect: if a device has any /Status/WMI/Conn events that have not been moved to history, these daemons ignore the device. On our network, for some reason, we end up with a lot of these events ('timegenerated' errors, various intermittent failures to connect, etc.). As a result, monitoring coverage of our Windows servers falls off dramatically the longer the system runs, and we miss critical issues.

I changed the behavior of these three daemons so that they go ahead and attempt monitoring even if WMI issues have been recorded for a device. I learned that most of the time these WMI issues are spurious and successful monitoring CAN still be attempted. If you use this code, I recommend combining it with event commands that restart the Zenoss daemons when they are found dead.
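To illustrate the change in behavior, here is a minimal sketch in modern Python. It is not the actual Zenoss code: the function name select_devices, the skip_on_wmi_event flag, and the idea of passing in a set of device names with open WMI events are all assumptions made for the example. The stock behavior corresponds to skipping flagged devices; the patched behavior corresponds to polling them anyway.

```python
def select_devices(devices, devices_with_wmi_events, skip_on_wmi_event=False):
    """Return the device names to poll in this cycle.

    devices: iterable of device names (hypothetical input).
    devices_with_wmi_events: set of names that currently have an open
        /Status/WMI/Conn event (hypothetical input).
    skip_on_wmi_event: True mimics the stock behavior described above,
        False mimics the patched behavior (poll anyway).
    """
    selected = []
    for name in devices:
        if skip_on_wmi_event and name in devices_with_wmi_events:
            # Stock behavior: an open WMI event removes the device from the cycle.
            continue
        selected.append(name)
    return selected


if __name__ == "__main__":
    all_devices = ["web01", "db01", "mail01"]
    flagged = {"db01"}
    print(select_devices(all_devices, flagged, skip_on_wmi_event=True))
    print(select_devices(all_devices, flagged))
```

With the flag off, a device with a stale or spurious WMI event is still attempted on every cycle, which is the point of the change.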

Also, in zenwin, I added to and improved the exception handling; in the original code, the watcher object is created outside of a try block, so a failure at that point is not caught. Much of this code is an attempt to keep zenwin from crashing if it tries to monitor a Windows Server 2008 machine (Zenoss is not compatible with the WMI interface of WS 2008 or Vista, and zenwin cannot monitor services on these devices). I ended up adding a hardcoded exclusion list so that I can otherwise monitor the machine but have zenwin skip it. For some reason, zeneventlog does not seem to crash, although it is not able to retrieve events from the WS 2008 machine either.
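The general pattern looks roughly like the sketch below, written in modern Python rather than copied from zenwin. Everything named here is hypothetical: EXCLUDED_DEVICES, poll_all, the make_watcher callable, and its poll() method are placeholders for illustration, not Zenoss APIs. The two ideas it shows are the hardcoded exclusion list and creating the watcher inside the try block so one bad device cannot take down the whole loop.

```python
import logging

log = logging.getLogger("zenwin_sketch")

# Hypothetical hardcoded exclusion list (e.g. a WS 2008 machine).
EXCLUDED_DEVICES = {"ws2008-example"}


def poll_all(devices, make_watcher):
    """Poll each device and return a dict of name -> result.

    make_watcher is an assumed callable that builds a watcher object
    for a device name; the watcher is assumed to expose poll().
    """
    results = {}
    for name in devices:
        if name in EXCLUDED_DEVICES:
            results[name] = "skipped"
            continue
        try:
            # Creating the watcher inside the try block means a failure
            # here is caught instead of crashing the loop.
            watcher = make_watcher(name)
            results[name] = watcher.poll()
        except Exception as exc:
            log.warning("could not monitor %s: %s", name, exc)
            results[name] = "failed"
    return results
```

A device on the exclusion list is never touched, and a device whose watcher cannot be created is logged and marked as failed while the remaining devices are still polled.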

Please see the Zenoss wiki for the zenwin and zenwinmodeler diffs.
