Friday, May 15, 2009

Zenoss zencommand daemon overload

I think we have found that the zencommand daemon in Zenoss Core has some very reachable limits on the number of commands it will process. We have been increasing our command monitoring lately and reached the point where zencommand, the daemon responsible for running these kinds of monitoring commands, had 1,200 commands in each cycle. I noticed that some of the performance templates that graphed the counters fetched by our commands had gaps--sometimes large gaps--and some had stopped graphing entirely. The odd thing was that we didn't really get any warnings (our VP went looking for graphed data and, ummm, didn't find it).

If I took one of the devices whose templates weren't graphing and ran zencommand against that device manually, it worked perfectly and fetched all the counters from the various commands. Running it under strace didn't show any errors either. But under the load we were putting on it, zencommand was clearly dropping commands silently.

Zencommand seems to be a rather single-threaded beast. Its ability to get everything done is a function of the following:
  • The number of data sources it is processing
  • The number of monitored devices
  • The cycle time of the data sources it is processing
I was able to collapse some of our data sources so that instead of 1,200 commands I got down to around 560, and voila--the graphs that had not been painting suddenly began working correctly. To avoid this issue, I recommend the following:
  • The native Device template uses SNMP (and SNMP Informant on the Windows side) to read the base CPU, memory, and paging counters. Some time ago, to avoid deploying SNMP Informant everywhere, we switched to a different template that reads these counters through zencommand and remote WMI calls. I am going to switch back to Device, which will take quite a bit of load off zencommand.
  • Watch the cycle time. Does anyone have QoS recommendations for the resolution of performance counters? 60-second cycle times are a bit aggressive, but what works well: 3 minutes? 5 minutes?
  • Always validate that the graphs are painting after making changes affecting zencommand. If you roll out a new template, don't just look at that template to make sure it's working--look at other things zencommand is handling after you roll it out. As we discovered, adding too much load can silently break other things.
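The relationship between command count, cycle time, and load can be roughed out with a little arithmetic. This is a hypothetical model, not how zencommand actually schedules its work: it assumes every command takes the same average time and that commands run one at a time (matching the single-threaded impression above). The function names `required_seconds` and `utilization`, the `concurrency` knob, and the 0.1-second runtime are illustrative assumptions; only the 1,200 and 560 command counts come from this post.

```python
"""Back-of-the-envelope collector load model (hypothetical, not Zenoss code)."""


def required_seconds(commands: int, avg_runtime_s: float, concurrency: int = 1) -> float:
    """Wall-clock seconds needed to run every command once.

    Assumes each command takes avg_runtime_s and at most `concurrency`
    commands run at the same time (1 = strictly one after another).
    """
    return commands * avg_runtime_s / concurrency


def utilization(commands: int, cycle_s: float, avg_runtime_s: float,
                concurrency: int = 1) -> float:
    """Fraction of the cycle the work consumes; above 1.0 it cannot finish in time."""
    return required_seconds(commands, avg_runtime_s, concurrency) / cycle_s


if __name__ == "__main__":
    # 1,200 and 560 are the command counts from the post; the 0.1 s
    # average runtime is a made-up figure for illustration only.
    for commands in (1200, 560):
        for cycle_s in (60, 180, 300):
            u = utilization(commands, cycle_s, avg_runtime_s=0.1)
            status = "OVERLOADED" if u > 1.0 else "ok"
            print(f"{commands:5d} commands, {cycle_s:3d}s cycle: {u:6.0%} {status}")
```

With those made-up numbers, 1,200 commands need 120 seconds of work per pass, twice a 60-second cycle, while 560 commands need 56 seconds and just fit. Stretching the cycle to 3 or 5 minutes buys headroom at the cost of coarser graphs.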

Monday, May 11, 2009

Windows 7 RC x64 impressions

I rebuilt my machine due to system instability and decided to go with Windows 7 x64 this time, now that there is a release candidate. So far it runs much better than Vista ever did, and some of the new features are compelling. My favorite feature so far is how the taskbar groups an application's windows behind its icon and--wait for it--lets you CLOSE them from the window preview when you hover over the icon. You no longer need to raise the window first and use its close controls, or right-click to close: the 'x' icon is right there on the preview. This seems simple, but it's a great improvement.

Another nice feature is wireless network selection. It now works the way Ubuntu does: a single click shows the available networks, each with a radio button to select it. No more wading through multiple screens just to change networks.

I have hit one bug: I have files on my desktop (as shown in \Users\user\Desktop), but the desktop itself is completely clean. I can right-click on the desktop and create new files, but while these appear in an Explorer folder pointed to that location, they vanish from the desktop itself. Has anyone else run into this?
** Update: I'm an idiot; it's a setting to show icons on the desktop. I am not sure why this was off by default. The setting is under Control Panel\Appearance and Personalization\Personalization\Change desktop icons, and you check User's Files.

Overall, I highly recommend giving the RC a spin.

Tuesday, May 05, 2009

Ubuntu 9.04 and .local domain access

I had to rediscover why my organization's internal .local domain wouldn't resolve on Ubuntu 9.04. I couldn't ping or RDP to any of the machines using their fully qualified names (e.g. server.domain.local), although this DID work with the NetBIOS names (e.g. server). After some research, I rediscovered something someone had helped me with long ago, back when I thought this issue was related to DNS resolution over a PPTP VPN connection (it is in fact unrelated).

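The culprit: the avahi daemon. As I understand it, avahi handles multicast DNS (mDNS) for the .local domain, and Ubuntu's default hosts line in /etc/nsswitch.conf asks mDNS about .local names before unicast DNS. Because of the [NOTFOUND=return] action on that line, a .local name that mDNS can't find fails right there, so ping, RDP, and anything else using fully qualified .local names never reaches your DNS server. There are two fixes I know of: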

  1. Disable avahi. I understand this can interfere with apps that use avahi, so...
  2. Edit the hosts line in /etc/nsswitch.conf, changing it from:
hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4

to:

hosts: files dns mdns4_minimal mdns4
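To make the lookup order concrete, here is a small simulation of how the hosts line is walked for a .local name. It is a simplified, hypothetical Python model, not glibc's code: each source returns SUCCESS, NOTFOUND, or UNAVAIL from a made-up table, mdns4_minimal is modeled as only handling names ending in .local, and the default is to stop on SUCCESS and continue otherwise, with a bracketed [STATUS=return] overriding that. The names `parse_hosts_line`, `query`, and `resolve`, the lookup tables, and the 10.0.0.5 address are illustrative assumptions.

```python
"""Simplified, hypothetical model of the nsswitch.conf "hosts:" lookup order."""
import re

DEFAULT = "hosts: files mdns4_minimal [NOTFOUND=return] dns mdns4"
FIXED = "hosts: files dns mdns4_minimal mdns4"

# Made-up lookup tables: the internal server exists only in unicast DNS,
# and nothing on the network advertises it over multicast DNS.
HOSTS_FILE = {"localhost": "127.0.0.1"}
DNS_ZONE = {"server.domain.local": "10.0.0.5"}
MDNS_HOSTS = {}


def parse_hosts_line(line):
    """Split a 'hosts:' line into [(source, {STATUS: action}), ...]."""
    _, spec = line.split(":", 1)
    entries = []
    for token in re.findall(r"\[[^\]]*\]|\S+", spec):
        if token.startswith("["):
            # A bracketed action list applies to the source just before it.
            for item in token[1:-1].split():
                status, action = item.split("=")
                entries[-1][1][status.upper()] = action.lower()
        else:
            entries.append((token, {}))
    return entries


def query(source, name):
    """Return (status, address) for one source in this toy model."""
    if source == "files":
        table = HOSTS_FILE
    elif source == "dns":
        table = DNS_ZONE
    elif source in ("mdns4_minimal", "mdns4"):
        if source == "mdns4_minimal" and not name.endswith(".local"):
            return "UNAVAIL", None  # minimal module ignores non-.local names
        table = MDNS_HOSTS
    else:
        return "UNAVAIL", None
    if name in table:
        return "SUCCESS", table[name]
    return "NOTFOUND", None


def resolve(line, name):
    """Walk the sources in order; stop on SUCCESS or on a [STATUS=return] match."""
    for source, actions in parse_hosts_line(line):
        status, address = query(source, name)
        default = "return" if status == "SUCCESS" else "continue"
        if actions.get(status, default) == "return":
            return source, address
    return None, None


if __name__ == "__main__":
    for line in (DEFAULT, FIXED):
        print(f"{line!r} -> {resolve(line, 'server.domain.local')}")
```

Under the default line, the lookup for server.domain.local stops at mdns4_minimal with no address; with dns moved ahead of the mDNS modules, unicast DNS answers first.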