Thursday, January 10, 2008

Zenoss Core web site (Zope application server) crash

Two Zenoss winexe processes ran out of control yesterday. They consumed all available CPU and RAM, causing the other daemons to slow down or crash. When we restarted Zenoss ("zenoss stop" followed by "zenoss start"), all of the Zenoss daemons came up, but zopectl (the Zope application server daemon) died immediately.

We found several occurrences of the following error in $ZENHOME/log/event.log that appeared to be related:

2008-01-09T10:15:36 ERROR Zope.SiteErrorLog http://server.domain.local:8080/zport/RenderServer/render
Traceback (most recent call last):
File "usr/local/zenoss/lib/python/Zope2/App/startup.py", line 167, in zpublisher_exception_hook
File "usr/local/zenoss/lib/python/ZPublisher/Publish.py", line 120, in publish
File "usr/local/zenoss/lib/python/Zope2/App/startup.py", line 233, in commit
File "usr/local/zenoss/lib/python/transaction/_manager.py", line 84, in commit
File "usr/local/zenoss/lib/python/transaction/_transaction.py", line 381, in commit
File "usr/local/zenoss/lib/python/transaction/_transaction.py", line 379, in commit
File "usr/local/zenoss/lib/python/transaction/_transaction.py", line 424, in _commitResources
File "usr/local/zenoss/lib/python/ZODB/Connection.py", line 462, in commit
File "usr/local/zenoss/lib/python/ZODB/Connection.py", line 495, in _commit
ConflictError: database conflict error (oid 0x3b, class Products.ZenUtils.PObjectCache.PObjectCache)
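
To gauge how often this error shows up, the log can be searched for the ConflictError line. The following is a minimal sketch; the function name count_conflict_errors is made up for illustration, and the default path simply reuses the $ZENHOME/log/event.log location mentioned above.

```shell
# Hypothetical helper: count the lines in a log file that mention ConflictError.
# Defaults to $ZENHOME/log/event.log when no file is given.
count_conflict_errors() {
    log_file="${1:-$ZENHOME/log/event.log}"
    # grep -c prints the number of matching lines
    grep -c 'ConflictError' "$log_file"
}
```

Calling count_conflict_errors with no argument checks the default log; pass a path to check a rotated or copied log instead.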

Remediation:
  1. Make sure zeoctl is started: as the zenoss user, run "zeoctl start", wait a few seconds, then run "zenoss status" to confirm that it has a PID and is running.
  2. cd $ZENHOME/var
  3. rm *.zec (this deletes the invalid cache files that cause the above error)
  4. zopectl start
  5. Wait a few seconds, then check whether Zope stays running (use "zenoss status" or just load the web site to confirm).
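
The steps above could be wrapped in a small script along these lines. This is only a sketch: the function names clear_zec_cache and restart_zope are made up for illustration, the five-second pauses are an arbitrary stand-in for "a few seconds", and it assumes it is run as the zenoss user with ZENHOME set and zeoctl, zopectl, and zenoss on the PATH.

```shell
# Hypothetical helper: delete the *.zec cache files from the given directory.
# Other files in the directory are left alone.
clear_zec_cache() {
    cache_dir="$1"
    for f in "$cache_dir"/*.zec; do
        [ -e "$f" ] || continue   # the glob matched nothing
        rm -f -- "$f"
    done
}

# Hypothetical wrapper around the five manual steps. Run as the zenoss user.
restart_zope() {
    zeoctl start                     # step 1: make sure ZEO is running
    sleep 5                          # arbitrary pause
    zenoss status                    # confirm zeoctl has a PID
    clear_zec_cache "$ZENHOME/var"   # steps 2-3: remove the invalid cache files
    zopectl start                    # step 4
    sleep 5                          # arbitrary pause
    zenoss status                    # step 5: check that Zope stayed up
}
```

Nothing runs until restart_zope is called, so the file can be sourced first and the status output reviewed by hand after the call.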
