Skip to content

Can't log from within tpool.execute #432

@smerritt

Description

@smerritt

If you try to log from within a function called by tpool.execute, there is a chance that the tpool thread never returns, and you see a stack trace like this one:

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 458, in fire_timers
    timer()
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 58, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/semaphore.py", line 147, in _do_acquire
    waiter.switch()
error: cannot switch to a different thread

This is because many logging handlers have mutexes that are threading._RLock objects, and Eventlet's replacement for thread.allocate_lock returns an eventlet.semaphore.Semaphore object, which does not work across different hubs in different pthreads.

Here's a small script to reproduce the issue:

#!/usr/bin/env python
#
# Demonstrates the crash with logging across pthreads
import eventlet.patcher
import eventlet.tpool
import logging
import random
import sys
import time

eventlet.patcher.monkey_patch()

logger = logging.getLogger("logger-test")
# This handler's .lock is a threading._RLock
logger.addHandler(logging.StreamHandler(sys.stdout))
logger.setLevel(logging.DEBUG)


def log_n_times(me, n):
    for x in range(n):
        logger.info("%s %d", me, x)
        time.sleep(random.random() * 0.01)


logger.info("starting")

greenthread = eventlet.spawn(log_n_times, 'greenthread', 50)
eventlet.tpool.execute(log_n_times, 'pthread', 50)
greenthread.wait()

logger.info("done")

The bug was originally found in Openstack Swift, and there's a better writeup at https://bugs.launchpad.net/swift/+bug/1710328 and a commit at openstack/swift@6d16079 that fixes the problem, but only for Swift, and not in a general way.

I'd like to figure out how to fix this in general, but I'm not sure how to proceed. Using a pipe-based mutex for all _RLock objects would work, but would be a very expensive fix. Perhaps just the locks in logging handlers, because those are global?

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions