Deadlock in Live with redirected stdout/stderr during concurrent refresh and reentrant logging #4158

@alexgshaw

Summary

We hit a permanent process deadlock in a long-running asyncio CLI using rich.Live with the default redirect_stdout=True / redirect_stderr=True and auto_refresh=True.

The app is Harbor, a high-concurrency benchmark runner. The immediate trigger was noisy third-party output and an aiohttp.ClientSession.__del__ warning during cancellation cleanup, but the observed deadlock appears to be a Rich lock-order inversion between Live._lock and Console._lock.

I understand redirecting global stdout/stderr during live rendering is inherently tricky. If this usage pattern is unsupported, a docs warning would help. If it is expected to be safe, I think there is a real deadlock here.

Environment

  • Python: 3.13.3
  • Rich: 14.2.0
  • aiohttp: 3.13.4
  • LiteLLM: 1.86.2
  • Terminal: tmux pane
  • Platform: Linux

What happened

The CLI froze permanently: the Live spinner stopped, no progress counters advanced, and no job files were written for 9+ hours. The process was still alive but all useful work had stopped.

At the time of the freeze:

  • 46 TLS sockets owned by the process were in CLOSE-WAIT.
  • Many asyncio tasks had recently been cancelled by asyncio.wait_for(...) timeouts.
  • aiohttp.ClientSession.__del__ was calling the event loop exception handler for an unclosed session.
  • Third-party code was also printing directly to stdout.

Observed stacks

Abbreviated py-spy dump output from the wedged process:

Thread 2525706 (idle): "MainThread"
    process_renderables (rich/live.py:281)
    print (rich/console.py:1708)
    write (rich/file_proxy.py:47)
    emit (logging/__init__.py:1153)
    handle (logging/__init__.py:1026)
    callHandlers (logging/__init__.py:1744)
    handle (logging/__init__.py:1680)
    _log (logging/__init__.py:1664)
    error (logging/__init__.py:1548)
    default_exception_handler (asyncio/base_events.py:1865)
    call_exception_handler (asyncio/base_events.py:1891)
    __del__ (aiohttp/client.py:465)
    ...
    print (rich/console.py:1724)
    write (rich/file_proxy.py:47)
    get_llm_provider (.../litellm/.../get_llm_provider_logic.py:503)
    completion_cost (.../litellm/cost_calculator.py:1255)
Thread 2525755 (idle): "Thread-1"
    render_lines (rich/console.py:1375)
    __rich_console__ (rich/live_render.py:81)
    render (rich/console.py:1345)
    print (rich/console.py:1724)
    refresh (rich/live.py:267)
    run (rich/live.py:38)

Another worker thread was also in redirected stdout rendering:

Thread 2526618 (idle): "asyncio_27"
    render_lines (rich/console.py:1375)
    __rich_console__ (rich/live_render.py:81)
    render (rich/console.py:1345)
    print (rich/console.py:1724)
    write (rich/file_proxy.py:47)
    ... third-party library print path ...

Suspected lock inversion

From Rich 14.2.0 source:

  • Live._RefreshThread.run() enters a "with self.live._lock:" block and then calls self.live.refresh().
  • Live.refresh() calls self.console.print(...) while still inside the live lock path.
  • Console.print() eventually needs Console._lock in rendering/writing paths.
  • FileProxy.write() calls console.print(...) for redirected stdout/stderr.
  • Console.print() invokes live render hooks; Live.process_renderables() takes Live._lock.

The observed deadlock looks like this:

main thread:
  owns Console._lock
  -> FileProxy/logging path re-enters Rich
  -> waits for Live._lock in Live.process_renderables()

refresh thread:
  owns Live._lock
  -> calls console.print() from Live.refresh()
  -> waits for Console._lock

So the two threads wait forever:

main thread:    Console._lock -> waits Live._lock
refresh thread: Live._lock    -> waits Console._lock

Linux syscall state supported this: the hot Rich threads were parked in futex waits, not blocked on network IO.
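
To make the ordering problem concrete, here is a minimal sketch of the same inversion using two plain threading.RLock objects as stand-ins for Console._lock and Live._lock. It does not use Rich at all, and the names demo_inversion and hold_then_request are made up for illustration. Timeouts are used so the script reports the stall instead of hanging the way the real process did.

```python
import threading


def demo_inversion(timeout=0.2):
    # Stand-ins for Console._lock and Live._lock; these are not Rich's objects.
    console_lock = threading.RLock()
    live_lock = threading.RLock()
    barrier = threading.Barrier(2)
    results = {}

    def hold_then_request(name, first, second):
        with first:
            barrier.wait()  # both threads now hold their first lock
            acquired = second.acquire(timeout=timeout)
            results[name] = acquired
            if acquired:
                second.release()
            barrier.wait()  # keep `first` held until the peer has finished trying

    threads = [
        # "main" mirrors: Console._lock -> waits Live._lock
        threading.Thread(target=hold_then_request, args=("main", console_lock, live_lock)),
        # "refresh" mirrors: Live._lock -> waits Console._lock
        threading.Thread(target=hold_then_request, args=("refresh", live_lock, console_lock)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


results = demo_inversion()
print(results)  # both values are False: neither thread could get its second lock
```

Without the timeout, both acquire() calls would block indefinitely, which matches the futex waits we saw. Taking the two locks in the same order on every path would avoid this, though I have not checked whether that is practical inside Rich.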

Why stderr redirection matters

aiohttp.ClientSession.__del__ calls loop.call_exception_handler(...), and asyncio's default handler logs with Python logging. If no explicit handler captures it, Python's fallback stderr handler writes to the current sys.stderr. During Live, Rich has replaced sys.stderr with FileProxy, so finalizer/error logging can re-enter Rich at arbitrary points during a render.
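
The fallback behaviour can be checked with the standard library alone. The sketch below swaps sys.stderr for a recording object (RecordingStream is a hypothetical stand-in that plays the role of Rich's FileProxy) and logs through a logger that has no handlers, so Python's last-resort handler is used. The logger name and message are illustrative only.

```python
import logging
import sys


class RecordingStream:
    """Hypothetical stand-in for a redirected sys.stderr."""

    def __init__(self):
        self.writes = []

    def write(self, text):
        self.writes.append(text)
        return len(text)

    def flush(self):
        pass


def log_without_handlers(message):
    logger = logging.getLogger("demo.no_handlers")
    logger.propagate = False  # do not consult handlers on ancestor loggers
    proxy = RecordingStream()
    original = sys.stderr
    sys.stderr = proxy  # what Live does with FileProxy while it is active
    try:
        # No handler is attached, so logging falls back to logging.lastResort,
        # which looks up sys.stderr at emit time.
        logger.error(message)
    finally:
        sys.stderr = original
    return proxy.writes


captured = log_without_handlers("Unclosed client session")
print("".join(captured), end="")
```

The message lands in the replacement stream, not on the original stderr. In our case that replacement was FileProxy, so the write turned into a console.print(...) call.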

Third-party print(...) calls can do the same through redirected stdout.

Workaround

For our app, disabling stdout/stderr redirection appears to be the right defensive workaround:

with Live(
    Group(loading_progress, running_progress),
    refresh_per_second=10,
    redirect_stdout=False,
    redirect_stderr=False,
):
    ...

We are also suppressing noisy third-party stdout output separately.
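
For reference, a minimal sketch of one way to swallow stdout from a noisy call using contextlib.redirect_stdout. This is not necessarily how Harbor does it; noisy_third_party_call and call_quietly are hypothetical names, and the printed line is a placeholder.

```python
import contextlib
import io


def noisy_third_party_call():
    """Hypothetical stand-in for a library function that prints to stdout."""
    print("noisy diagnostic line")
    return 42


def call_quietly(func, *args, **kwargs):
    """Run func with sys.stdout pointed at a throwaway buffer.

    Returns the function's result and the text that was swallowed.
    """
    sink = io.StringIO()
    with contextlib.redirect_stdout(sink):
        result = func(*args, **kwargs)
    return result, sink.getvalue()


result, swallowed = call_quietly(noisy_third_party_call)
```

One caveat: contextlib.redirect_stdout replaces sys.stdout for the whole process, so anything other threads print during the call is swallowed too. In a heavily threaded app it may be safer to silence the library through its own configuration if it offers one.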

Question

Is Live(..., redirect_stdout=True, redirect_stderr=True, auto_refresh=True) intended to be safe when arbitrary third-party threads/tasks may print or log to stdout/stderr during rendering?

If yes, I think the lock ordering in Live / Console / FileProxy can deadlock. If no, would you accept a docs warning around redirected stdout/stderr in multi-threaded or high-concurrency asyncio applications?
