
RethinkDB crashes on machine with 128 CPUs #6895

@hagaram


Describe the bug
When I run RethinkDB in Kubernetes, or in plain Docker, on a machine with 128 CPUs, I get this error:

    Version: rethinkdb 2.4.0~0bionic (CLANG 6.0.0 (tags/RELEASE_600/final))
    error: Error in thread 128 in ./src/containers/intrusive_list.hpp at line 175:
    error: Guarantee failed: [before != nullptr]
    error: Backtrace:
    error: Tue Jul 14 13:25:50 2020

           1 [0xcd05a0]: backtrace_t::backtrace_t() at 0xcd05a0 (rethinkdb)
           2 [0xcd1216]: lazy_backtrace_formatter_t::lazy_backtrace_formatter_t() at 0xcd1216 (rethinkdb)
           3 [0xccffa8]: format_backtrace[abi:cxx11](bool) at 0xccffa8 (rethinkdb)
           4 [0xc8898d]: report_fatal_error(char const*, int, char const*, ...) at 0xc8898d (rethinkdb)
           5 [0xd1da97]: intrusive_list_t<linux_thread_message_t>::insert_between(linux_thread_message_t*, intrusive_list_node_t<linux_thread_message_t>*, intrusive_list_node_t<l
           6 [0xd1d023]: linux_message_hub_t::store_message_sometime(threadnum_t, linux_thread_message_t*) at 0xd1d023 (rethinkdb)
           7 [0xd1d304]: linux_message_hub_t::on_event(int) at 0xd1d304 (rethinkdb)
           8 [0xd21655]: epoll_event_queue_t::run() at 0xd21655 (rethinkdb)
           9 [0xd22b6c]: linux_thread_pool_t::start_thread(void*) at 0xd22b6c (rethinkdb)
           10 [0x7f299d0816db]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x76db) [0x7f299d0816db] at 0x7f299d0816db (/lib/x86_64-linux-gnu/libpthread.so.0)
           11 [0x7f299cdaa88f]: clone+0x3f at 0x7f299cdaa88f (/lib/x86_64-linux-gnu/libc.so.6)
    error: Exiting

To Reproduce
Steps to reproduce the behavior:

  1. Run RethinkDB on a machine with 128 CPUs.

Expected behavior
The RethinkDB container starts and keeps running.

System info

  • OS: Debian Buster
  • RethinkDB Version: 2.4.0

Additional context
This is clearly a known issue, but the workarounds mentioned here don't apply in my case:
https://success.docker.com/article/ucp-will-not-install-on-systems-with-more-than-127-logical-cpu-cores

The RethinkDB source also states that this is the maximum number of threads currently supported:
https://github.com/rethinkdb/rethinkdb/blob/next/src/config/args.hpp
#define MAX_THREADS 128

Do you know of any workaround that would, for example, "fake" the CPU count for a container/pod? Or, better, are you planning to add support for higher CPU counts?
