One instance crashed in Rethinkdb 2.2.5 cluster, possible broken data #6033

@ntquyen

I'm running a RethinkDB 2.2.5 cluster of 3 instances.

One instance in the cluster suddenly crashed and cannot be restarted. Startup log and backtrace:

Running rethinkdb 2.2.5~0jessie (GCC 4.9.2)...
Running on Linux 4.3.6-coreos x86_64
Loading data from directory /data
Listening for intracluster connections on port 29015
Connected to server "4200b06bc82a_1yx" 83a15f4a-a772-4649-b866-d537e5c0cfba
Connected to proxy ee2e9ad4-3bd7-48a0-ba3f-2c9406ac3d02
Connected to server "02f81613b7c1_wdx" 8138c225-e59c-4e5c-8f2e-93a5689aea47
Listening for client driver connections on port 28015
Listening for administrative HTTP connections on port 8080
Listening on addresses: 127.0.0.1, 10.2.44.2, ::1, fe80::42:aff:fe02:2c02%5
Server ready, "c1d17389be71_29y" d5a8f7d7-33b2-4f60-9561-38d4a9d163e1
A newer version of the RethinkDB server is available: 2.3.4. You can read the changelog at <https://github.com/rethinkdb/rethinkdb/releases>.
Version: rethinkdb 2.2.5~0jessie (GCC 4.9.2)
error: Error in src/rdb_protocol/serialize_datum.cc at line 814:
error: Guarantee failed: [datum_deserialize(&read_stream, &type) == archive_result_t::SUCCESS] Deserialization of datum type from buf failed with error archive_result_t::RANGE_ERROR.
error: Backtrace:
error: Fri Aug  5 03:52:04 2016

       1 [0xa8b7f0]: backtrace_t::backtrace_t() at 0xa8b7f0 (rethinkdb)
       2 [0xa8bb69]: format_backtrace(bool) at 0xa8bb69 (rethinkdb)
       3 [0xd0135d]: report_fatal_error(char const*, int, char const*, ...) at 0xd0135d (rethinkdb)
       4 [0x92f080]: ql::datum_deserialize_from_buf(shared_buf_ref_t<char> const&, unsigned long) at 0x92f080 (rethinkdb)
       5 [0x92f656]: ql::datum_deserialize_pair_from_buf(shared_buf_ref_t<char> const&, unsigned long) at 0x92f656 (rethinkdb)
       6 [0x8d3329]: ql::datum_t::unchecked_get_pair(unsigned long) const at 0x8d3329 (rethinkdb)
       7 [0x8d60fb]: rethinkdb() [0x8d60fb] at 0x8d60fb ()
       8 [0x8d61d0]: ql::datum_t::is_ptype() const at 0x8d61d0 (rethinkdb)
       9 [0x87668b]: ql::obj_or_seq_op_impl_t::eval_impl_dereferenced(ql::term_t const*, ql::scope_env_t*, ql::args_t*, scoped_ptr_t<ql::val_t> const&, std::function<scoped_ptr_t<ql::val_t> ()>) const at 0x87668b (rethinkdb)
       10 [0x87952f]: ql::bracket_term_t::eval_impl(ql::scope_env_t*, ql::args_t*, ql::eval_flags_t) const at 0x87952f (rethinkdb)
       11 [0x7d0952]: ql::op_term_t::term_eval(ql::scope_env_t*, ql::eval_flags_t) const at 0x7d0952 (rethinkdb)
       12 [0x8ea64a]: ql::runtime_term_t::eval_on_current_stack(ql::scope_env_t*, ql::eval_flags_t) const at 0x8ea64a (rethinkdb)
       13 [0x8ea94c]: ql::runtime_term_t::eval(ql::scope_env_t*, ql::eval_flags_t) const at 0x8ea94c (rethinkdb)
       14 [0x7cd19a]: ql::op_term_t::maybe_grouped_data(ql::scope_env_t*, ql::argvec_t*, ql::eval_flags_t, counted_t<ql::grouped_data_t>*, scoped_ptr_t<ql::val_t>*) const at 0x7cd19a (rethinkdb)
       15 [0x7d030b]: ql::op_term_t::term_eval(ql::scope_env_t*, ql::eval_flags_t) const at 0x7d030b (rethinkdb)
       16 [0x8ea64a]: ql::runtime_term_t::eval_on_current_stack(ql::scope_env_t*, ql::eval_flags_t) const at 0x8ea64a (rethinkdb)
       17 [0x8ea94c]: ql::runtime_term_t::eval(ql::scope_env_t*, ql::eval_flags_t) const at 0x8ea94c (rethinkdb)
       18 [0x80ddcb]: ql::reql_func_t::call(ql::env_t*, std::vector<ql::datum_t, std::allocator<ql::datum_t> > const&, ql::eval_flags_t) const at 0x80ddcb (rethinkdb)
       19 [0x80d436]: ql::func_t::call(ql::env_t*, ql::datum_t, ql::eval_flags_t) const at 0x80d436 (rethinkdb)
       20 [0x7b8383]: compute_keys(store_key_t const&, ql::datum_t, sindex_disk_info_t const&, std::vector<std::pair<store_key_t, ql::datum_t>, std::allocator<std::pair<store_key_t, ql::datum_t> > >*) at 0x7b8383 (rethinkdb)
       21 [0x7b9fb3]: rdb_update_single_sindex(store_t*, store_t::sindex_access_t const*, deletion_context_t const*, rdb_modification_report_t const*, unsigned long*, auto_drainer_t::lock_t, cond_t*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*) at 0x7b9fb3 (rethinkdb)
       22 [0x7c2559]: callable_action_instance_t<std::_Bind<void (*(store_t*, store_t::sindex_access_t*, deletion_context_t const*, rdb_modification_report_t const*, unsigned long*, auto_drainer_t::lock_t, cond_t*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*))(store_t*, store_t::sindex_access_t const*, deletion_context_t const*, rdb_modification_report_t const*, unsigned long*, auto_drainer_t::lock_t, cond_t*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*, std::vector<std::pair<ql::datum_t, boost::optional<unsigned long> >, std::allocator<std::pair<ql::datum_t, boost::optional<unsigned long> > > >*)> >::run_action() at 0x7c2559 (rethinkdb)
       23 [0x99c958]: coro_t::run() at 0x99c958 (rethinkdb)
error: Exiting.

When the crash occurred, systemd-coredump took all the CPU and could not be stopped until I stopped the rethinkdb container and restarted the server. After the server was restarted with systemd-coredump disabled, rethinkdb still could not be started and hit the same crash.

It is likely that the data is somehow broken (the deserialization failed). I have 10 RethinkDB clusters running in production, but this is the first time I have run into this issue.
