[BUG] expand_keys corrupts binary msgpack data in EVALSHA keys ("Missing bytes in input") #5904

@Olen

Describe the bug

Bayes classification and learning intermittently fail with ERR user_script:1: Missing bytes in input when Redis Lua scripts attempt to cmsgpack.unpack() the token data passed by rspamd. The error is not caused by corrupted data in Redis — it reproduces on a completely empty Redis instance with zero Bayes keys.

The error affects both bayes_classify.lua (cmsgpack.unpack(KEYS[3])) and bayes_learn.lua (cmsgpack.unpack(KEYS[5])), and occurs on both proxy (milter) and controller workers. The failure rate is approximately 30–40% of all Bayes operations.

Steps to Reproduce

  1. Configure per-user Bayes with new_schema = true and the Redis backend
  2. Train enough messages to exceed the min_learns threshold (I used min_learns = 10)
  3. Send multiple messages to the trained user via normal mail delivery (proxy/milter path)
  4. Observe the rspamd log: some messages classify successfully (a BAYES_HAM/BAYES_SPAM symbol appears), others fail with "Missing bytes in input"

Alternatively, batch learning via the controller HTTP API shows the same error:

for msg in /var/mail/example.com/user/cur/*; do
    curl -s -H "Deliver-To: [email protected]" --data-binary "@$msg" http://127.0.0.1:11334/learnham
done
# ~30-40% of requests fail with "Missing bytes in input"
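
To put a number on the failure rate, the loop's responses can be tallied. This is a minimal sketch, not a verified procedure: it assumes the controller's HTTP response body contains the Redis error text when learning fails, and the function names and the `RCPT` variable are hypothetical.

```shell
# Print "fail" if a response body mentions the cmsgpack error, else "ok".
# Assumption: the controller echoes the Redis error text in its response.
classify_response() (
    case "$1" in
        *"Missing bytes in input"*) echo fail ;;
        *) echo ok ;;
    esac
)

# Read one response body per line on stdin and print "ok=N fail=M".
tally_responses() (
    ok=0
    fail=0
    while IFS= read -r body; do
        if [ "$(classify_response "$body")" = fail ]; then
            fail=$((fail + 1))
        else
            ok=$((ok + 1))
        fi
    done
    echo "ok=$ok fail=$fail"
)

# Usage (hypothetical; RCPT holds the recipient address, and the extra
# "echo" terminates each response with a newline):
#   for msg in /var/mail/example.com/user/cur/*; do
#       curl -s -H "Deliver-To: $RCPT" --data-binary "@$msg" \
#           http://127.0.0.1:11334/learnham; echo
#   done | tally_responses
```

If the response body does not carry the error text, the same tally can be taken from the rspamd log instead.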

Key observations:

  • The error occurs on a completely empty Redis instance: I flushed ALL Bayes-related keys (RS*, learned_ids, BAYES_HAM_keys, BAYES_SPAM_keys) and the error continued immediately on the next incoming message.
  • Message size is not a factor. Three messages to the same recipient within ~12 minutes, same worker PID:
    • 63 KB — FAILS (16:05:33)
    • 34 KB — succeeds (16:07:00)
    • 95 KB — succeeds (16:16:57)
  • The bug is present in both 3.12.1 and 3.14.3 Lua scripts (different EVALSHA hashes, same error).

Expected behavior

cmsgpack.unpack() should always succeed on the msgpack-encoded token data that rspamd passes to Redis. Bayes classification and learning should not fail intermittently.

Error statistics (over ~48 hours)

Category                                   Count
Total "Missing bytes" errors               4,307
Classify errors (rspamd_redis_classified)  1,455
Learn errors (rspamd_redis_learned)        1,426
Successful BAYES classifications              38
Controller worker errors                   4,221
Proxy (milter) worker errors                  86

The high controller error count is from batch learning via the HTTP API (/learnspam, /learnham).
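
Counts like these can be pulled from the rspamd log with a short awk pass. The sketch below assumes log lines shaped like the excerpts in this report (worker name in parentheses after the PID, reporting function followed by a colon); the function name `count_missing_bytes` is hypothetical, and the log path is left to the caller.

```shell
# Count "Missing bytes in input" lines in total, per worker type, and per
# reporting function. Reads the files given as arguments, or stdin.
count_missing_bytes() {
    awk '
        /Missing bytes in input/ {
            worker = "unknown"
            fn = "unknown"
            # Worker name is the first parenthesised word, e.g. "(rspamd_proxy)".
            if (match($0, /\([a-z_]+\)/))
                worker = substr($0, RSTART + 1, RLENGTH - 2)
            # Reporting function, e.g. "rspamd_redis_classified:".
            if (match($0, /rspamd_redis_[a-z_]+:/))
                fn = substr($0, RSTART, RLENGTH - 1)
            by_worker[worker]++
            by_fn[fn]++
            total++
        }
        END {
            printf "total %d\n", total
            for (w in by_worker) printf "worker %s %d\n", w, by_worker[w]
            for (f in by_fn) printf "function %s %d\n", f, by_fn[f]
        }
    ' "$@" | LC_ALL=C sort
}
```

The output is sorted so that repeated runs over the same log are directly comparable.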

Script hashes involved

Script hash                               Script              rspamd version
ff34a0661245b91d202d39c9f94958dbab8e5284  bayes_classify.lua  3.14.3
9ca9e8a2b242f9ce86c78e654f6957d9908605b1  bayes_learn.lua     3.14.3
0075688c9013897c35b1ef045c2b9f55d12d4586  bayes_classify.lua  3.12.1
29ea7b39082121d7f0fa26b7562d80e28c8a656b  bayes_learn.lua     3.12.1
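
The hashes can be collected from the failing log lines rather than copied by hand. A minimal sketch, assuming the "script: <40 hex digits>" fragment seen in the log excerpts of this report; `list_failing_scripts` is a hypothetical name and `grep -o` is assumed to be available (GNU and BSD grep both provide it).

```shell
# Print "<count> <sha1>" for each distinct script hash that appears in
# "Missing bytes in input" lines. Reads the files given as arguments, or stdin.
list_failing_scripts() {
    grep -h 'Missing bytes in input' "$@" |
        grep -oE 'script: [0-9a-f]{40}' |
        awk '{ count[$2]++ } END { for (h in count) print count[h], h }' |
        LC_ALL=C sort -k2
}
```

Each hash can then be checked against the running server with `redis-cli SCRIPT EXISTS <sha>`, which reports whether that script is still in the script cache.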

Annotated log examples

Three messages to the same recipient, same worker PID (#185686), within 12 minutes:

Message A — 63 KB, FAILS:

16:05:33 #185686(rspamd_proxy) <bf9fa1>; proxy; rspamd_redis_classified: cannot classify task: ERR user_script:1: Missing bytes in input. script: ff34a0661245b91d202d39c9f94958dbab8e5284, on @user_script:1.
16:05:33 #185686(rspamd_proxy) <bf9fa1>; proxy; rspamd_task_write_log: id: <[email protected]>, len: 63023, time: 1771.137ms, dns req: 46, rcpts: <[email protected]>

Message B — 34 KB, SUCCEEDS:

16:07:00 #185686(rspamd_proxy) <32801a>; proxy; rspamd_task_write_log: id: <[email protected]>, len: 34229, time: 687.404ms, dns req: 66, rcpts: <[email protected]>

Message C — 95 KB, SUCCEEDS:

16:16:57 #185686(rspamd_proxy) <762a46>; proxy; rspamd_task_write_log: id: <[email protected]>, len: 94567, time: 598.173ms, dns req: 66, rcpts: <[email protected]>
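
The size comparison above can be automated by joining each task's summary line with any error logged under the same task tag (such as <bf9fa1>). This is a sketch under two assumptions taken from the excerpts: the error line is logged before the task's rspamd_task_write_log line, and the tag is a hex string followed by a semicolon. The function name `size_vs_outcome` is hypothetical.

```shell
# For every rspamd_task_write_log line, print "<len> <status>", where status
# is "fail" if the same task tag logged the cmsgpack error earlier, else "ok".
size_vs_outcome() {
    awk '
        # Return the task tag such as "<bf9fa1>", or "" if none is found.
        function tag(line) {
            if (match(line, /<[0-9a-f]+>;/))
                return substr(line, RSTART, RLENGTH - 1)
            return ""
        }
        /Missing bytes in input/ { failed[tag($0)] = 1; next }
        /rspamd_task_write_log/ {
            t = tag($0)
            status = (t in failed) ? "fail" : "ok"
            # "len: " is five characters; the digits that follow are the size.
            if (match($0, /len: [0-9]+/))
                print substr($0, RSTART + 5, RLENGTH - 5), status
        }
    ' "$@"
}
```

Piping the result through `sort -n` makes it easy to see whether failures cluster at any particular size.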

Versions

Rspamd daemon version 3.14.3

CPU architecture x86_64; features: avx2, avx, sse2, sse3, ssse3, sse4.1, sse4.2, rdrand
Hyperscan enabled: TRUE
Jemalloc enabled: TRUE
LuaJIT enabled: TRUE (LuaJIT version: LuaJIT 2.1.1764593432)
ASAN enabled: FALSE
BLAS enabled: FALSE
Fasttext enabled: FALSE
  • OS: Debian 12.11 (bookworm), x86_64
  • Redis: 7.0.15
  • Deployment: docker-mailserver v15

Also reproduced with rspamd 3.12.1 (the version shipped with docker-mailserver v15 before upgrading via apt).

Additional Information

Classifier configuration:

backend = "redis";
new_schema = true;
min_learns = 10;

per_user = <<EOD
-- Lua function that resolves aliases to mailbox users
-- (simplified — resolves via postfix virtual alias map)
return function(task)
  local rcpt = task:get_principal_recipient()
  if rcpt then return resolve(rcpt) end
  local recipients = task:get_recipients('any')
  if recipients and recipients[1] and recipients[1]['addr'] then
    return resolve(recipients[1]['addr'])
  end
  return nil
end
EOD;

autolearn {
  spam_threshold = 6.0;
  ham_threshold = -0.5;
  check_balance = true;
  min_balance = 0.9;
}

Analysis:

The error Missing bytes in input comes from Redis's built-in cmsgpack.unpack() (lua-cmsgpack library). This means the msgpack-encoded token array that rspamd passes as a KEYS argument to the EVALSHA call is truncated or contains malformed msgpack data.
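
To illustrate what a truncated payload looks like to a decoder: msgpack values carry their length in a header, and as far as I understand lua-cmsgpack, it reports "Missing bytes in input" when the buffer ends before the declared bytes are present. The sketch below checks this for one msgpack type only, "str 8" (marker 0xd9, a one-byte length N, then N payload bytes). It is an illustration with a hypothetical function name, not a parser, and it makes no claim about which msgpack types rspamd actually emits for tokens.

```shell
# Check a hex-encoded msgpack "str 8" value and report whether every byte
# declared by its length header is present.
# Prints "complete", "missing bytes", or "unsupported" (any other marker).
check_str8() (
    # Normalise: drop spaces between byte pairs, lowercase the hex digits.
    hex=$(printf '%s' "$1" | tr -d ' ' | tr 'A-F' 'a-f')
    marker=$(printf '%s' "$hex" | cut -c1-2)
    if [ "$marker" != "d9" ]; then
        echo "unsupported"
        exit 0
    fi
    if [ "${#hex}" -lt 4 ]; then
        echo "missing bytes"    # the length byte itself is absent
        exit 0
    fi
    len_byte=$(printf '%s' "$hex" | cut -c3-4)
    declared=$(printf '%d' "0x$len_byte")
    present=$(( (${#hex} - 4) / 2 ))   # payload bytes after marker + length
    if [ "$present" -lt "$declared" ]; then
        echo "missing bytes"
    else
        echo "complete"
    fi
)
```

For example, `check_str8 "d9 05 68 65 6c 6c 6f"` describes a five-byte string with all five bytes present, while `check_str8 "d9 05 68 65"` declares five bytes but supplies two, which is the shape of input a decoder would reject as incomplete.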

The error occurs on a completely empty Redis (ruling out data corruption), is intermittent for the same worker and user within minutes, is not correlated with message size, and affects both the 3.12.1 and 3.14.3 script versions. The issue therefore appears to be in how rspamd serializes the token array into msgpack before sending it to Redis. Possible causes:

  • Memory corruption or race condition in token packing
  • Interaction between LuaJIT 2.1 and Redis 7.x cmsgpack
  • Tokenizer producing certain token patterns that trigger a serialization edge case

Workaround: None known. Successfully learned tokens persist in Redis and work correctly on subsequent classify calls that don't hit the bug.
