Sanitize dump payload: fix empty keys when RDB loading and restore command by sundb · Pull Request #9297 · redis/redis

sundb · 2021-07-30T12:05:05Z

When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key.
This could either be a corrupt payload, or a result of a bug (see #8453 )

This pr mainly fixes the following:

When restore command will return Bad data format error.
When loading RDB, we will silently discard the key.

…which start with 0xff

oranagra · 2021-07-30T15:27:09Z

@sundb were these problems detected on unstable?
Started after the change in #9285?

@yossigo do you think failing loading of empty lists / zsets could be a problem? IIRC there was some bug in the past that created these, and this could cause upgrade failures.

sundb · 2021-07-30T15:42:44Z

@oranagra They are detected on unstable after #9285.

oranagra · 2021-08-01T06:52:37Z

@sundb i'm thinking that maybe we'll have to revert some of these checks in rdb.c.
as you can see in the corrupt payload: fuzzer findings - empty intset div by zero which you modified, i've added an assertion the ziplist random functions (just to avoid a signal that confuses these tests), and kept the test that executes SRANDMEMBER.

I suppose this batch of changes in rdb.c came up after you encountered an issue in the fuzzer that lead to adding the new empty quicklist test.
looking at your corrupt payload: fuzzer findings - empty quicklist test, i only see the restore but not any command that exposed an issue (which would be unreachable with your fix). do you remember what was the problem?

yossigo · 2021-08-01T10:57:51Z

@oranagra I remember there were Module API bugs that could result with that, and there were reports about that happening in the wild. Do you see harm in simply skipping those keys?

oranagra · 2021-08-01T11:02:24Z

@yossigo i guess not... if we have to choose between failing the loading, storing an empty key in the db, or skipping it, i suppose skipping it is the better alternative. i.e. specifically because we had that bug.
should we do that only for zsets (IIRC that's where the bug was; need to confirm)

oranagra · 2021-08-01T11:49:58Z

@sundb i looked at the recent failures in the daily CI, i see they're all about several lists commands.
some can't handle an empty list, and that generates an assertion:

popGenericCommand (t_list.c:466 'value != NULL' is not true)

and others reach ziplist sanitation errors:

__ziplistDelete (ziplist.c:876 'zipEntrySafe(zl, zlbytes, p, &tail, 1)' is not true)
__ziplistInsert (ziplist.c:683 'zipEntrySafe(zl, zlbytes, p, &e, 0)' is not true)

the reason the test suite failed is because when sanitize-dump-payload is yes it doesn't expect any crashes.
so maybe an easy solution for our problems is for rdb.c to just validate that lists are not empty when sanitize-dump-payload is enabled?
i.e. doing so would mean that this flag is not just about doing an O(n) complexity check (aka deep), but also about the guarantees of a successful RESTORE command, vs compatibility with old rdb files.

or maybe we should do that only for zsets (have that check depend on the deep mode), and in all other types do it unconditionally?
see #8453 for the bug that could have caused these.

oranagra · 2021-08-01T15:46:53Z

@yossigo i think i'd like to propose the following:

on any datatype other than zset, we will fail rdb parsing and reject the restore command if the key is an empty one (regardless of the sanitization setting).
i suppose that this won't backfire, since we aren't aware of any bug that could have created such keys.
on zsets, we silently delete them, without failing rdb parsing or restore command. and add a comment that we do that because of RM_ZsetRem: Delete key if empty #8453.
we avoid backporting this change to older version (only add it to 7.0), or if we backport it, we need to silently delete all types, not just zsets

sundb · 2021-08-02T02:42:43Z

@sundb i looked at the recent failures in the daily CI, i see they're all about several lists commands.
some can't handle an empty list, and that generates an assertion:

popGenericCommand (t_list.c:466 'value != NULL' is not true)

The `quicklist ziplist tail followed by extra data which start with 0xff` test 
appears with this assertion.

restore key 0 "\x0E\xC0\x2B\x15\x00\x00\x00\x0A\x00\x00\x00\x01\x00\x00\xE0\x62\x58\xEA\xDF\x22\x00\x00\x00\xFF\x09\x00\xDF\x35\xD2\x67\xDC\x0E\x89\xAB" replace
lpop key

and others reach ziplist sanitation errors:

__ziplistDelete (ziplist.c:876 'zipEntrySafe(zl, zlbytes, p, &tail, 1)' is not true)

__ziplistInsert (ziplist.c:683 'zipEntrySafe(zl, zlbytes, p, &e, 0)' is not true)

The `empty quicklist` test appears with this assertion.

config set sanitize-dump-payload yes
debug set-skip-checksum-validation 1
restore key 0 "\x0E\x01\x11\x11\x00\x00\x00\x0A\x00\x00\x00\x01\x00\x00\xF6\xFF\xB0\x6C\x9C\xFF\x09\x00\x9C\x37\x47\x49\x4D\xDE\x94\xF5" replace
RPUSH key 1
RPOP key
RPOP key

the reason the test suite failed is because when sanitize-dump-payload is yes it doesn't expect any crashes.
so maybe an easy solution for our problems is for rdb.c to just validate that lists are not empty when sanitize-dump-payload is enabled?
i.e. doing so would mean that this flag is not just about doing an O(n) complexity check (aka deep), but also about the guarantees of a successful RESTORE command, vs compatibility with old rdb files.

or maybe we should do that only for zsets (have that check depend on the deep mode), and in all other types do it unconditionally?
see #8453 for the bug that could have caused these.

Yes, all the errors are due to list, and have been fixed in this pr, and I have 
run it for another 12,000 minutes without any new errors.

yossigo · 2021-08-02T10:19:44Z

@oranagra Can you think of a reason not to apply it to all types (and maybe keep a counter of such keys and report it in the log so it will not go unnoticed)?

I think it's a bit cleaner to make this more generic and not a workaround for a specific known bug (in the future we could learn of similar bugs affecting other types).

oranagra · 2021-08-02T10:25:19Z

@yossigo ok, so you say that we should assume we may have had similar issues like the zset one on other types.
and you support the notion of silently ignoring these keys in rdb parsing rather than failing it.
but what about RESTORE command? it would be odd to reply with OK and have the key missing, and it could possibly cause issues for the fuzz tester too.
we can decide that on RESTORE we reject them with an error, and for rdb loading we delete them silently, but then if someone has an import tool that reads rdb files and generates RESTORE commands from them, his tool will get an error and possibly abort the operation.

…etZiplistValidateIntegrity

yossigo · 2021-08-05T09:25:38Z

@oranagra Per our discussion, I think that last note makes sense.

oranagra

@sundb looks good with one minor suggestion.
i know we already have a test to cover that for RESTORE, it would be nice to add one to cover RDB file loading.

oranagra · 2021-08-05T11:11:18Z

src/rdb.c

+            /* Don't fail when empty key is encountered, we will
+             * silently discard it and continue loading. */
+            if (error == RDB_LOAD_ERR_EMPTY_KEY)
+                continue;


when we continue here, we're skipping this block:

/* Reset the state that is key-specified and is populated by * opcodes before the key, so that we start from scratch again. */ expiretime = -1; lfu_freq = -1; lru_idle = -1;

maybe a cleaner idea is to add a first if to the if-else chain below, so that all the other ifs will not be taken. then similarly to an already expired key, we continue with the normal flow of this loop.

also, let's mention #8453 in the comment to justify why we do that.

Yes, I am building this rdb file.

src/rdb.c

Co-authored-by: Oran Agra <[email protected]>

yossigo

LGTM, one real comment (about the logging) and a minor style nit.

src/rdb.c

yossigo · 2021-08-05T15:20:29Z

src/rdb.c

+             * in an RDB file, instead we will silently discard it and
+             * continue loading. */
+            if (error == RDB_LOAD_ERR_EMPTY_KEY) {
+                serverLog(LL_WARNING, "rdbLoadObject failed, detect empty key: %s", key);


I think it's better to set a counter and maybe report the first occurrence and later the number of total such keys. A log message per key can be devastating for the log file...

It bothered me too, but I didn't want to lower the log level.
Yossi's idea seems best

Thanks @oranagra, I'm later.

Recently we found two issues in the fuzzer tester: #9302 #9285 After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them. Here's a list of the fixes - Prevent an overflow when allocating a dict hashtable - Prevent OOM when attempting to allocate a huge string - Prevent a few invalid accesses in listpack - Improve sanitization of listpack first entry - Validate integrity of stream consumer groups PEL - Validate integrity of stream listpack entry IDs - Validate ziplist tail followed by extra data which start with 0xff Co-authored-by: sundb <[email protected]>

1) Fix CR 2) In resolving conflict, I reverted all corrupt-test tests, as some were fixed in redis#9302, and Re-add the new issue caused this pr. 3) ziplistValidateIntegrity fails to validate an empty ziplsit because hash ziplist->listpack conversion is always deep santization, this should be fixed in redis#9297 but was missed.

…d restore command (#9349) This commit mainly fixes empty keys due to RDB loading and restore command, which was omitted in #9297. 1) When loading quicklsit, if all the ziplists in the quicklist are empty, NULL will be returned. If only some of the ziplists are empty, then we will skip the empty ziplists silently. 2) When loading hash zipmap, if zipmap is empty, sanitization check will fail. 3) When loading hash ziplist, if ziplist is empty, NULL will be returned. 4) Add RDB loading test with sanitize.

…mmand (redis#9297) When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key. This could either be a corrupt payload, or a result of a bug (see redis#8453 ) This PR mainly fixes the following: 1) When restore command will return `Bad data format` error. 2) When loading RDB, we will silently discard the key. Co-authored-by: Oran Agra <[email protected]>

Recently we found two issues in the fuzzer tester: redis#9302 redis#9285 After fixing them, more problems surfaced and this PR (as well as redis#9297) aims to fix them. Here's a list of the fixes - Prevent an overflow when allocating a dict hashtable - Prevent OOM when attempting to allocate a huge string - Prevent a few invalid accesses in listpack - Improve sanitization of listpack first entry - Validate integrity of stream consumer groups PEL - Validate integrity of stream listpack entry IDs - Validate ziplist tail followed by extra data which start with 0xff Co-authored-by: sundb <[email protected]>

…d restore command (redis#9349) This commit mainly fixes empty keys due to RDB loading and restore command, which was omitted in redis#9297. 1) When loading quicklsit, if all the ziplists in the quicklist are empty, NULL will be returned. If only some of the ziplists are empty, then we will skip the empty ziplists silently. 2) When loading hash zipmap, if zipmap is empty, sanitization check will fail. 3) When loading hash ziplist, if ziplist is empty, NULL will be returned. 4) Add RDB loading test with sanitize.

…mmand (#9297) When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key. This could either be a corrupt payload, or a result of a bug (see #8453 ) This PR mainly fixes the following: 1) When restore command will return `Bad data format` error. 2) When loading RDB, we will silently discard the key. Co-authored-by: Oran Agra <[email protected]> (cherry picked from commit 8ea777a)

Recently we found two issues in the fuzzer tester: #9302 #9285 After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them. Here's a list of the fixes - Prevent an overflow when allocating a dict hashtable - Prevent OOM when attempting to allocate a huge string - Prevent a few invalid accesses in listpack - Improve sanitization of listpack first entry - Validate integrity of stream consumer groups PEL - Validate integrity of stream listpack entry IDs - Validate ziplist tail followed by extra data which start with 0xff Co-authored-by: sundb <[email protected]> (cherry picked from commit 0c90370)

…d restore command (#9349) This commit mainly fixes empty keys due to RDB loading and restore command, which was omitted in #9297. 1) When loading quicklsit, if all the ziplists in the quicklist are empty, NULL will be returned. If only some of the ziplists are empty, then we will skip the empty ziplists silently. 2) When loading hash zipmap, if zipmap is empty, sanitization check will fail. 3) When loading hash ziplist, if ziplist is empty, NULL will be returned. 4) Add RDB loading test with sanitize. (cherry picked from commit cbda492)

#11519) The following example will create an empty set (listpack encoding): ``` > RESTORE key 0 "\x14\x25\x25\x00\x00\x00\x00\x00\x02\x01\x82\x5F\x37\x03\x06\x01\x82\x5F\x35\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x31\x03\x82\x5F\x39\x03\x04\xA9\x08\x01\xFF\x0B\x00\xA3\x26\x49\xB4\x86\xB0\x0F\x41" OK > SCARD key (integer) 0 > SRANDMEMBER key Error: Server closed the connection ``` In the spirit of #9297, skip empty set when loading RDB_TYPE_SET_LISTPACK. Introduced in #11290

redis#11519) The following example will create an empty set (listpack encoding): ``` > RESTORE key 0 "\x14\x25\x25\x00\x00\x00\x00\x00\x02\x01\x82\x5F\x37\x03\x06\x01\x82\x5F\x35\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x31\x03\x82\x5F\x39\x03\x04\xA9\x08\x01\xFF\x0B\x00\xA3\x26\x49\xB4\x86\xB0\x0F\x41" OK > SCARD key (integer) 0 > SRANDMEMBER key Error: Server closed the connection ``` In the spirit of redis#9297, skip empty set when loading RDB_TYPE_SET_LISTPACK. Introduced in redis#11290

sundb added 4 commits July 30, 2021 00:13

Sanitize dump payload: quicklist ziplist tail followed by extra data …

77f9fa2

…which start with 0xff

Sanitize dump payload: quicklist len is 0

6a065b0

Avoid create empty key due to corrupt payload

a96f478

Add r ping to corrupt-dump test

29f514b

oranagra added the state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten label Jul 30, 2021

Add empty ziplist check in rdb.c, hashZiplistValidateIntegrity and zs…

12b8fe0

…etZiplistValidateIntegrity

Silently discard empty keys when RDB loading

727d182

oranagra reviewed Aug 5, 2021

View reviewed changes

oranagra mentioned this pull request Aug 5, 2021

Improvements to corrupt payload sanitization #9321

Merged

Fix CR and remove santization for ziplist

68f5217

sundb requested a review from oranagra August 5, 2021 13:03

sundb changed the title ~~Sanitize dump payload: validate the end of ziplist and empty key~~ Sanitize dump payload: validate empty keys Aug 5, 2021

sundb changed the title ~~Sanitize dump payload: validate empty keys~~ Sanitize dump payload: fix empty keys when RDB loading and restore command Aug 5, 2021

oranagra previously approved these changes Aug 5, 2021

View reviewed changes

src/rdb.c Outdated Show resolved Hide resolved

Update src/rdb.c

744695b

Co-authored-by: Oran Agra <[email protected]>

sundb dismissed oranagra’s stale review via 744695b August 5, 2021 13:55

oranagra previously approved these changes Aug 5, 2021

View reviewed changes

yossigo reviewed Aug 5, 2021

View reviewed changes

make sure not to print too many log lines on skipped keys

b8e0c10

oranagra dismissed their stale review via b8e0c10 August 5, 2021 19:22

oranagra approved these changes Aug 5, 2021

View reviewed changes

oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Aug 5, 2021

oranagra merged commit 8ea777a into redis:unstable Aug 5, 2021

sundb deleted the ziplist-corrupt-dump branch August 6, 2021 01:32

oranagra mentioned this pull request Aug 8, 2021

Replace all usage of ziplist with listpack for t_hash #8887

Merged

sundb mentioned this pull request Aug 9, 2021

Sanitize dump payload: handle remaining empty key when RDB loading and restore command #9349

Merged

oranagra mentioned this pull request Aug 16, 2021

Add LMPOP/BLMPOP commands. #9373

Merged

oranagra mentioned this pull request Sep 20, 2021

[CRASH] Redis server 5.0.5 crashed while memory was hovering around maxmemory limit #9518

Open

oranagra mentioned this pull request Oct 4, 2021

Release 6.2.6 #9583

Merged

oranagra mentioned this pull request Nov 20, 2022

sanitize dump payload: fix crash with empty set with listpack encoding #11519

Merged

Conversation

sundb commented Jul 30, 2021 • edited by oranagra Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

oranagra commented Jul 30, 2021

Uh oh!

sundb commented Jul 30, 2021

Uh oh!

oranagra commented Aug 1, 2021

Uh oh!

yossigo commented Aug 1, 2021

Uh oh!

oranagra commented Aug 1, 2021

Uh oh!

oranagra commented Aug 1, 2021

Uh oh!

oranagra commented Aug 1, 2021

Uh oh!

sundb commented Aug 2, 2021

Uh oh!

yossigo commented Aug 2, 2021

Uh oh!

oranagra commented Aug 2, 2021

Uh oh!

yossigo commented Aug 5, 2021

Uh oh!

oranagra left a comment

Choose a reason for hiding this comment

Uh oh!

oranagra Aug 5, 2021

Choose a reason for hiding this comment

Uh oh!

sundb Aug 5, 2021

Choose a reason for hiding this comment

Uh oh!

Uh oh!

yossigo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

yossigo Aug 5, 2021

Choose a reason for hiding this comment

Uh oh!

oranagra Aug 5, 2021

Choose a reason for hiding this comment

Uh oh!

sundb Aug 6, 2021

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

sundb commented Jul 30, 2021 •

edited by oranagra

Loading