Skip to content

Sanitize dump payload: fix empty keys when RDB loading and restore command#9297

Merged
oranagra merged 9 commits intoredis:unstablefrom
sundb:ziplist-corrupt-dump
Aug 5, 2021
Merged

Sanitize dump payload: fix empty keys when RDB loading and restore command#9297
oranagra merged 9 commits intoredis:unstablefrom
sundb:ziplist-corrupt-dump

Conversation

@sundb
Copy link
Collaborator

@sundb sundb commented Jul 30, 2021

When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key.
This could either be a corrupt payload, or a result of a bug (see #8453 )

This pr mainly fixes the following:

  1. When restore command will return Bad data format error.
  2. When loading RDB, we will silently discard the key.

@oranagra oranagra added the state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten label Jul 30, 2021
@oranagra
Copy link
Member

@sundb were these problems detected on unstable?
Started after the change in #9285?

@yossigo do you think failing loading of empty lists / zsets could be a problem? IIRC there was some bug in the past that created these, and this could cause upgrade failures.

@sundb
Copy link
Collaborator Author

sundb commented Jul 30, 2021

@oranagra They are detected on unstable after #9285.

@oranagra
Copy link
Member

oranagra commented Aug 1, 2021

@sundb i'm thinking that maybe we'll have to revert some of these checks in rdb.c.
as you can see in the corrupt payload: fuzzer findings - empty intset div by zero which you modified, i've added an assertion the ziplist random functions (just to avoid a signal that confuses these tests), and kept the test that executes SRANDMEMBER.

I suppose this batch of changes in rdb.c came up after you encountered an issue in the fuzzer that lead to adding the new empty quicklist test.
looking at your corrupt payload: fuzzer findings - empty quicklist test, i only see the restore but not any command that exposed an issue (which would be unreachable with your fix). do you remember what was the problem?

@yossigo
Copy link
Collaborator

yossigo commented Aug 1, 2021

@oranagra I remember there were Module API bugs that could result with that, and there were reports about that happening in the wild. Do you see harm in simply skipping those keys?

@oranagra
Copy link
Member

oranagra commented Aug 1, 2021

@yossigo i guess not... if we have to choose between failing the loading, storing an empty key in the db, or skipping it, i suppose skipping it is the better alternative. i.e. specifically because we had that bug.
should we do that only for zsets (IIRC that's where the bug was; need to confirm)

@oranagra
Copy link
Member

oranagra commented Aug 1, 2021

@sundb i looked at the recent failures in the daily CI, i see they're all about several lists commands.
some can't handle an empty list, and that generates an assertion:

  • popGenericCommand (t_list.c:466 'value != NULL' is not true)

and others reach ziplist sanitation errors:

  • __ziplistDelete (ziplist.c:876 'zipEntrySafe(zl, zlbytes, p, &tail, 1)' is not true)
  • __ziplistInsert (ziplist.c:683 'zipEntrySafe(zl, zlbytes, p, &e, 0)' is not true)

the reason the test suite failed is because when sanitize-dump-payload is yes it doesn't expect any crashes.
so maybe an easy solution for our problems is for rdb.c to just validate that lists are not empty when sanitize-dump-payload is enabled?
i.e. doing so would mean that this flag is not just about doing an O(n) complexity check (aka deep), but also about the guarantees of a successful RESTORE command, vs compatibility with old rdb files.

or maybe we should do that only for zsets (have that check depend on the deep mode), and in all other types do it unconditionally?
see #8453 for the bug that could have caused these.

@oranagra
Copy link
Member

oranagra commented Aug 1, 2021

@yossigo i think i'd like to propose the following:

  • on any datatype other than zset, we will fail rdb parsing and reject the restore command if the key is an empty one (regardless of the sanitization setting).
    i suppose that this won't backfire, since we aren't aware of any bug that could have created such keys.
  • on zsets, we silently delete them, without failing rdb parsing or restore command. and add a comment that we do that because of RM_ZsetRem: Delete key if empty #8453.
  • we avoid backporting this change to older version (only add it to 7.0), or if we backport it, we need to silently delete all types, not just zsets

@sundb
Copy link
Collaborator Author

sundb commented Aug 2, 2021

@sundb i looked at the recent failures in the daily CI, i see they're all about several lists commands.
some can't handle an empty list, and that generates an assertion:

  • popGenericCommand (t_list.c:466 'value != NULL' is not true)
The `quicklist ziplist tail followed by extra data which start with 0xff` test 
appears with this assertion.
restore key 0 "\x0E\xC0\x2B\x15\x00\x00\x00\x0A\x00\x00\x00\x01\x00\x00\xE0\x62\x58\xEA\xDF\x22\x00\x00\x00\xFF\x09\x00\xDF\x35\xD2\x67\xDC\x0E\x89\xAB" replace
lpop key

and others reach ziplist sanitation errors:

  • __ziplistDelete (ziplist.c:876 'zipEntrySafe(zl, zlbytes, p, &tail, 1)' is not true)
  • __ziplistInsert (ziplist.c:683 'zipEntrySafe(zl, zlbytes, p, &e, 0)' is not true)
The `empty quicklist` test appears with this assertion.
config set sanitize-dump-payload yes
debug set-skip-checksum-validation 1
restore key 0 "\x0E\x01\x11\x11\x00\x00\x00\x0A\x00\x00\x00\x01\x00\x00\xF6\xFF\xB0\x6C\x9C\xFF\x09\x00\x9C\x37\x47\x49\x4D\xDE\x94\xF5" replace
RPUSH key 1
RPOP key
RPOP key

the reason the test suite failed is because when sanitize-dump-payload is yes it doesn't expect any crashes.
so maybe an easy solution for our problems is for rdb.c to just validate that lists are not empty when sanitize-dump-payload is enabled?
i.e. doing so would mean that this flag is not just about doing an O(n) complexity check (aka deep), but also about the guarantees of a successful RESTORE command, vs compatibility with old rdb files.

or maybe we should do that only for zsets (have that check depend on the deep mode), and in all other types do it unconditionally?
see #8453 for the bug that could have caused these.

Yes, all the errors are due to list, and have been fixed in this pr, and I have 
run it for another 12,000 minutes without any new errors.

@yossigo
Copy link
Collaborator

yossigo commented Aug 2, 2021

@oranagra Can you think of a reason not to apply it to all types (and maybe keep a counter of such keys and report it in the log so it will not go unnoticed)?

I think it's a bit cleaner to make this more generic and not a workaround for a specific known bug (in the future we could learn of similar bugs affecting other types).

@oranagra
Copy link
Member

oranagra commented Aug 2, 2021

@yossigo ok, so you say that we should assume we may have had similar issues like the zset one on other types.
and you support the notion of silently ignoring these keys in rdb parsing rather than failing it.
but what about RESTORE command? it would be odd to reply with OK and have the key missing, and it could possibly cause issues for the fuzz tester too.
we can decide that on RESTORE we reject them with an error, and for rdb loading we delete them silently, but then if someone has an import tool that reads rdb files and generates RESTORE commands from them, his tool will get an error and possibly abort the operation.

@yossigo
Copy link
Collaborator

yossigo commented Aug 5, 2021

@oranagra Per our discussion, I think that last note makes sense.

Copy link
Member

@oranagra oranagra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sundb looks good with one minor suggestion.
i know we already have a test to cover that for RESTORE, it would be nice to add one to cover RDB file loading.

src/rdb.c Outdated
/* Don't fail when empty key is encountered, we will
* silently discard it and continue loading. */
if (error == RDB_LOAD_ERR_EMPTY_KEY)
continue;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when we continue here, we're skipping this block:

        /* Reset the state that is key-specified and is populated by
         * opcodes before the key, so that we start from scratch again. */
        expiretime = -1;
        lfu_freq = -1;
        lru_idle = -1;

maybe a cleaner idea is to add a first if to the if-else chain below, so that all the other ifs will not be taken. then similarly to an already expired key, we continue with the normal flow of this loop.

also, let's mention #8453 in the comment to justify why we do that.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am building this rdb file.

@sundb sundb requested a review from oranagra August 5, 2021 13:03
@sundb sundb changed the title Sanitize dump payload: validate the end of ziplist and empty key Sanitize dump payload: validate empty keys Aug 5, 2021
@sundb sundb changed the title Sanitize dump payload: validate empty keys Sanitize dump payload: fix empty keys when RDB loading and restore command Aug 5, 2021
oranagra
oranagra previously approved these changes Aug 5, 2021
Co-authored-by: Oran Agra <[email protected]>
oranagra
oranagra previously approved these changes Aug 5, 2021
Copy link
Collaborator

@yossigo yossigo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one real comment (about the logging) and a minor style nit.

src/rdb.c Outdated
* in an RDB file, instead we will silently discard it and
* continue loading. */
if (error == RDB_LOAD_ERR_EMPTY_KEY) {
serverLog(LL_WARNING, "rdbLoadObject failed, detect empty key: %s", key);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's better to set a counter and maybe report the first occurrence and later the number of total such keys. A log message per key can be devastating for the log file...

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It bothered me too, but I didn't want to lower the log level.
Yossi's idea seems best

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @oranagra, I'm later.

@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Aug 5, 2021
@oranagra oranagra merged commit 8ea777a into redis:unstable Aug 5, 2021
oranagra added a commit that referenced this pull request Aug 5, 2021
Recently we found two issues in the fuzzer tester: #9302 #9285
After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them.

Here's a list of the fixes
- Prevent an overflow when allocating a dict hashtable
- Prevent OOM when attempting to allocate a huge string
- Prevent a few invalid accesses in listpack
- Improve sanitization of listpack first entry
- Validate integrity of stream consumer groups PEL
- Validate integrity of stream listpack entry IDs
- Validate ziplist tail followed by extra data which start with 0xff

Co-authored-by: sundb <[email protected]>
@sundb sundb deleted the ziplist-corrupt-dump branch August 6, 2021 01:32
sundb added a commit to sundb/redis that referenced this pull request Aug 6, 2021
1) Fix CR

2) In resolving conflict, I reverted all corrupt-test tests, as some
were fixed in redis#9302, and Re-add the new issue caused this pr.

3) ziplistValidateIntegrity fails to validate an empty ziplsit because
hash ziplist->listpack conversion is always deep santization, this
should be fixed in redis#9297 but was missed.
oranagra pushed a commit that referenced this pull request Aug 9, 2021
…d restore command (#9349)

This commit mainly fixes empty keys due to RDB loading and restore command,
which was omitted in #9297.

1) When loading quicklsit, if all the ziplists in the quicklist are empty, NULL will be returned.
    If only some of the ziplists are empty, then we will skip the empty ziplists silently.
2) When loading hash zipmap, if zipmap is empty, sanitization check will fail.
3) When loading hash ziplist, if ziplist is empty, NULL will be returned.
4) Add RDB loading test with sanitize.
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
…mmand (redis#9297)

When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key.
This could either be a corrupt payload, or a result of a bug (see redis#8453 )

This PR mainly fixes the following:
1) When restore command will return `Bad data format` error.
2) When loading RDB, we will silently discard the key.

Co-authored-by: Oran Agra <[email protected]>
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
Recently we found two issues in the fuzzer tester: redis#9302 redis#9285
After fixing them, more problems surfaced and this PR (as well as redis#9297) aims to fix them.

Here's a list of the fixes
- Prevent an overflow when allocating a dict hashtable
- Prevent OOM when attempting to allocate a huge string
- Prevent a few invalid accesses in listpack
- Improve sanitization of listpack first entry
- Validate integrity of stream consumer groups PEL
- Validate integrity of stream listpack entry IDs
- Validate ziplist tail followed by extra data which start with 0xff

Co-authored-by: sundb <[email protected]>
JackieXie168 pushed a commit to JackieXie168/redis that referenced this pull request Sep 8, 2021
…d restore command (redis#9349)

This commit mainly fixes empty keys due to RDB loading and restore command,
which was omitted in redis#9297.

1) When loading quicklsit, if all the ziplists in the quicklist are empty, NULL will be returned.
    If only some of the ziplists are empty, then we will skip the empty ziplists silently.
2) When loading hash zipmap, if zipmap is empty, sanitization check will fail.
3) When loading hash ziplist, if ziplist is empty, NULL will be returned.
4) Add RDB loading test with sanitize.
@oranagra oranagra mentioned this pull request Oct 4, 2021
oranagra added a commit that referenced this pull request Oct 4, 2021
…mmand (#9297)

When we load rdb or restore command, if we encounter a length of 0, it will result in the creation of an empty key.
This could either be a corrupt payload, or a result of a bug (see #8453 )

This PR mainly fixes the following:
1) When restore command will return `Bad data format` error.
2) When loading RDB, we will silently discard the key.

Co-authored-by: Oran Agra <[email protected]>
(cherry picked from commit 8ea777a)
oranagra added a commit that referenced this pull request Oct 4, 2021
Recently we found two issues in the fuzzer tester: #9302 #9285
After fixing them, more problems surfaced and this PR (as well as #9297) aims to fix them.

Here's a list of the fixes
- Prevent an overflow when allocating a dict hashtable
- Prevent OOM when attempting to allocate a huge string
- Prevent a few invalid accesses in listpack
- Improve sanitization of listpack first entry
- Validate integrity of stream consumer groups PEL
- Validate integrity of stream listpack entry IDs
- Validate ziplist tail followed by extra data which start with 0xff

Co-authored-by: sundb <[email protected]>
(cherry picked from commit 0c90370)
oranagra pushed a commit that referenced this pull request Oct 4, 2021
…d restore command (#9349)

This commit mainly fixes empty keys due to RDB loading and restore command,
which was omitted in #9297.

1) When loading quicklsit, if all the ziplists in the quicklist are empty, NULL will be returned.
    If only some of the ziplists are empty, then we will skip the empty ziplists silently.
2) When loading hash zipmap, if zipmap is empty, sanitization check will fail.
3) When loading hash ziplist, if ziplist is empty, NULL will be returned.
4) Add RDB loading test with sanitize.

(cherry picked from commit cbda492)
oranagra pushed a commit that referenced this pull request Nov 20, 2022
#11519)

The following example will create an empty set (listpack encoding):
```
> RESTORE key 0
"\x14\x25\x25\x00\x00\x00\x00\x00\x02\x01\x82\x5F\x37\x03\x06\x01\x82\x5F\x35\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x31\x03\x82\x5F\x39\x03\x04\xA9\x08\x01\xFF\x0B\x00\xA3\x26\x49\xB4\x86\xB0\x0F\x41"
OK
> SCARD key
(integer) 0
> SRANDMEMBER key
Error: Server closed the connection
```

In the spirit of #9297, skip empty set when loading RDB_TYPE_SET_LISTPACK.
Introduced in #11290
madolson pushed a commit to madolson/redis that referenced this pull request Apr 19, 2023
redis#11519)

The following example will create an empty set (listpack encoding):
```
> RESTORE key 0
"\x14\x25\x25\x00\x00\x00\x00\x00\x02\x01\x82\x5F\x37\x03\x06\x01\x82\x5F\x35\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x31\x03\x82\x5F\x39\x03\x04\xA9\x08\x01\xFF\x0B\x00\xA3\x26\x49\xB4\x86\xB0\x0F\x41"
OK
> SCARD key
(integer) 0
> SRANDMEMBER key
Error: Server closed the connection
```

In the spirit of redis#9297, skip empty set when loading RDB_TYPE_SET_LISTPACK.
Introduced in redis#11290
enjoy-binbin added a commit to enjoy-binbin/redis that referenced this pull request Jul 31, 2023
redis#11519)

The following example will create an empty set (listpack encoding):
```
> RESTORE key 0
"\x14\x25\x25\x00\x00\x00\x00\x00\x02\x01\x82\x5F\x37\x03\x06\x01\x82\x5F\x35\x03\x82\x5F\x33\x03\x00\x01\x82\x5F\x31\x03\x82\x5F\x39\x03\x04\xA9\x08\x01\xFF\x0B\x00\xA3\x26\x49\xB4\x86\xB0\x0F\x41"
OK
> SCARD key
(integer) 0
> SRANDMEMBER key
Error: Server closed the connection
```

In the spirit of redis#9297, skip empty set when loading RDB_TYPE_SET_LISTPACK.
Introduced in redis#11290
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes state:to-be-merged The PR should be merged soon, even if not yet ready, this is used so that it won't be forgotten

Projects

Development

Successfully merging this pull request may close these issues.

3 participants