Skip to content

Fix Eval scripts defrag (broken 7.0 in RC1)#10271

Merged
oranagra merged 7 commits intoredis:unstablefrom
yoav-steinberg:fix_script_defrag
Feb 11, 2022
Merged

Fix Eval scripts defrag (broken 7.0 in RC1)#10271
oranagra merged 7 commits intoredis:unstablefrom
yoav-steinberg:fix_script_defrag

Conversation

@yoav-steinberg
Copy link
Contributor

@yoav-steinberg yoav-steinberg commented Feb 9, 2022

Fix scripts defragger since it was broken since #10126 (released in 7.0 RC1).
would crash the server if defragger starts in a server that contains eval scripts.

In #10126 the global lua_script dict became a dict to a custom luaScript struct with an internal robj in it instead of a generic sds -> robj dict.

In redis#10126 the global `lua_script` dict became a dict to a custom `luaScript` struct with an internal `robj` in it instead of a generic `sds` -> `robj` dict. This means we need custom code to defrag it and since scripts should never really cause much fragmentation it makes more sense to simply remove the defrag code for scripts.
@yoav-steinberg
Copy link
Contributor Author

My general comments about the active defrag in relation to this PR:

The basic idea behind defrag is that the allocations for the dataset might become very fragmented over time and at the same time the dataset is organized through a single dictionary. So it's easy to defrag it.
In my opinion not all dictionaries need to be defraged, do we defrag the Users Rax?

If we decide to defrag everything then at some point we'll probably run into the following:

  1. The code will become spaghetti because defrag.c will need to know the internals of everything.
  2. We'll have bugs because someone eventually will reference a pointer in some other place not thinking about defrag and then the pointer will get defragged and we'll access deallocated pointers.

We should try to keep defrag to dataset because it's by definition accessed only from a single location and tends to have more "public" interfaces for its data structures.

Alternatively we can simply fix the existing code in the following way:
make defrag.c aware of luaScript struct and then add another DEFRAG_SDS_DICT_VAL_LUA_SCRIPT and handle it in activeDefragSdsDict().

@oranagra
Copy link
Member

oranagra commented Feb 9, 2022

i'm not sure defrag is only about data set. it's about anything that can grow large and remain for long!

for instance, there's no point in defraging users, since they're not expected to grow much, and are also kinda static, and there's no point in defragging output buffers, since they're temporary.

However, Eval script remain for long, and in some cases can be unintentionally bloated.
I've seen cases with many thousands of scripts.
note that it's enough for one small allocation to hold a whole run hostage and prevent defragging.
i.e. maybe the dictEntry for that dict will do more damage than the script code itself.

Functions may be less of a problem, since i think they'll tend to be more static, and won't be unintentionally bloated.
however, by your definition, they ARE part of the data.

@oranagra
Copy link
Member

@oranagra oranagra changed the title Fix script defrag Fix Eval scripts defrag (broken 7.0 in RC1) Feb 11, 2022
@oranagra oranagra merged commit 2eb9b19 into redis:unstable Feb 11, 2022
@oranagra oranagra added the release-notes indication that this issue needs to be mentioned in the release notes label Feb 11, 2022
@oranagra oranagra mentioned this pull request Feb 28, 2022
arkamar added a commit to arkamar/gentoo that referenced this pull request Jul 12, 2022
This test was introduced in upstream PR [1], but it fails in some
situations, like when the src_test phase is run in a container. At least
this is where I was able to reproduce it.

[1] redis/redis#10271

Closes: https://bugs.gentoo.org/851654
Signed-off-by: Petr Vaněk <[email protected]>
gentoo-bot pushed a commit to gentoo/gentoo that referenced this pull request Jul 12, 2022
This test was introduced in upstream PR [1], but it fails in some
situations, like when the src_test phase is run in a container. At least
this is where I was able to reproduce it.

[1] redis/redis#10271

Closes: https://bugs.gentoo.org/851654
Signed-off-by: Petr Vaněk <[email protected]>
Signed-off-by: Sam James <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

release-notes indication that this issue needs to be mentioned in the release notes

Projects

Archived in project

Development

Successfully merging this pull request may close these issues.

2 participants