Modules: Add remaining list API functions#8439
Conversation
|
As listed on #8157. |
1a4d3a5 to
56d7a97
Compare
oranagra
left a comment
There was a problem hiding this comment.
This is a preliminary review.
I'm not sure i like the iterator API, something feels odd and i'm not sure i can put my finger on it.
I suppose that the fact that the iterator is actually embedded in the RedisModuleKey (rather than returning some opaque object) is ok, since in theory the user can open the same key twice. and if he modifies it during iteration that's wrong in any case that has two iterators open at the same time.
i see that zset API doesn't prevent the module from doing edits during the iterations, and also don't offer any iterator specific API for modifications (it just leaves this use case unsolved).
here, if we want to solve it, maybe we can go the extra mile and just make the normal modification APIs implicitly work during an iteration rather than provide a specific APIs for that?
i'm not sure what's the right thing, maybe i need to "sleep on it".
|
Thanks for the review.
👉 Functions are duplicated with iterator and non-iterator variants. Not so nice. Is that the main point?
Yes, this is modeled after zset and stream. Zset was first to set this standard. Good point though. Maybe it's worth documenting that you shouldn't open the key twice and do modifications on one while iterating on the other.
Good point. I can look into this (see what happens to the iterator if changes are made, maybe add a test case and think about improving the situation) later.
It can totally be done. Then we don't need the duplicated functions. We can add a special index value (a macro) for CURRENT, usable only while iterating.
Sleep well. Same here, after posting these comments. |
I've been thinking about this for a while and I think we need to find actual use cases to motivate what we should have. I have some ideas/problems:
Just an observation: Push/Pop is cheaper on the tail than on the head (also for RPUSH/RPOP vs LPUSH/LPOP) since ziplists are cheaper in the tail (no memmove). To avoid allocations, I've also been contemplating integer access to plain keys (RM_IntegerSet/Get e.g. for efficient implementation of things like INCRBY) and to ziplists (lists and hashes). I might draft it later. Avoiding allocations has great optimization potential in modules, or so I've heard. I'm sort of waiting for input about how to continue this PR. |
|
i don't have response for everything yet, but i wanna offer some perspective. I think the API needs to be easy to use, and flexible enough for that future changes in the internals of redis can still support it. |
Thanks. It's your decisions to make. Let me just give our perspective. Maybe we're not using modules as intended, but we've been using them to speed up things that could also be done in Lua. In one of our products, they hit 100% CPU load using Lua (while testing, not live). When they rewrote it to a module, it was much faster even using RM_Call(). We don't really know why. Maybe LuaJIT would make it faster. Moving a part of the application logic into the database itself is a great way of designing fast software. You save network traffic for starters. We've identified allocations (jemalloc's not-very-fast Given the already existing |
|
I think string DMA is a special case, letting modules store blobs of bytes, and also, that's an API which we'll probably always be able to support (no matter what changes we'll do internally in redis), and it is also quite easy to use for the user. i'm not sure the same can be said about ziplists and their integers. p.s. unsetting HAVE_MALLOC_SIZE, you also lose track of internal fragmentation, tracking its cost, or being able to take advantage of the spare memory in some places. |
|
StringDMA is actually an historic item in the API that dates back to the time module data types did not exist, it might even be a candidate for deprecation (if we decide to start doing that) IMHO. To work around costly short-lived allocations, perhaps we should consider using the stack more? I think this could be more feasible if RedisModuleString would not be bound only to robj+sds but could have a const reference to arbitrary buffers. |
|
that's an interesting idea, so like this could possibly still be an robj, and we have 60 bits for the length. |
|
int RM_ListIteratorNext(RedisModuleKey *key, long *index, RedisModuleString **elem);If RedisModuleString can be stack-allocated, that's awesome! I already drafted on the following non-allocating int RM_ListIteratorNextPtrOrLongLong(RedisModuleKey *key,
long *index,
const char **stringptr,
unsigned int *lenptr,
long long *intptr);
|
|
@oranagra Yes I guess it could be a new |
|
robj is 16 bytes (on 64bit systems), has encoding, refcount and lru which are completely useless for our cause. you may not like it, but here's my proposals:
i prefer option 2, considering RedisModuleString is an opaque type, the modules don't need normally need to care how me manage it. |
|
p.s. we can also let the module create such strings via some API, when it passes them to redis. e.g. when adding something from the module's own memory to the reply buffers, there's no reason for the module to copy it into a RedisModuleString just so that redis can copy it again into the reply buffers. one of these copies can be eliminated by letting the module create a static string. |
|
Let's move the static string discussion here to this issue: #8473. Feel free to change all the text. |
|
Regarding how to continue this API, here are some more concerns/observations/ideas:
If this design requires too much of you precious time, I will focus on something else for a while. |
List functions operating on elements by index: * RM_Get * RM_Set * RM_Insert * RM_Delete List iterator functions: * RM_ListIteratorStart * RM_ListIteratorStop * RM_ListIteratorNext * RM_ListIteratorInsert * RM_ListIteratorDelete
|
So, what we don't want is to have both iterator + index versions of insert/delete/set. I see two ways to continue this.
I think |
56d7a97 to
8f99ad5
Compare
The list iterator API functions are gone. Iterate using a simple for loop over indices. The functions use an internal iterator as an optimization.
8f99ad5 to
df4a1cf
Compare
|
PR is updated. See the top comment for the current API suggestion. |
bjosv
left a comment
There was a problem hiding this comment.
I think this look like a nice API, it got a lot simpler when hiding iterators.
oranagra
left a comment
There was a problem hiding this comment.
i like the API you proposed, it's a lot easier for the user to use.
on the other hand, it hides the (performance) implications from the user.
so a user that's doing a replace during an iteration, will have no idea that we invalidated the iterator and the next call is gonna be expensive. the user won't even notice that since the testing is likely to be on small lists.
the upside is that we can always later improve the implementation, and it won't affect the API or the modules that use it.
i'd like like to hear @guybe7 @MeirShpilraien @yossigo opinion about this (read this proposal)
|
@zuiderkwast nice PR! @oranagra i like the current API (hidden iterator) and i understand your concern. maybe we can document somewhere that modifying the list inside an iteration loop is possible but may cause a degradation in performance? |
itamarhaber
left a comment
There was a problem hiding this comment.
Looks good to me, especially the simple-to-use API.
madolson
left a comment
There was a problem hiding this comment.
(Just reviewed the APIs)
I like the APIs as well.
yossigo
left a comment
There was a problem hiding this comment.
I also like the API, LGTM (plus a typo and a doc suggestion).
Co-authored-by: Yossi Gottlieb <[email protected]>
|
@zuiderkwast thank you for pushing this through 🎉 |
List functions operating on elements by index:
List iterator functions: RM_ListIteratorStart, RM_ListIteratorStop, RM_ListIteratorNext, RM_ListIteratorInsert, RM_ListIteratorDeleteIteration is done using a simple for loop over indices. The index based functions use an internal iterator as an optimization. This is explained in the docs: