Modules: Add remaining list API functions by zuiderkwast · Pull Request #8439 · redis/redis

zuiderkwast · 2021-02-02T12:07:46Z

List functions operating on elements by index:

RM_ListGet
RM_ListSet
RM_ListInsert
RM_ListDelete

~~List iterator functions: RM_ListIteratorStart, RM_ListIteratorStop, RM_ListIteratorNext, RM_ListIteratorInsert, RM_ListIteratorDelete~~

Iteration is done using a simple for loop over indices. The index based functions use an internal iterator as an optimization. This is explained in the docs:

 * Many of the list functions access elements by index. Since a list is in
 * essence a doubly-linked list, accessing elements by index is generally an
 * O(N) operation. However, if elements are accessed sequentially or with
 * indices close together, the functions are optimized to seek the index from
 * the previous index, rather than seeking from the ends of the list.
 *
 * This enables iteration to be done efficiently using a simple for loop:
 *
 *     long n = RM_ValueLength(key);
 *     for (long i = 0; i < n; i++) {
 *         RedisModuleString *elem = RedisModule_ListGet(key, i);
 *         // Do stuff...
 *     }

zuiderkwast · 2021-02-02T12:10:05Z

As listed on #8157. RM_ListLen is not needed since RM_ValueLength gives the length of a list.

oranagra

This is a preliminary review.
I'm not sure i like the iterator API, something feels odd and i'm not sure i can put my finger on it.

I suppose that the fact that the iterator is actually embedded in the RedisModuleKey (rather than returning some opaque object) is ok, since in theory the user can open the same key twice. and if he modifies it during iteration that's wrong in any case that has two iterators open at the same time.

i see that zset API doesn't prevent the module from doing edits during the iterations, and also don't offer any iterator specific API for modifications (it just leaves this use case unsolved).

here, if we want to solve it, maybe we can go the extra mile and just make the normal modification APIs implicitly work during an iteration rather than provide a specific APIs for that?

i'm not sure what's the right thing, maybe i need to "sleep on it".

src/module.c

zuiderkwast · 2021-02-04T23:07:44Z

Thanks for the review.

I'm not sure i like the iterator API, something feels odd and i'm not sure i can put my finger on it.

👉 Functions are duplicated with iterator and non-iterator variants. Not so nice. Is that the main point?

I suppose that the fact that the iterator is actually embedded in the RedisModuleKey (rather than returning some opaque object) is ok, since in theory the user can open the same key twice. and if he modifies it during iteration that's wrong in any case that has two iterators open at the same time.

Yes, this is modeled after zset and stream. Zset was first to set this standard. Good point though. Maybe it's worth documenting that you shouldn't open the key twice and do modifications on one while iterating on the other.

i see that zset API doesn't prevent the module from doing edits during the iterations, and also don't offer any iterator specific API for modifications (it just leaves this use case unsolved).

Good point. I can look into this (see what happens to the iterator if changes are made, maybe add a test case and think about improving the situation) later.

here, if we want to solve it, maybe we can go the extra mile and just make the normal modification APIs implicitly work during an iteration rather than provide a specific APIs for that?

It can totally be done. Then we don't need the duplicated functions. We can add a special index value (a macro) for CURRENT, usable only while iterating.

i'm not sure what's the right thing, maybe i need to "sleep on it".

Sleep well. Same here, after posting these comments.

src/module.c

zuiderkwast · 2021-02-08T11:25:23Z

I'm not sure i like the iterator API, something feels odd and i'm not sure i can put my finger on it.

I've been thinking about this for a while and I think we need to find actual use cases to motivate what we should have. I have some ideas/problems:

Next() allocates a string. This makes typical use cases very expensive, such as searching for a particular element for returning or removing it. It's actually worse than using RM_Call() with LPOS and LREM. We could add a RM_ListFind() function, or better IMO is to have a function which exposes the pointer, length or interger value (if string pointer is NULL), just like a ziplist entry.
Removing and inserting may elements one-by-one is slightly expensive, since ziplists use memmove() on these operations. Perhaps we should work with ranges here, e.g. DeleteRange(key, start, count), or inserting multiple elements using a format string, accepting integers, C-strings and RedisModuleStrings as va_args?
Even if the main use case is stuff like queues (push and pop) like for doubly-linked lists, it can also be used for traversal and access in the middle since there is no other type for these. Seeking is a bit like a skiplist since it's really a linked list of ziplists.
We could make the iterator completely hidden, only providing access by index and automatically using an interator behind the scenes when users access index 0 (or -1), hoping they will access 1 (or -2) next. Maybe it gives the wrong illusions but the optimization could be documented in that case...

Just an observation: Push/Pop is cheaper on the tail than on the head (also for RPUSH/RPOP vs LPUSH/LPOP) since ziplists are cheaper in the tail (no memmove).

To avoid allocations, I've also been contemplating integer access to plain keys (RM_IntegerSet/Get e.g. for efficient implementation of things like INCRBY) and to ziplists (lists and hashes). I might draft it later. Avoiding allocations has great optimization potential in modules, or so I've heard.

I'm sort of waiting for input about how to continue this PR.

oranagra · 2021-02-08T11:40:37Z

i don't have response for everything yet, but i wanna offer some perspective.
i feel that we don't want to make the API too coupled with the internals, or too hard to use.
i.e. i think it would be a bad idea to expose any char* to internal memory, and integer values.

I think the API needs to be easy to use, and flexible enough for that future changes in the internals of redis can still support it.
if that's not efficient enough for some module, it can always implement it's own data type, rather than using redis's ones.

zuiderkwast · 2021-02-08T14:20:28Z

i don't have response for everything yet, but i wanna offer some perspective.
i feel that we don't want to make the API too coupled with the internals, or too hard to use.
i.e. i think it would be a bad idea to expose any char* to internal memory, and integer values.

I think the API needs to be easy to use, and flexible enough for that future changes in the internals of redis can still support it.
if that's not efficient enough for some module, it can always implement it's own data type, rather than using redis's ones.

Thanks. It's your decisions to make. Let me just give our perspective.

Maybe we're not using modules as intended, but we've been using them to speed up things that could also be done in Lua. In one of our products, they hit 100% CPU load using Lua (while testing, not live). When they rewrote it to a module, it was much faster even using RM_Call(). We don't really know why. Maybe LuaJIT would make it faster.

Moving a part of the application logic into the database itself is a great way of designing fast software. You save network traffic for starters.

We've identified allocations (jemalloc's not-very-fast malloc_usable_size(), which we have a patch for disabling and instead storing the usable size inside the allocation, something Redis already can do if HAVE_MALLOC_SIZE is undefined, trading one word of memory for CPU), RM_Call()'s allocations and the fact that you need to allocate strings for most things, as particularly expensive.

Given the already existing RedisModule_StringDMA which provides direct read/write access to internal memory, I thought returning a const char * or a long long wouldn't be that problematic, but it's easy for me to say---your're the maintainers who will suffer forever once things are merged. :)

oranagra · 2021-02-08T14:38:37Z

I think string DMA is a special case, letting modules store blobs of bytes, and also, that's an API which we'll probably always be able to support (no matter what changes we'll do internally in redis), and it is also quite easy to use for the user. i'm not sure the same can be said about ziplists and their integers.
maybe if you draft an API declaration, we'll be able to look at it and come to some new understandings.
or maybe we need to figure out some primitive access API that is able to abstract these, and still be useful and efficient.

p.s. unsetting HAVE_MALLOC_SIZE, you also lose track of internal fragmentation, tracking its cost, or being able to take advantage of the spare memory in some places.

yossigo · 2021-02-08T15:41:55Z

StringDMA is actually an historic item in the API that dates back to the time module data types did not exist, it might even be a candidate for deprecation (if we decide to start doing that) IMHO.

To work around costly short-lived allocations, perhaps we should consider using the stack more? I think this could be more feasible if RedisModuleString would not be bound only to robj+sds but could have a const reference to arbitrary buffers.

oranagra · 2021-02-08T15:56:07Z

that's an interesting idea, so like OBJ_STATIC_REFCOUNT, you won't be able to RM_RetainString these, but you'll be able to RM_StringPtrLen and RM_HoldString them (creating a new heap based string).
and they'll possibly vanish when you exit the scope or do some operation that destroys their underlying data.
is that what you mean?

this could possibly still be an robj, and we have 60 bits for the length.
or we can make a revolution and stop using robj.

zuiderkwast · 2021-02-08T15:59:25Z

RM_ListIteratorDelete() and RM_ListIteratorInsert() can be deleted. Their non-iterator versions can function even if there is an iterator. The iterator can be re-seeked behind the scenes on the next Next() when necessary. The non-iterator function need index, so Next() can return the index also:

int RM_ListIteratorNext(RedisModuleKey *key, long *index, RedisModuleString **elem);

If RedisModuleString can be stack-allocated, that's awesome!

I already drafted on the following non-allocating Next() (before the idea of stack strings):

int RM_ListIteratorNextPtrOrLongLong(RedisModuleKey *key,
                                     long *index,
                                     const char **stringptr,
                                     unsigned int *lenptr,
                                     long long *intptr);

Fetches the next element in a list, either as a char pointer or as an integer
value. If you prefer to fetch the element as a RedisModuleString*, use
RM_ListIteratorNext() instead.

key -- A key opened for reading.

index -- The 0-based index of the found element is set here. Use NULL
if you don't want to receive the index of the element.

stringptr -- A pointer to a string where the the result is returned when
the element cannot be represented as a long long. May be set
to NULL if you're only interested in integer elements.

lenptr -- The length of the string returned at *stringptr.

intptr -- if the element contains an integer value, *intptr is set to
the value and *stringptr is set to NULL.

On success, *index is set and REDISMODULE_OK is returned.

If the element can be represented as a long long, it is set at *intptr, and
*stringptr is set to NULL (if stringptr is provided).

If the element cannot be represented as a long long, *stringptr is set to
the characters of the element and *stringlen is set to its length. Note that
this string is not nul-terminated and must not be modified.

If stringptr is not provided (i.e. it is NULL) and the element cannot be
represented as a long long, REDISMODULE_ERR is returned and errno is set to
ERANGE.

yossigo · 2021-02-08T16:31:51Z

@oranagra Yes I guess it could be a new robj type, the real culprit here is sds. I think that would be the best of all worlds - no copies, no breaking API changes, and no exposing of internals.

oranagra · 2021-02-08T20:30:00Z

robj is 16 bytes (on 64bit systems), has encoding, refcount and lru which are completely useless for our cause.

you may not like it, but here's my proposals:

create a new type of for the type field: #define OBJ_STRING_PTR 15, and use the encoding, refcount and lru fields bits for the length (60 bits), and the ptr as the data.
RedisModuleString is always a heap allocation, so we can set the LSB of the pointer we return to the module to 1 when the string is RedisModuleStaticString allocation rather than an robj, in which case we cast it to that type, a struct which has size_t len, and void* ptr (also 16 bytes on 64bit systems)
if we don't like using the LSB of the pointer we hand to the user, we can reserve one bit in the robj's type member instead, and store lengths of up to 63 bits (note that we can't use a bit of the ptr member, since these are not always heap allocation pointers).

i prefer option 2, considering RedisModuleString is an opaque type, the modules don't need normally need to care how me manage it.
although they will need to know that some operations are impossible with some strings, like they already have (not being able to RM_RetainString OBJ_STATIC_REFCOUNT

oranagra · 2021-02-08T20:33:16Z

p.s. we can also let the module create such strings via some API, when it passes them to redis. e.g. when adding something from the module's own memory to the reply buffers, there's no reason for the module to copy it into a RedisModuleString just so that redis can copy it again into the reply buffers. one of these copies can be eliminated by letting the module create a static string.

zuiderkwast · 2021-02-08T20:39:39Z

Let's move the static string discussion here to this issue: #8473. Feel free to change all the text.

zuiderkwast · 2021-02-16T11:11:03Z

Regarding how to continue this API, here are some more concerns/observations/ideas:

Insert and Delete by index is impossible to replicate using a regular command. Perhaps write APIs should mimic the write commands for this reason. The commands LREM, LSET and LINSERT operate by occurence, not by index.
Should the iterator mimic the ZsetRange, e.g. Start/Stop, First/Last, Next/Prev? (Changing direction is not yet possible in the quicklist iterator, but it's easy to implement.)
FindNext/FindPrev (return index or seek the iterator, maybe both?)

If this design requires too much of you precious time, I will focus on something else for a while.

List functions operating on elements by index: * RM_Get * RM_Set * RM_Insert * RM_Delete List iterator functions: * RM_ListIteratorStart * RM_ListIteratorStop * RM_ListIteratorNext * RM_ListIteratorInsert * RM_ListIteratorDelete

zuiderkwast · 2021-08-30T14:22:31Z

So, what we don't want is to have both iterator + index versions of insert/delete/set. I see two ways to continue this.

Let the iterator return the current element and index (e.g. we have next/prev/currentElement/currentIndex) and let the write functions (insert/delete/set) operate on index. We can optimize the insert/delete/set call to the current index, e.g. ListInsert(ListIteratorCurrentIndex(key), newelement), by reusing the current iterator to avoid seeking using a fresh iterator, when index is matching. This optimization can be documented so the user can rely on it.

API functions, alt. 1

/* list iterator */
int RM_ListIteratorStart(RedisModuleKey key); /* OK or ERR */
int RM_ListIteratorStop(RedisModuleKey key); /* OK or ERR */
int RM_ListIteratorFirst(RedisModuleKey key); /* OK or ERR */
int RM_ListIteratorLast(RedisModuleKey key); /* OK or ERR */
int RM_ListIteratorNext(RedisModuleKey key); /* OK or ERR */
int RM_ListIteratorPrev(RedisModuleKey key); /* OK or ERR */
RedisModuleString *RM_ListIteratorCurrentElement(RedisModuleKey key);
long RM_ListIteratorCurrentIndex(RedisModuleKey key);
/* other list functions */
RedisModuleString *RM_ListGet(RedisModuleKey key, long index);
int *RM_ListSet(RedisModuleKey key, long index, RedisModuleString *value);
int *RM_ListInsert(RedisModuleKey key, int before_or_after, long index, RedisModuleString *value);
int *RM_ListDelete(RedisModuleKey key, long index);

To take this one step further, we can even hide the iterator alltogether and just optimize get/set/insert/delete (all by index) so that when index+1 is accessed, the iterator is stepped. No Iterator functions are added to the API. You'd just iterate using for(;;):

int n = RedisModule_ValueLength(key);
for (int i = 0; i < n; i++) {
    RedisModuleString *elem = RedisModule_ListGet(key, i);
    RedisModule_ListSet(key, i, new_value);
    ...
}

API functions, alt. 2

/* list functions */
RedisModuleString *RM_ListGet(RedisModuleKey key, long index);
int *RM_ListSet(RedisModuleKey key, long index, RedisModuleString *value);
int *RM_ListInsert(RedisModuleKey key, int before_or_after, long index, RedisModuleString *value);
int *RM_ListDelete(RedisModuleKey key, long index);

I think #2 presents a very minimal API and allows a single iterator to be used under the hood. The implementation get simpler than #1 since the internal iterator can be used by all functions with less special cases. @yossigo or @oranagra WDYT?

The list iterator API functions are gone. Iterate using a simple for loop over indices. The functions use an internal iterator as an optimization.

zuiderkwast · 2021-08-31T12:24:32Z

PR is updated. See the top comment for the current API suggestion.

bjosv

I think this look like a nice API, it got a lot simpler when hiding iterators.

src/module.c

oranagra

i like the API you proposed, it's a lot easier for the user to use.
on the other hand, it hides the (performance) implications from the user.
so a user that's doing a replace during an iteration, will have no idea that we invalidated the iterator and the next call is gonna be expensive. the user won't even notice that since the testing is likely to be on small lists.
the upside is that we can always later improve the implementation, and it won't affect the API or the modules that use it.

i'd like like to hear @guybe7 @MeirShpilraien @yossigo opinion about this (read this proposal)

src/module.c

src/redismodule.h

guybe7 · 2021-09-02T12:21:55Z

@zuiderkwast nice PR!

@oranagra i like the current API (hidden iterator) and i understand your concern. maybe we can document somewhere that modifying the list inside an iteration loop is possible but may cause a degradation in performance?

oranagra · 2021-09-02T18:31:59Z

@redis/core-team please approve.
note the considerations specified here and here, and how it looks when compared to the zset iteration and modification APIs.

itamarhaber

Looks good to me, especially the simple-to-use API.

madolson

(Just reviewed the APIs)
I like the APIs as well.

yossigo

I also like the API, LGTM (plus a typo and a doc suggestion).

src/module.c

Co-authored-by: Yossi Gottlieb <[email protected]>

oranagra · 2021-09-14T14:50:04Z

@zuiderkwast thank you for pushing this through 🎉

zuiderkwast force-pushed the modules-api-list branch from 1a4d3a5 to 56d7a97 Compare February 2, 2021 12:17

oranagra reviewed Feb 4, 2021

View reviewed changes

src/module.c Outdated Show resolved Hide resolved

src/module.c Outdated Show resolved Hide resolved

src/module.c Outdated Show resolved Hide resolved

src/module.c Outdated Show resolved Hide resolved

yossigo reviewed Feb 7, 2021

View reviewed changes

src/module.c Outdated Show resolved Hide resolved

src/module.c Outdated Show resolved Hide resolved

src/module.c Outdated Show resolved Hide resolved

oranagra mentioned this pull request Jul 26, 2021

[META] Module API V1 #8157

Open

52 tasks

zuiderkwast added the state:needs-design the solution is not obvious and some effort should be made to design it label Aug 20, 2021

Modules: Add remaining list API functions

72bffa5

List functions operating on elements by index: * RM_Get * RM_Set * RM_Insert * RM_Delete List iterator functions: * RM_ListIteratorStart * RM_ListIteratorStop * RM_ListIteratorNext * RM_ListIteratorInsert * RM_ListIteratorDelete

zuiderkwast force-pushed the modules-api-list branch from 56d7a97 to 8f99ad5 Compare August 31, 2021 12:14

Rewrite list API to use implicit iterator.

df4a1cf

The list iterator API functions are gone. Iterate using a simple for loop over indices. The functions use an internal iterator as an optimization.

zuiderkwast force-pushed the modules-api-list branch from 8f99ad5 to df4a1cf Compare August 31, 2021 12:16

zuiderkwast requested review from oranagra and yossigo August 31, 2021 12:23

bjosv reviewed Aug 31, 2021

View reviewed changes

src/module.c Show resolved Hide resolved

src/module.c Outdated Show resolved Hide resolved

src/module.c Outdated Show resolved Hide resolved

src/module.c Outdated Show resolved Hide resolved

oranagra reviewed Sep 2, 2021

View reviewed changes

src/module.c Show resolved Hide resolved

src/module.c Show resolved Hide resolved

src/module.c Outdated Show resolved Hide resolved

src/redismodule.h Outdated Show resolved Hide resolved

zuiderkwast added 2 commits September 2, 2021 16:52

Fix review comments

33b8176

Add note that modifications invalidate the internal iterator

2762380

oranagra approved these changes Sep 2, 2021

View reviewed changes

itamarhaber reviewed Sep 9, 2021

View reviewed changes

madolson approved these changes Sep 9, 2021

View reviewed changes

yossigo approved these changes Sep 12, 2021

View reviewed changes

src/module.c Outdated Show resolved Hide resolved

src/module.c Show resolved Hide resolved

zuiderkwast and others added 2 commits September 12, 2021 15:32

Update src/module.c

03a4761

Co-authored-by: Yossi Gottlieb <[email protected]>

Add warning about RM_Call (it will confuse the iterator)

2f71c55

oranagra approved these changes Sep 14, 2021

View reviewed changes

oranagra merged commit ea36d4d into redis:unstable Sep 14, 2021

zuiderkwast deleted the modules-api-list branch September 14, 2021 17:26

Conversation

zuiderkwast commented Feb 2, 2021 • edited by oranagra Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zuiderkwast commented Feb 2, 2021

Uh oh!

oranagra left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Feb 4, 2021

Uh oh!

Uh oh!

Uh oh!

Uh oh!

zuiderkwast commented Feb 8, 2021

Uh oh!

oranagra commented Feb 8, 2021

Uh oh!

zuiderkwast commented Feb 8, 2021

Uh oh!

oranagra commented Feb 8, 2021

Uh oh!

yossigo commented Feb 8, 2021

Uh oh!

oranagra commented Feb 8, 2021

Uh oh!

zuiderkwast commented Feb 8, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

yossigo commented Feb 8, 2021

Uh oh!

oranagra commented Feb 8, 2021

Uh oh!

oranagra commented Feb 8, 2021

Uh oh!

zuiderkwast commented Feb 8, 2021

Uh oh!

zuiderkwast commented Feb 16, 2021

Uh oh!

zuiderkwast commented Aug 30, 2021 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zuiderkwast commented Aug 31, 2021

Uh oh!

bjosv left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

oranagra left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

guybe7 commented Sep 2, 2021

Uh oh!

oranagra commented Sep 2, 2021

Uh oh!

itamarhaber left a comment

Choose a reason for hiding this comment

Uh oh!

madolson left a comment

Choose a reason for hiding this comment

Uh oh!

yossigo left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

oranagra commented Sep 14, 2021

Uh oh!

Reviewers

zuiderkwast commented Feb 2, 2021 •

edited by oranagra

Loading

zuiderkwast commented Feb 8, 2021 •

edited

Loading

zuiderkwast commented Aug 30, 2021 •

edited

Loading