-
-
Notifications
You must be signed in to change notification settings - Fork 4.5k
fix message group id behavior in sqs fifo queues #8238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
79dd6a5 to
ad05a48
Compare
8a80530 to
f08d4ac
Compare
|
do you have a test for #7036 @dominikschubert ? |
Yep! Still needs to be adapted for the new client fixture though: #7037 |
baermat
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great 🎉 and seems like it was quite the undertaking, I can feel brainpower radiating from the screen :)
I have a couple of suggestions, questions and nits, but nothing breaking, so we should be ready to go soon 🚀
| assert ordering in ( | ||
| ["g3-m1", "g1-m1", "g1-m2", "g1-m3", "g1-m4", "g2-m1"], | ||
| ["g3-m1", "g2-m1", "g1-m1", "g1-m2", "g1-m3", "g1-m4"], | ||
| ["g1-m1", "g1-m2", "g1-m3", "g1-m4", "g2-m1", "g3-m1"], | ||
| ) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit confused by this: Is there a specific reason why there is only a subset of the valid orderings?
Let's merge all group 1 messages into "g1" for now as they need to appear in order, then the valid orderings are
g1-g2-g3
g1-g3-g2
g2-g1-g3
g2-g3-g1
g3-g1-g2
g3-g2-g1
Are these 3 are really all we need, or was this simply omitted for simplicity? But then it implies we don't need the missing ones, hence my confusion.
Or I am missing something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm glad you pointed this out, I was a bit sloppy here. Instead of adding all orderings, I instead changed all the tests assert to be much more specific now. Some test cases were actually the same in AWS and LocalStack. There was just one case where I couldn't at all reproduce how AWS does the ordering of groups, and another where there's clearly a race and two orderings are possible. I also added comments whether the ordering is related to AWS or LocalStack. I think this is much clearer.
f08d4ac to
84837c3
Compare
Motivation
In SQS, FIFO queues behave very differently from standard queues when receiving messages. A feature we haven't implemented at all were message group IDs. While we were able to send and track them, we didn't actually do anything with them. But they greatly affect how receiving of messages work. Here's the gist from the FIFO delivery logic.
The issue was first raised in #6766, and requested by several Pro users.
Turns out that this was a non-trivial change to make because of a) the way our
SqsProvidercommunicates with theSqsQueue, and b) how messages are organized in theFifoQueue.Changes
Here's a summary of changes and their respective commits to make it easier to review:
SqsProviderandSqsQueue.The problem was that abstracting over a
getmethod that only returns a single message was blocking a way to retrieve multiple messages from the queue atomically, which is exactly what the FIFO behavior requires. So I replaced theSqsQueue.getmethod with areceivemethod that much more closely resembles whatReceiveMessageis doing, and allows you to get multiple messages at once. Now the access to the underlying datastructure holding the messages is hidden behindreceive.FifoQueue- completely new receive logic).Instead of storing individual messages in the queue, we now have a queue that orders
MessageGroupobjects that in turn contain a priority queue of messages. The implication is that we need more memory to store the messages. The worst case if each message has its own unique message group. The trade off is a simplified implementation of making sure that only one consumer receives a message group when there's a race for the queue. We first dequeue the message group (making it inaccessible to other consumers), and then get the messages from the group. Because we know that only one thread will access the messages in the group, it also vastly simplifies concurrent access. Thevisiblequeue is therefore now a construct that has moved completely into theStandardQueueFuture work
Expired receipt handles
I added a currently xfailed test that shows that you cannot delete a message with an expired receipt handle in FIFO queues. I was too lazy to implement this right now, since the PR is already quite large, and it needs some additional tinkering with the receipt handle encoder.
Message group ordering
It's not clear exactly how message groups themselves are ordered in the queue. There seems to be some sort of priority in AWS given to groups when they expire, but it behaved differently depending on whether messages were removed or expired. I couldn't figure out how AWS does it exactly, and it's not documented. The only mention I could find is a vague statement I interpret as "there is no particular guarantee in which order message groups will be received".
For this reason, I had to currently allow different orderings of message groups (note that the order within a group is guaranteed)
localstack/tests/integration/test_sqs.py
Lines 1573 to 1579 in 8a80530
More tests
There are a bunch more tests one could write. Including concurrent access, ordering behavior of groups with more groups to see how they are ordered after deletion etc.
Code duplication
There is a bit of code duplication especially in the
receiveimplementations. It seems hard to generalize, and maybe not worth it at all, given that we are not going to introduce other queue types. Maybe one could extract some util methods that do some of the work (e.g. collect messages from a queue or queue-like interface).Performance
There is a lot of locking going on in FIFO queues. Basically all access to message groups is currently completely linearized across consumers. This will affect performance mostly when there are many parallel consumers. But more systematic benchmarks are necessary, that can maybe be preformed as part of #7342
Fixes
FIFOqueue return messages from the sameMessageGroupId#6766