Support any number of Document Sequence Sections in CommandMessage#getCommandDocument by jyemin · Pull Request #1456 · mongodb/mongo-java-driver

jyemin · 2024-07-20T00:09:30Z

jyemin · 2024-07-20T00:33:00Z

+
+                while (outputByteBuf.hasRemaining()) {
+                    outputByteBuf.position(outputByteBuf.position() + 1 /* payload type */ + 4 /* payload size */);
+                    String payloadName = getPayloadName(outputByteBuf);


Now we get the payload name directly from the OP_MSG instead of from the SplittablePayload, making it easier to support any number of document sequence independent of any future changes to SplittablePayload

If we now get the "payload name" (which is actually, the sequence identifier) effectively from what have been previously written to the bsonOutput, why do we still use SplittablePayload.getPayloadName, including using it in CommandMessage?

I don't yet have a good understanding of what's going on, but the above seems quite confusing.

When writing the OP_MSG Kind 1 Section, we need to get the payload name from somewhere. It comes from SplittablePayload. This code is just getting it out of the OP_MSG after it's been written, so it no longer has to depend on SplittablePayload, which is now free to evolve independent of this method (for example, to support multiple payloads).

If you are suggesting that the getPayloadName method should be renamed to something like getSequenceIdentifier, I agree.

I think, I now understand what is so confusing here:

SplittablePayload.getPayloadName is only supposed to be used when encoding (so it's not surprising that the method is still used), and obviously should not be used when decoding. Its previous usage for decoding appears to having been an attempt to simplify the small piece of the decoding code at the cost of abusing SplittablePayload.getPayloadName and making the whole picture less clear.

It is also unclear if CommandMessage.getCommandDocument should be an instance method of CommandMessage (more about this below). If it were not an instance method of CommandMessage, then it would not have been so easy to access SplittablePayload.getPayloadName from CommandMessage.getCommandDocument, and SplittablePayload.getPayloadName likely would not have been used by CommandMessage.getCommandDocument to begin with.

CommandMessage.getCommandDocument requires its argument to be not just any ByteBufferBsonOutput with a command encoded in the right format, but the one that was filled by the CommandMessage.encode method called on the same instance of CommandMessage the last time CommandMessage.encode was called. I am not sure what code design principle this violates, but it definitely violates something, makes the code fragile and difficult to think about.

The only hard reason for CommandMessage.getCommandDocument to be an instance method of CommandMessage, is that it needs information on where in bsonOutput (its argument) to start decoding from.

CommandMessage remembers the document's position in RequestMessage.encodingMetadata when CommandMessage.encode is called (each encode overrides the previously remembered position), and CommandMessage.getCommandDocument uses that remembered position.

I wonder, why it was designed this way? An immediate "fix" that comes to mind, is to combine EncodingMetadata and the ByteBufferBsonOutput passed to CommandMessage.encode into something like DecodableByteBufferBsonOutput, and remove RequestMessage.encodingMetadata. Depending on the reasons behind the current design, there may be a reason why the proposed approach couldn't work.

This all goes a long way back, to when there were many more subclasses of RequestMessage (we used to have subclasses for OP_INSERT, OP_UPDATE, OP_DELETE, etc). But now, there is only CommandMessage, and except for the annoyance of having to use OP_QUERY for the very first message we send on a connection, CommandMessage only encodes OP_MSG. Which makes EncodingMetadata#getFirstDocumentPosition effectively a constant.

When we removed support for all the other op codes, we simplified a lot, but there is more that can be done. Just not sure it should be done in this PR.

Thank you, I now understand why it was varying previously, but not the reasons behind the design that I described previously. I created https://jira.mongodb.org/browse/JAVA-5554 about improving this.

…tCommandDocument JAVA-5536

stIncMale · 2024-07-25T13:41:42Z

+
+                while (outputByteBuf.hasRemaining()) {
+                    outputByteBuf.position(outputByteBuf.position() + 1 /* payload type */ + 4 /* payload size */);
+                    String payloadName = getPayloadName(outputByteBuf);


I think, I now understand what is so confusing here:

SplittablePayload.getPayloadName is only supposed to be used when encoding (so it's not surprising that the method is still used), and obviously should not be used when decoding. Its previous usage for decoding appears to having been an attempt to simplify the small piece of the decoding code at the cost of abusing SplittablePayload.getPayloadName and making the whole picture less clear.

It is also unclear if CommandMessage.getCommandDocument should be an instance method of CommandMessage (more about this below). If it were not an instance method of CommandMessage, then it would not have been so easy to access SplittablePayload.getPayloadName from CommandMessage.getCommandDocument, and SplittablePayload.getPayloadName likely would not have been used by CommandMessage.getCommandDocument to begin with.

CommandMessage.getCommandDocument requires its argument to be not just any ByteBufferBsonOutput with a command encoded in the right format, but the one that was filled by the CommandMessage.encode method called on the same instance of CommandMessage the last time CommandMessage.encode was called. I am not sure what code design principle this violates, but it definitely violates something, makes the code fragile and difficult to think about.

The only hard reason for CommandMessage.getCommandDocument to be an instance method of CommandMessage, is that it needs information on where in bsonOutput (its argument) to start decoding from.

CommandMessage remembers the document's position in RequestMessage.encodingMetadata when CommandMessage.encode is called (each encode overrides the previously remembered position), and CommandMessage.getCommandDocument uses that remembered position.

I wonder, why it was designed this way? An immediate "fix" that comes to mind, is to combine EncodingMetadata and the ByteBufferBsonOutput passed to CommandMessage.encode into something like DecodableByteBufferBsonOutput, and remove RequestMessage.encodingMetadata. Depending on the reasons behind the current design, there may be a reason why the proposed approach couldn't work.

jyemin self-assigned this Jul 20, 2024

jyemin commented Jul 20, 2024

View reviewed changes

jyemin marked this pull request as ready for review July 20, 2024 00:38

jyemin requested a review from stIncMale July 20, 2024 00:38

Support any number of Document Sequence Sections in CommandMessage#ge…

40e3744

…tCommandDocument JAVA-5536

jyemin force-pushed the JAVA-5536 branch from 3ebf49a to 40e3744 Compare July 20, 2024 00:49

jyemin added 2 commits July 20, 2024 13:06

Add inline comments

0921645

Style changes

7870aed

stIncMale requested changes Jul 25, 2024

View reviewed changes

jyemin added 3 commits July 25, 2024 20:09

Refactor createOne

c207fdf

Refactor sequence identifier creation.

49bf6f8

Fix horrible bug in CommandMessage#getCommandDocument

ec9b6bb

jyemin requested a review from stIncMale July 26, 2024 01:10

NathanQingyangXu reviewed Jul 26, 2024

View reviewed changes

Comment thread driver-core/src/main/com/mongodb/internal/connection/CommandMessage.java

checkstyle

74f7351

stIncMale approved these changes Jul 26, 2024

View reviewed changes

jyemin merged commit 14fc2fa into mongodb:master Jul 26, 2024

jyemin deleted the JAVA-5536 branch July 26, 2024 20:13

stIncMale mentioned this pull request Dec 6, 2024

Improved Bulk Write API #1509

Merged

Conversation

jyemin commented Jul 20, 2024

Uh oh!

Uh oh!

jyemin Jul 20, 2024

Choose a reason for hiding this comment

Uh oh!

stIncMale Jul 24, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

jyemin Jul 24, 2024

Choose a reason for hiding this comment

Uh oh!

jyemin Jul 24, 2024

Choose a reason for hiding this comment

Uh oh!

stIncMale Jul 25, 2024

Choose a reason for hiding this comment

Uh oh!

jyemin Jul 26, 2024

Choose a reason for hiding this comment

Uh oh!

stIncMale Jul 26, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

stIncMale Jul 25, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

stIncMale Jul 24, 2024 •

edited

Loading