This repository was archived by the owner on Feb 1, 2023. It is now read-only.

sync sessions with blockstore #398

Closed

dirkmc wants to merge 41 commits into master from refactor/notif

Conversation

@dirkmc (Contributor) commented May 20, 2020

Fixes ipfs/boxo#92

There are some issues with the existing message flow through the code, as outlined in ipfs/boxo#92. The flow is complex and there are some synchronization issues.

Existing Message Flow

[Diagram: Bitswap message flow (existing)]

In the existing message flow, the Session

  • subscribes to keys with a Pubsub implementation, which in turn relies on an external library
  • registers the keys with the SessionInterestManager

When a message is received by Bitswap, it

  • informs Pubsub, which informs the relevant Sessions
  • informs the SessionManager which asks the SessionInterestManager which sessions are interested in the message, then passes the message to the interested Sessions
  • stores the blocks in the Blockstore

When the Session receives a block, it informs the SessionManager which checks the SessionInterestManager to see if there are any other interested sessions, and if not tells the PeerManager to cancel the want.
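The "any other interested sessions?" check in this flow can be sketched as follows (a minimal sketch; the type mirrors the SessionInterestManager's role, but the API here is illustrative, not the real go-bitswap code):

```go
package main

import "fmt"

// sessionInterestManager maps each key to the set of sessions that want it.
// Hypothetical names; illustrates the cancel decision described above.
type sessionInterestManager struct {
	interest map[string]map[uint64]struct{} // key -> session IDs
}

func newSessionInterestManager() *sessionInterestManager {
	return &sessionInterestManager{interest: make(map[string]map[uint64]struct{})}
}

// register records that session sid wants key k.
func (sim *sessionInterestManager) register(sid uint64, k string) {
	if sim.interest[k] == nil {
		sim.interest[k] = make(map[uint64]struct{})
	}
	sim.interest[k][sid] = struct{}{}
}

// received removes sid's interest in k and reports whether any other
// session still wants it; if not, the PeerManager can cancel the want.
func (sim *sessionInterestManager) received(sid uint64, k string) (othersInterested bool) {
	delete(sim.interest[k], sid)
	return len(sim.interest[k]) > 0
}

func main() {
	sim := newSessionInterestManager()
	sim.register(1, "QmKey")
	sim.register(2, "QmKey")
	// Session 1 got the block; session 2 still wants it, so no cancel yet.
	fmt.Println(sim.received(1, "QmKey"))
	// Session 2 got it too; now nobody is interested, so cancel is safe.
	fmt.Println(sim.received(2, "QmKey"))
}
```

The cross-component round trip (Session -> SessionManager -> SessionInterestManager -> PeerManager) is the complexity the PR is trying to remove.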

New Message Flow

[Diagram: Bitswap message flow (new)]

In the new message flow, the Session creates a WantRequest to manage a set of keys with the WantRequestManager. The WantRequestManager checks the Blockstore to see if the block is already there.

When a message is received by Bitswap, it informs the WantRequestManager, which

  • informs the relevant WantRequest which informs the Session
  • stores the blocks in the Blockstore

The PeerManager keeps track of which sessions have pending wants (by session ID).
When the Session receives a block, it sends a cancel for the session ID and want to the PeerManager.
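The new registration and fan-out can be sketched like this (a minimal sketch with hypothetical names; the real WantRequestManager also writes wanted blocks to the blockstore inside publish):

```go
package main

import "fmt"

// wantRequest tracks the keys one GetBlocks call is waiting on.
type wantRequest struct {
	keys     map[string]struct{}
	received []string
}

// wantRequestManager maps each key to the WantRequests interested in it.
type wantRequestManager struct {
	wrs map[string]map[*wantRequest]struct{}
}

func newWantRequestManager() *wantRequestManager {
	return &wantRequestManager{wrs: make(map[string]map[*wantRequest]struct{})}
}

// newWantRequest registers a set of keys and returns the WantRequest.
func (wrm *wantRequestManager) newWantRequest(keys []string) *wantRequest {
	wr := &wantRequest{keys: make(map[string]struct{})}
	for _, k := range keys {
		wr.keys[k] = struct{}{}
		if wrm.wrs[k] == nil {
			wrm.wrs[k] = make(map[*wantRequest]struct{})
		}
		wrm.wrs[k][wr] = struct{}{}
	}
	return wr
}

// publish delivers an incoming block key to every interested WantRequest.
func (wrm *wantRequestManager) publish(k string) {
	for wr := range wrm.wrs[k] {
		wr.received = append(wr.received, k)
	}
}

func main() {
	wrm := newWantRequestManager()
	wr := wrm.newWantRequest([]string{"QmA", "QmB"})
	wrm.publish("QmA")
	fmt.Println(wr.received)
}
```

A single component now owns the key-to-session mapping, which is the synchronization point the old three-way flow lacked.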

@dirkmc dirkmc requested a review from Stebalien May 20, 2020 18:04
@Stebalien (Member) left a comment:

This is going to need more docs and better function names before I can review the design.

  • Let's get rid of On* methods/functions where possible. That name doesn't say what the method does, and lets us just stick arbitrary logic in it.
  • Docs, lots of docs. Especially docs talking about the purpose and goal of types/functions; that's a great test of whether a function is well scoped. Even if we have TODOs, I need to know what the code is trying to do.
  • Avoid mixing patterns.
    • Don't mix callbacks with channels and event loops. The channel-and-event-loop pattern is designed to run specific logic on specific goroutines. If we start throwing callbacks around, these callbacks can get called anywhere.
    • If we're using the channel-and-event-loop pattern, all state updates need to go through the event loop.

// GetBlock attempts to retrieve a particular block from peers within the
// deadline enforced by the context.
func (bs *Bitswap) GetBlock(parent context.Context, k cid.Cid) (blocks.Block, error) {
return bsgetter.SyncGetBlock(parent, k, bs.GetBlocks)
@dirkmc:

Removed bsgetter - its functionality now lives inside the session


// Put wanted blocks into blockstore
if len(wanted) > 0 {
err := bs.blockstore.PutMany(wanted)
@dirkmc:

blockstore operations now happen inside of the WantRequestManager so it can synchronize access


// Send all block keys (including duplicates) to any sessions that want them.
// (The duplicates are needed by sessions for accounting purposes)
bs.sm.ReceiveFrom(ctx, from, allKs, haves, dontHaves)
@dirkmc:

Messages are now only published to the WantRequestManager, instead of sending blocks to the Notifier and messages to the SessionManager

// Free up block presence tracking for keys that no session is interested
// in anymore
unwanted := pm.pwm.unwanted(cancelKs)
pm.bpm.RemoveKeys(unwanted)
@dirkmc:

The PeerManager now keeps track of which sessions are interested in which wants.
So I moved the check for whether any sessions are interested in a key (and thus when it's safe to send a cancel) from the SessionManager / SessionInterestManager to here in the PeerManager.
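The check that moved into the PeerManager can be sketched like this (hypothetical names; the real pwm.unwanted works on cid.Cid keys): given the keys one session is cancelling, keep only those that no session still wants.

```go
package main

import "fmt"

// peerWantManager tracks, per key, the set of session IDs that want it.
type peerWantManager struct {
	wants map[string]map[uint64]struct{}
}

// unwanted removes sid's interest in cancelKs and returns the keys that
// no session wants anymore; only those are safe to cancel globally.
func (pwm *peerWantManager) unwanted(sid uint64, cancelKs []string) []string {
	var out []string
	for _, k := range cancelKs {
		delete(pwm.wants[k], sid)
		if len(pwm.wants[k]) == 0 {
			delete(pwm.wants, k)
			out = append(out, k)
		}
	}
	return out
}

func main() {
	pwm := &peerWantManager{wants: map[string]map[uint64]struct{}{
		"QmA": {1: {}},        // only session 1 wants QmA
		"QmB": {1: {}, 2: {}}, // sessions 1 and 2 want QmB
	}}
	// Session 1 cancels both keys, but only QmA is unwanted afterwards.
	fmt.Println(pwm.unwanted(1, []string{"QmA", "QmB"}))
}
```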

sessions map[uint64]struct{}
tp wantType
}

@dirkmc:

In the peerWantManager we now keep track of which sessions are interested in each want

consecutiveTicks int
initialSearchDelay time.Duration
periodicSearchDelay delay.D
wantRequests map[*bswrm.WantRequest]struct{}
@dirkmc:

The session now keeps track of individual WantRequests (each call to GetBlocks() creates a new WantRequest)

dontHaves = interestedRes[2]
s.logReceiveFrom(from, ks, haves, dontHaves)
// GetBlock fetches a single block.
func (s *Session) GetBlock(reqctx context.Context, k cid.Cid) (blocks.Block, error) {
@dirkmc:

The code in GetBlock() was moved from getter.go into the Session. Seems like a more natural place for it to live.

)

// SessionWantsCanceller provides a method to cancel wants
type SessionWantsCanceller interface {
@dirkmc:

We no longer cancel wants with the SessionManager - instead the sessions tell the PeerManager when they're no longer interested in a want and it decides whether to cancel the want


// Cancel keys that no session is interested in anymore
sm.cancelWants(cancelKs)

@dirkmc:

The PeerManager now takes care of cancelling wants

}

// ReceiveFrom is called when a new message is received
func (sm *SessionManager) ReceiveFrom(ctx context.Context, p peer.ID, blks []cid.Cid, haves []cid.Cid, dontHaves []cid.Cid) {
@dirkmc:

The WantRequestManager now takes care of receiving incoming messages and distributing them to sessions

@Stebalien:

I haven't read it in-depth, but I like the general structure. I'll give it a more in-depth review once we finally ship a release.

// sent
func (pwm *peerWantManager) sendCancels(cancelKs []cid.Cid) {
// sent. It will only send a cancel for keys that no session wants anymore.
func (pwm *peerWantManager) sendCancels(sid uint64, cancelKs []cid.Cid) {
@dirkmc commented Jun 11, 2020:

With these changes PeerWantManager actually ends up performing better than master on BenchmarkPeerManager:

$ go test ./internal/peermanager -run xyz -v -bench . -benchtime 5s
goos: darwin
goarch: amd64
pkg: github.com/ipfs/go-bitswap/internal/peermanager
BenchmarkPeerManager
BenchmarkPeerManager-8   	  203724	     29745 ns/op

master

$ go test ./internal/peermanager -run xyz -v -bench . -benchtime 5s
goos: darwin
goarch: amd64
pkg: github.com/ipfs/go-bitswap/internal/peermanager
BenchmarkPeerManager
BenchmarkPeerManager-8   	  169858	     35684 ns/op

var wr *bswrm.WantRequest
var err error
wr, err = s.wrm.NewWantRequest(keys, func(ks []cid.Cid) {
s.incoming <- op{
@dirkmc:

This cancel function is called when the request is cancelled or the session is shut down, so it will block the WantRequest.Run() goroutine, which is about to exit. When the session shuts down, it drains the incoming channel, so this channel send shouldn't ever block forever. I'll add a comment to clarify.
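The drain-on-shutdown pattern described here can be sketched as follows (hypothetical types; a simplification, since it only drains buffered ops rather than racing with in-flight unbuffered sends):

```go
package main

import "fmt"

// op is a message a WantRequest sends on the session's incoming channel.
type op struct{ keys []string }

// drain reads ops from incoming until done is closed and nothing remains
// buffered, returning how many ops were drained. This is what keeps a
// WantRequest goroutine's final send from blocking forever at shutdown.
func drain(incoming chan op, done chan struct{}) int {
	n := 0
	for {
		select {
		case <-incoming:
			n++
		case <-done:
			// Shutdown signalled: consume anything still pending, then stop.
			for {
				select {
				case <-incoming:
					n++
				default:
					return n
				}
			}
		}
	}
}

func main() {
	incoming := make(chan op, 1)
	done := make(chan struct{})
	incoming <- op{keys: []string{"QmA"}} // a pending cancel from a WantRequest
	close(done)                           // session is shutting down
	fmt.Println(drain(incoming, done))
}
```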

@dirkmc dirkmc marked this pull request as ready for review June 12, 2020 16:33
@dirkmc dirkmc requested a review from Stebalien June 12, 2020 16:33
@dirkmc commented Jun 12, 2020:

I ran the benchmarks manually and there seems to be no significant change in performance:

[Screenshot: benchmark comparison]

Left side: This branch / Right side: master

@Stebalien (Member) left a comment:

Basic pass


// Check if the block is wanted by one of the sessions
for wr := range wrm.wrs[c] {
if wr.wants(c) {
@Stebalien:

Why do we check this? Why not just if len(wrm.wrs[c]) > 0? I.e., "there exists a want request for the CID"?

@Stebalien:

Apparently, yes. What about having two maps?

@dirkmc:

We could have 2 maps but that would mean when a want changes state from wanted -> unwanted we have to update both maps, so I believe it would be less performant

@Stebalien:

Updating the wants might be slightly more expensive, but we'd be able to get rid of this loop. An extra map operation per block received is better than iterating over the entire wantlist.

@dirkmc:

Ah, so this code is actually iterating over all the WantRequests that have registered interest in the want (there will almost always be just one).

To clarify:

func (wrm *WantRequestManager) wantedBlocks(blks []blocks.Block) []blocks.Block {
	wrm.lk.Lock()
	defer wrm.lk.Unlock()

	wanted := make([]blocks.Block, 0, len(blks))
	for _, b := range blks {
		c := b.Cid()

		// Check if the block is wanted by one of the sessions
		for wr := range wrm.wrs[c] {
			if wr.wants(c) {
				wanted = append(wanted, b)
				break
			}
		}
	}
	return wanted
}

wrm.wrs is a mapping from CID -> <set of WantRequests>, so

for wr := range wrm.wrs[c] {

iterates over <set of WantRequests>


// Publish the message to WantRequests that are interested in the
// blocks / HAVEs / DONT_HAVEs in the message
return wrm.publish(msg), nil
@Stebalien commented Jun 12, 2020:

  • This will keep blocks in memory for longer. May not be an issue.
  • What about blocks we're not even interested in?

@Stebalien:

This is also returning the wrong thing. Don't we want to return the blocks we want, not the blocks we're simply interested in?

@dirkmc:

That is what it returns - added some comments to make this clearer

}

//
// The WantRequestManager keeps track of WantRequests.
@Stebalien:

Can we be a bit more explicit about the purpose of this service? That'll help us reason about where functionality should live.

@dirkmc:

How about:

//
// The WantRequestManager keeps track of which sessions want which blocks,
// distributes incoming messages to those sessions, and writes blocks to
// the blockstore.
// When the client calls Session.WantBlocks(keys), the session creates a
// WantRequest with those keys on the WantRequestManager.
// When a message arrives Bitswap calls PublishToSessions and the
// WantRequestManager writes the blocks to the blockstore and informs all
// Sessions that are interested in the message.
//

@Stebalien:

Awesome! I wonder if this should be called something like the Router, BlockRouter, or ResponseRouter, or something like that. WantRequestManager makes me think it manages wants.

But this description makes it clear what this service is supposed to do.

@dirkmc:

Agree that the naming is confusing, let's think about how to make names better as we work out the other details of the PR.

bitswap.go Outdated
for _, b := range notWanted {
log.Debugf("[recv] block not in wantlist; cid=%s, peer=%s", b.Cid(), from)
}
wantedKs, err := bs.wrm.PublishToSessions(&bswrm.IncomingMessage{
@Stebalien:

It would be nice to not have to return state from this. We'd be able to make this async. We should be able to move providing and bitswap engine updates inside, maybe?

@Stebalien:

Hm. Nevermind. That doesn't really change anything.

if !local {
// If the blocks came from the network, only put blocks to the
// blockstore if the local node actually wanted them
wanted = wrm.wantedBlocks(msg.Blks)
@Stebalien:

Hm. This is racy, isn't it? I guess it's racy already.

@dirkmc:

The steps are:

  1. figure out which blocks are wanted
  2. write wanted blocks to the blockstore (may take some time)
  3. publish message to sessions
  • figure out which blocks are wanted now
  • set wants to "received" state
  • send to sessions

We need to unlock the WantRequestManager for step 2, which is why we need to figure out which blocks are wanted twice. It's possible for the same block to occasionally be written twice to the blockstore, although I wouldn't expect that to happen very often and it's not a problem in terms of correctness (only in terms of performance). Does the blockstore detect when the same block is written twice?

@Stebalien:

Writing twice should be fine, it just might be slightly slower.
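The three steps dirkmc lists, including the re-check after the unlocked blockstore write, can be sketched like this (hypothetical types; a minimal sketch of the locking shape, not the real implementation):

```go
package main

import (
	"fmt"
	"sync"
)

// blockstore is a stand-in whose Put is idempotent, so an occasional
// duplicate write is a performance cost only, not a correctness problem.
type blockstore struct {
	mu   sync.Mutex
	blks map[string]bool
}

func (bs *blockstore) putMany(ks []string) {
	bs.mu.Lock()
	defer bs.mu.Unlock()
	for _, k := range ks {
		bs.blks[k] = true // storing the same block twice leaves one copy
	}
}

type manager struct {
	mu     sync.Mutex
	wanted map[string]bool
	bs     *blockstore
}

// receive implements the steps: figure out wants, write (unlocked), then
// re-check wants and publish, because wants may change during the write.
func (m *manager) receive(ks []string) (published []string) {
	m.mu.Lock()
	var want []string
	for _, k := range ks {
		if m.wanted[k] {
			want = append(want, k)
		}
	}
	m.mu.Unlock() // the blockstore write happens without the manager lock

	m.bs.putMany(want)

	m.mu.Lock()
	defer m.mu.Unlock()
	for _, k := range want {
		if m.wanted[k] { // re-check: still wanted after the write?
			delete(m.wanted, k) // mark received
			published = append(published, k)
		}
	}
	return published
}

func main() {
	bs := &blockstore{blks: map[string]bool{}}
	m := &manager{wanted: map[string]bool{"QmA": true}, bs: bs}
	fmt.Println(m.receive([]string{"QmA", "QmB"})) // QmB is not wanted
}
```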

for _, c := range interestedKs {
log.Debugw("Bitswap <- block", "local", s.self, "from", from, "cid", c, "session", s.id)
// Use a WantRequest to listen for incoming messages pertaining to the keys
wr, err := s.wrm.NewWantRequest(keys)
@Stebalien:

How will this work when we switch to streaming requests?

@dirkmc:

Currently WantRequest.Run() keeps track of which keys have been received, and closes the outgoing channel of blocks when all keys have been received.
I imagine that if we have a channel of incoming keys (instead of a slice) WantRequest.Run() will instead check if the channel is closed, and if so, close the outgoing channel of blocks.
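That streaming variant can be sketched like this (an illustrative signature, not the real Run; string keys stand in for CIDs): Run registers keys as they arrive on a channel, forwards received blocks, and closes the outgoing channel once the key channel is closed and every registered key has been received.

```go
package main

import "fmt"

// run reads wanted keys from keys and receipts from received, forwarding
// each received wanted key on out. When keys is closed and all pending
// keys have been received, it closes out.
func run(keys <-chan string, received <-chan string, out chan<- string) {
	pending := make(map[string]struct{})
	for keys != nil || len(pending) > 0 {
		// Prefer registering newly requested keys over handling receipts.
		if keys != nil {
			select {
			case k, ok := <-keys:
				if !ok {
					keys = nil // key stream closed
				} else {
					pending[k] = struct{}{}
				}
				continue
			default:
			}
		}
		select {
		case k, ok := <-keys: // a nil channel blocks forever once closed
			if !ok {
				keys = nil
			} else {
				pending[k] = struct{}{}
			}
		case k := <-received:
			if _, ok := pending[k]; ok {
				delete(pending, k)
				out <- k
			}
		}
	}
	close(out) // all keys received and no more are coming
}

func main() {
	keys := make(chan string, 2)
	received := make(chan string, 2)
	out := make(chan string, 2)
	keys <- "QmA"
	keys <- "QmB"
	close(keys)
	received <- "QmB"
	received <- "QmA"
	run(keys, received, out)
	for k := range out {
		fmt.Println("got", k)
	}
}
```

Setting a closed channel variable to nil is the standard trick for disabling a select case once its stream is exhausted.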


// Send the message and the set of cids of wanted blocks
select {
case wr.messages <- &messageWanted{fmsg, wanted}:
@Stebalien:

If there's any way to avoid this event loop, I'd do that. We're mixing locked state with event loops.

@Stebalien:

Looking at this, I'm pretty sure we don't need this event loop at all.

@dirkmc:

You're right it would be better not to mix locking with event loops.

We need locking so that we can return which blocks were wanted from PublishToSessions()

I believe we need an event loop to

  • send blocks on the outgoing channel (returned from Session.GetBlocks())
  • listen for requestCtx.Done() and sessCtx.Done()

In future we'll probably also need it to read from the incoming channel of keys.

Any way we can avoid that?

@Stebalien:

The outgoing channel is completely buffered, right? Can't we just write into it directly?

@Stebalien:

Ah, ok, if we're streaming wants, this will no longer apply.


// drainCancels receives on the incoming channel until a cancel request for
// each WantRequest has been received and passed to the sessionWantSender
func (s *Session) drainCancels() {
@Stebalien:

Is there really no better way?

@dirkmc:

Open to suggestions :)

@Stebalien:

Not really...

@Stebalien:

So, the message diagram is missing at least one message: blocks from the WantRequest to the user. Can we decouple that? Would that make things simpler?

That is:

  • User asks session for blocks.
  • Session tells the want request manager it's interested in the blocks.
  • Session starts fetching the blocks.
  • Eventually, the want request manager sends an incoming message to the session that includes the block.
  • The Session sends the blocks back to the user.

This will ensure that blocks take a more predictable path:

out: { { users } -> sessions } -> session peer manager -> { peer queues }
                        \
                         v
in:  { message } -> want request manager -> { sessions -> { users } }

@Stebalien:

What happens if I call GetBlocks with the same CIDs multiple times concurrently on the same session?

@dirkmc commented Jun 15, 2020:

  • the want request manager sends an incoming message to the session that includes the block
  • the Session sends the blocks back to the user.

I agree that it would be nice to have a more linear flow like that. The catch is that the Session would need to be aware of which WantRequests are interested in which keys. We could add that, although it seems a little redundant to route from WantRequest -> Session -> WantRequest.

We could also change from a WantRequestManager that knows which WantRequests are interested in each key to

  • a SessionWantManager that knows about which Sessions are interested in which keys
  • Sessions that know which WantRequests are interested in which keys

Again it's a little redundant.

@dirkmc commented Jun 15, 2020:

What happens if I call GetBlocks with the same CIDs multiple times concurrently on the same session?

I think that will cause problems (I think that's probably always been an issue)

@Stebalien:

I agree that it would be nice to have a more linear flow like that. The catch is that the Session would need to be aware of which WantRequests are interested in which keys. We could add that, although it seems a little redundant to Route from WantRequest -> Session -> WantRequest.

I'm suggesting a different flow:

  • The user asks the session for a block (refcounted in the session in case the user asks multiple times).
  • The session tells the wantrequestmanager that it's interested in wants (refcounted per session).
  • The session tells the session peer manager to start sending wants to peers.
  • When bitswap receives a block, it informs the wantrequestmanager.
  • The wantrequestmanager informs the sessions.
  • The sessions figure out which users want the blocks, and route them.

Effectively, this gets rid of wantrequests. I'm not sure if this is worth implementing in this patch, but it would definitely fix #398 (comment).

I've posted some architecture thoughts in ipfs/boxo#90 to sum up how I see this flow working. Unfortunately, I know I'm missing quite a few details.
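The two levels of refcounting in this proposal can be sketched with a small helper (hypothetical type; the session would hold one, and so would the wantrequestmanager, keyed by session):

```go
package main

import "fmt"

// refcount counts outstanding references to each key, reporting the
// transitions that matter: first reference (register interest upstream)
// and last reference (safe to cancel upstream).
type refcount struct {
	counts map[string]int
}

// add increments the count for k, reporting whether it is the first reference.
func (r *refcount) add(k string) (first bool) {
	r.counts[k]++
	return r.counts[k] == 1
}

// remove decrements the count for k, reporting whether it was the last reference.
func (r *refcount) remove(k string) (last bool) {
	r.counts[k]--
	if r.counts[k] <= 0 {
		delete(r.counts, k)
		return true
	}
	return false
}

func main() {
	r := &refcount{counts: map[string]int{}}
	fmt.Println(r.add("QmA"))    // first GetBlocks call: register the want
	fmt.Println(r.add("QmA"))    // concurrent duplicate call: nothing new to send
	fmt.Println(r.remove("QmA")) // one caller done: want still needed
	fmt.Println(r.remove("QmA")) // last caller done: cancel the want
}
```

This is also what would make concurrent GetBlocks calls for the same CIDs on one session safe, per the question above.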

@dirkmc commented Jun 17, 2020:

The sessions figure out which users want the blocks, and route them

I think this is the issue - we still need a mapping from the CID back to the channel for the user who wants the block (i.e. the WantRequest).

We also need to cancel wants when the request context is cancelled, which means we need a mapping from user context to wants (the WantRequest).

@Stebalien:

@dirkmc I can't remember where we left this, I just remember that you needed to hop to the lotus team. Should we try to land it?

@dirkmc commented Sep 15, 2021:

To be honest, I think it's probably better to drop this PR and start again at the design stage. It was more complicated than I anticipated, and I think it would be better to debate it at the design level before trying to implement it in code.

@Stebalien:

Yeah, SGTM.

@Stebalien Stebalien closed this Sep 15, 2021
Linked issue: [ipfs/go-bitswap] Sync sessions and received blocks