[ETCM-275] Async node processing and downloader removal #759
KonradStaniec merged 14 commits into develop from
Conversation
    if (underlyingMessage.maxHeaders == 1) {
      // pivot block
      sender ! MessageFromPeer(BlockHeaders(Seq(pivotHeader)), peer)
      this
Minor: You don't mutate state, so you could move this after the if statement. It would simplify the logic a little bit.
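A minimal sketch of the suggested refactor, with hypothetical simplified types (the real handler sends an actor message and returns a behaviour, not a `String`):

```scala
// Hypothetical simplified stand-in for the handler; the point is only the
// shape of the refactor, not the real types from the PR.
final case class Handler(name: String) {
  // Before: the `this` return value lives inside the branch.
  def handleBefore(maxHeaders: Int, send: String => Unit): Handler =
    if (maxHeaders == 1) {
      send("pivot block header")
      this
    } else {
      this
    }

  // After: side effect stays in the branch, single return value follows it.
  def handleAfter(maxHeaders: Int, send: String => Unit): Handler = {
    if (maxHeaders == 1) send("pivot block header")
    this
  }
}
```

Since no state is mutated, both versions behave identically; the second just avoids repeating the return value per branch.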
    }
    import akka.pattern.pipe
    // scalastyle:off
    case class RequestFailed(from: Peer, reason: String) extends RequestResult

    sealed trait ProcessingError
    case class Critical(er: CriticalError) extends ProcessingError
What is the difference between Critical and DownloaderError with critical = true?
Critical stops the sync entirely, as the trie is malformed for some reason, while DownloaderError with critical = true only blacklists the peer. I will change the naming here as it can be confusing.
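The distinction described above could be sketched roughly like this (illustrative names only, not the PR's exact types):

```scala
// Hedged sketch: a critical trie error aborts the whole sync, while a
// downloader-side error, even a "critical" one, only blacklists the peer.
sealed trait ProcessingError
final case class CriticalTrieError(reason: String) extends ProcessingError
final case class PeerError(peerId: String, blacklist: Boolean) extends ProcessingError

def react(e: ProcessingError): String = e match {
  case CriticalTrieError(reason)        => s"stop sync: $reason"
  case PeerError(peer, true)            => s"blacklist peer $peer"
  case PeerError(peer, false)           => s"retry with peer $peer"
}
```

Renaming along these lines would make it clear that only the trie-level error is fatal to the sync itself.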
kapke left a comment:
First batch of comments, more will come tomorrow.
    case PeerRequestHandler.RequestFailed(peer, reason) =>
      context unwatch (sender())
      log.debug(s"Request failed to peer {} due to {}", peer.id, reason)
Minor: "Request to peer {} failed due to {}" sounds a bit better IMO
    ): Receive = handleCommonMessages orElse handleRequestResults orElse {
      case Sync if currentState.numberOfPendingRequests > 0 && restartRequested.isEmpty =>
        val freePeers = getFreePeers(currentDownloaderState)
        nodesToProcess.dequeueOption match {
minor: since freePeers is always used with an emptiness check, this could be changed into:
(nodesToProcess.dequeueOption, NonEmptyList.fromList(freePeers)) match { // rest of code
Nice bonuses of that approach:
- no need to use fromListUnsafe
- exhaustiveness checks will work (using `if` in `case` disables the exhaustiveness checker for the given match expression)
great suggestion!
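A self-contained sketch of the suggested pattern. `NonEmptyList` here is a tiny stand-in for cats.data.NonEmptyList, and `step`, `nodesToProcess`, and `freePeers` are simplified illustrative stand-ins for the PR's real code:

```scala
// Minimal stand-in for cats.data.NonEmptyList, just enough for the sketch.
final case class NonEmptyList[A](head: A, tail: List[A])

object NonEmptyList {
  def fromList[A](xs: List[A]): Option[NonEmptyList[A]] = xs match {
    case h :: t => Some(NonEmptyList(h, t))
    case Nil    => None
  }
}

// Matching on both Options at once replaces `if` guards, so the compiler's
// exhaustiveness checker covers every combination of "nodes to process" and
// "free peers".
def step(nodesToProcess: List[String], freePeers: List[Int]): String =
  (nodesToProcess.headOption, NonEmptyList.fromList(freePeers)) match {
    case (Some(node), Some(peers)) => s"assign $node to peer ${peers.head}"
    case (Some(_), None)           => "nodes pending, but no free peers"
    case (None, _)                 => "nothing to process"
  }
```

The `Some(peers)` branch receives a value that is non-empty by construction, so no `fromListUnsafe` call is needed anywhere.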
    requests.foreach(req => requestNodes(req))
    processNodes(newState, currentStats, newDownloaderState, nodes).pipeTo(self)
    context.become(
      syncing(
that syncing handler tracks quite a bit of state now, maybe it makes sense to extract it to some class?
Yep, it will probably make the code clearer.
      currentStats: ProcessingStatistics,
      currentDownloaderState: DownloaderState,
      requestResult: RequestResult
    ): Future[ProcessingResult] = {
Does it make sense then to write this function without Future and wrap it into a Future at the call site?
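The shape of that suggestion, as a minimal sketch with hypothetical simplified types (`process`, `processAsync`, and `ProcessingResult` here are illustrative, not the PR's exact signatures):

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical simplified result type.
final case class ProcessingResult(processed: Int)

// The processing logic itself stays synchronous and trivially testable...
def process(responses: List[String]): ProcessingResult =
  ProcessingResult(responses.size)

// ...and the caller decides where the asynchrony boundary sits, by wrapping
// the pure function in Future.apply at the call site.
def processAsync(responses: List[String]): Future[ProcessingResult] =
  Future(process(responses))
```

Keeping `Future` out of the function's own signature means the synchronous core can be unit-tested without an execution context, and only the actor that does `pipeTo(self)` needs the asynchronous wrapper.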
    case class UsefulData(responses: List[SyncResponse]) extends ResponseProcessingResult

    final case class DownloaderState(
Should it go to a separate file now? This one got quite big already.
    ) {
      override def mptStateSavedKeys(): Observable[Either[IterationError, ByteString]] = {
        Observable.repeatEvalF(Task(Right(ByteString(1)))).takeWhile(_ => !loadingFinished)
        Observable.repeat(Right(ByteString(1))).takeWhile(_ => !loadingFinished)
Why not Observable.interval(1.ms).map(_ => Right(ByteString(1))).takeWhile(_ => !loadingFinished)?
I'm not very familiar with monix's internals, but I can imagine that plain repeat gives not much time for other work on the processing thread.
    def idle(processingStatistics: ProcessingStatistics): Receive = {
    def idle(processingStatistics: ProcessingStatistics): Receive = handleCommonMessages orElse {
      case StartSyncingTo(root, bn) =>
        val state1 = startSyncing(root, bn)
startSyncing method is always followed by SyncSchedulerActorState.initial. Maybe that call could be part of startSyncing method?
    // TODO we should probably start sync again from new target block, as current trie is malformed or declare
    // fast sync as failure and start normal sync from scratch
    context.stop(self)
    case DownloaderError(newDownloaderState, peer, description, critical) =>
      onlyPivot: Boolean = false,
      failedNodeRequest: Boolean = false
    ): Unit = {
      val sender = TestProbe()
Why not make it a method on the autopilot? I can see the convenience argument, but this way it would also be easy to send a message to a probe which doesn't have the autopilot installed.
- Move DownloaderState to separate file
- Extract Actor state to separate class
- Call Future.apply at call site
- Improve synccontrollerspec autopilot
kapke left a comment:
LGTM! If needed - I can try to sync to mainnet over the weekend to test.
      restartRequester ! WaitingForNewTargetBlock
      context.become(idle(currentStats.addSaved(currentState.memBatch.size)))
    }
    import akka.pattern.pipe
Very minor - I'd prefer to have this import either locally within the method or at the top of the file.
@kapke any additional syncing testing is appreciated (for now, I have tested it locally and on an EC2 machine)
Description
Changes how nodes are processed and removes StateDownloader.
The speed of processing each batch of nodes is highly volatile, i.e. at the beginning of sync it is quite fast and at the end it is quite slow. To keep a balance between the number of responses queued for processing in the presence of a volatile number of peers, it is necessary to mark a peer as active through the whole mpt node request life cycle.
Only after this whole cycle is the peer marked as free to handle another request. That way we can achieve optimal throughput through the whole state sync. It also guarantees that our depth-first descent won't go too far in breadth.
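The life-cycle rule above can be sketched as follows. `Peer` and `PeerPool` are hypothetical simplified stand-ins (the PR's real DownloaderState tracks considerably more), but they show the key invariant: a peer is busy from request scheduling until its response is fully processed, not merely received.

```scala
final case class Peer(id: String)

// A peer stays "busy" across the whole request life cycle: scheduling,
// network round trip, and processing/persisting the returned nodes.
final case class PeerPool(busy: Set[Peer]) {
  // Only peers with no in-flight work are eligible for new requests.
  def freePeers(all: List[Peer]): List[Peer] = all.filterNot(busy.contains)
  // Mark busy when the request is scheduled...
  def requestSent(peer: Peer): PeerPool = copy(busy = busy + peer)
  // ...and free the peer only once its response has been processed.
  def responseProcessed(peer: Peer): PeerPool = copy(busy = busy - peer)
}
```

Because a peer is not freed at response arrival, the number of responses queued for processing is naturally bounded by the number of peers, which is what keeps throughput stable as per-batch processing time varies.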
Such a design makes a separate StateDownloader unnecessary and even troublesome, as it requires syncing up state between two actors.

Testing

I have already synced it to Mainnet a few times.