Reduce number of database transactions to increase throughput #5186

@joostjager

One result of recent benchmarking is that the use of fsync to flush database writes to disk may well be the number one factor influencing node performance. The actual throughput of the disk (MB/s) seems hardly relevant, because the write time is dwarfed by the sync latency.

On a Google Cloud persistent disk (SSD), a single file sustains about 400 syncs/sec. That really puts a cap on the maximum transaction rate of a node.

Syncs/sec can be measured with the fio tool (look at the reported IOPS):
fio --rw=write --ioengine=sync --fdatasync=1 --size=200m --bs=4k --name=mytest

A bbolt update transaction requires two sync calls. bbolt uses global locks, which means that each update transaction on a database blocks all other transactions for at least the time that it takes to execute those two sync calls.
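As a back-of-the-envelope illustration, the two numbers above give a ceiling on serialized update transactions per database. The helper maxUpdateTxRate is a made-up name, and the calculation assumes that sync latency is the only cost and that transactions do not overlap.

```go
package main

import "fmt"

// maxUpdateTxRate returns an upper bound on serialized update transactions
// per second for one database, given the disk's sync rate and the number of
// sync calls each transaction needs. Hypothetical helper for illustration.
func maxUpdateTxRate(syncsPerSec float64, syncsPerTx int) float64 {
	return syncsPerSec / float64(syncsPerTx)
}

func main() {
	// 400 syncs/sec from the disk measurement, 2 syncs per bbolt update tx.
	fmt.Printf("max update tx/sec per database: %.0f\n", maxUpdateTxRate(400, 2))
}
```

With these inputs the bound is 200 update transactions per second per database file.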

In lnd, there are three databases that are actively used: channel.db, wallet.db and sphinxreplay.db. These databases can be locked independently, which is better for performance than if it were a single database. Still, it would be better to further isolate independent data in separate database files. An example could be to create a database per channel.

Furthermore, it could be worthwhile to consolidate multiple transactions into one, either via batching or by combining existing transactions, and thereby reduce the number of those expensive sync calls.
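To illustrate why consolidation helps, here is a minimal sketch that counts syncs when records are committed one by one versus in groups. It uses a plain file and does not touch bbolt. The type countingLog and the helpers writeEach, writeBatched and countSyncs are invented for this example.

```go
package main

import (
	"fmt"
	"os"
)

// countingLog wraps a file and counts how often it is synced.
// Hypothetical type for illustration; not part of lnd or bbolt.
type countingLog struct {
	f     *os.File
	syncs int
}

// commit flushes everything written so far to disk and counts the sync.
func (l *countingLog) commit() error {
	l.syncs++
	return l.f.Sync()
}

// writeEach commits every record on its own: one sync per record.
func (l *countingLog) writeEach(records [][]byte) error {
	for _, r := range records {
		if _, err := l.f.Write(r); err != nil {
			return err
		}
		if err := l.commit(); err != nil {
			return err
		}
	}
	return nil
}

// writeBatched commits records in groups of batchSize: one sync per group.
func (l *countingLog) writeBatched(records [][]byte, batchSize int) error {
	if batchSize < 1 {
		batchSize = 1
	}
	for start := 0; start < len(records); start += batchSize {
		end := start + batchSize
		if end > len(records) {
			end = len(records)
		}
		for _, r := range records[start:end] {
			if _, err := l.f.Write(r); err != nil {
				return err
			}
		}
		if err := l.commit(); err != nil {
			return err
		}
	}
	return nil
}

// countSyncs runs write against a fresh temporary file and reports how many
// syncs it issued.
func countSyncs(write func(*countingLog) error) (int, error) {
	f, err := os.CreateTemp("", "batch-*")
	if err != nil {
		return 0, err
	}
	defer os.Remove(f.Name())
	defer f.Close()
	l := &countingLog{f: f}
	if err := write(l); err != nil {
		return 0, err
	}
	return l.syncs, nil
}

func main() {
	records := make([][]byte, 10)
	for i := range records {
		records[i] = []byte(fmt.Sprintf("record %d\n", i))
	}
	single, err := countSyncs(func(l *countingLog) error { return l.writeEach(records) })
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	batched, err := countSyncs(func(l *countingLog) error { return l.writeBatched(records, 4) })
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("syncs without batching: %d, with batches of 4: %d\n", single, batched)
}
```

The trade-off is that a record is only durable once its whole group has been synced, so callers wait for the slowest member of the batch.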

Below is an overview of the transactions that are currently required to complete a payment on two nodes that have a direct channel. It looks like there is a lot of potential to reduce the tx count.

Sender db update transactions:

  1. channeldb.(*PaymentControl).InitPayment (batched)
  2. htlcswitch.(*persistentSequencer).NextID
  3. channeldb.(*PaymentControl).RegisterAttempt (batched)
  4. htlcswitch.(*circuitMap).CommitCircuits (batched)
  5. htlcswitch.(*circuitMap).OpenCircuits
  6. lnwallet.(*LightningChannel).SignNextCommitment
  7. htlcswitch.(*circuitMap).DeleteCircuits (batched)
  8. lnwallet.(*LightningChannel).ReceiveRevocation
  9. htlcswitch/hop.(*OnionProcessor).DecodeHopIterators (batched)
  10. channeldb.(*OpenChannel).SetFwdFilter
  11. lnwallet.(*LightningChannel).RevokeCurrentCommitment
  12. htlcswitch.(*networkResultStore).storeResult (batched)
  13. htlcswitch.(*circuitMap).DeleteCircuits (batched)
  14. routing.(*missionControlStore).AddResult
  15. channeldb.(*PaymentControl).SettleAttempt (batched)
  16. lnd.(*preimageBeacon).AddPreimages (batched)
  17. lnwallet.(*LightningChannel).RevokeCurrentCommitment
  18. lnwallet.(*LightningChannel).SignNextCommitment
  19. htlcswitch.(*circuitMap).DeleteCircuits (batched)
  20. lnwallet.(*LightningChannel).ReceiveRevocation
  21. htlcswitch/hop.(*OnionProcessor).DecodeHopIterators (batched)
  22. channeldb.(*OpenChannel).SetFwdFilter
  23. htlcswitch.(*Switch).ackSettleFail (batched)

Receiver db update transactions:

  1. lnwallet.(*LightningChannel).RevokeCurrentCommitment
  2. lnwallet.(*LightningChannel).SignNextCommitment / btcwallet.(*BtcWallet).SignOutputRaw
  3. lnwallet.(*LightningChannel).SignNextCommitment / channeldb.(*OpenChannel).AppendRemoteCommitChain
  4. htlcswitch.(*circuitMap).DeleteCircuits (batched)
  5. lnwallet.(*LightningChannel).ReceiveRevocation
  6. lightning-onion.(*Router).generateSharedSecret
  7. htlcswitch/hop.(*OnionProcessor).DecodeHopIterators (batched)
  8. lightning-onion.(*Router).generateSharedSecret
  9. invoices.(*InvoiceRegistry).AddInvoice
  10. invoices.(*InvoiceRegistry).UpdateInvoice
  11. channeldb.(*OpenChannel).SetFwdFilter
  12. lnwallet.(*LightningChannel).SignNextCommitment / btcwallet.(*BtcWallet).SignOutputRaw
  13. lnwallet.(*LightningChannel).SignNextCommitment / channeldb.(*OpenChannel).AppendRemoteCommitChain
  14. htlcswitch.(*circuitMap).DeleteCircuits (batched)
  15. lnwallet.(*LightningChannel).ReceiveRevocation
  16. htlcswitch/hop.(*OnionProcessor).DecodeHopIterators (batched)
  17. channeldb.(*OpenChannel).SetFwdFilter
  18. lnwallet.(*LightningChannel).RevokeCurrentCommitment

To find out what the dynamic behavior of batch transactions does to the fsync rate, the tool bpftrace can be used. It includes a script, syncsnoop.bt, that captures the sync calls.

To get a rate, the following command can be used:

syncsnoop.bt | grep lnd | pv -rl -i 10 > /dev/null

Running this alongside the https://github.com/bottlepay/lightning-benchmark benchmark (config lnd-bbolt-keysend), I get the following results on my machine:

Transactions/second: 18
Fsyncs/second: 400

That is about 22 fsyncs per payment for sender and receiver together (400 / 18). It is fewer than the 41 update transactions listed above for a single payment, but it still feels like there is potential to reduce this.
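The arithmetic behind these figures, as a small sketch. It assumes the "Transactions/second" value counts completed payments, and the helper name fsyncsPerPayment is made up for this example.

```go
package main

import "fmt"

// fsyncsPerPayment divides the observed fsync rate by the payment rate.
// Hypothetical helper for illustration.
func fsyncsPerPayment(fsyncsPerSec, paymentsPerSec float64) float64 {
	return fsyncsPerSec / paymentsPerSec
}

func main() {
	// Counts taken from the sender and receiver lists above.
	const senderTxs, receiverTxs = 23, 18
	fmt.Printf("listed update transactions per payment: %d\n", senderTxs+receiverTxs)

	// Rates taken from the benchmark result above.
	fmt.Printf("measured fsyncs per payment: %.1f\n", fsyncsPerPayment(400, 18))
}
```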
