Skip to content

Conversation

@achow101
Copy link
Member

@achow101 achow101 commented May 14, 2020

The data storage layer of the wallet is fairly complicated and not well encapsulated or modularized. This makes it difficult for us to introduce new data storage methods. So this PR refactors it such that there is a clear way to introduce other storage options. This is also intended to clean up the logic of the database layer so that it is easier to follow and reason about.

In the database, there are 3 classes: BerkeleyBatch, BerkeleyDatabase, and BerkeleyEnvironment. BerkeleyDatabase is typenamed to WalletDatabase. The goal is to introduce two abstract classes: WalletDatabase and DatabaseBatch. These will be implemented by BerkeleyDatabase and BerkeleyBatch. This abstract classes can be inherited by other storage methods. All of the database specific things should be hidden in those classes.

To achieve this, we move the database handling things in BerkeleyBatch and turn it purely into a data accessor for the BerkeleyDatabase. Particularly, various static functions that operated on a BerkeleyDatabase are changed to be member functions of BerkeleyDatabase. Read, Write, Erase, and Exists are refactored to have separate functions that can be overridden by other classes. Instead of GetCursor returning a BDB specific Dbc object, new CreateCursor and CloseCursor functions are added and the cursor is handled internally within the BerkeleyBatch. We are then able to introduce the DatabaseBatch abstract class.

Functionality is further moved out of BerkeleyBatch into BerkeleyDatabase. Notably the Db handle creation is moved from BerkeleyBatch's constructor and to a new Open function in BerkeleyDatabase. BerkeleyDatabase will also handle closing with a new Close function instead of having this be part of Flush.

Additionally, the existence of BerkeleyEnvironment is hidden inside of BerkeleyDatabase. mapFileUseCount is removed. Instead WalletDatabase will track it's use count with m_refcount. This is incremented by newly introduced AddRef and RemoveRef functions. m_refcount is used to ensure that the database and environment are not closed while the database may still be in use.

Instead of each BerkeleyEnvironment tracking the fileids of the databases in that environment, a global map g_fileids is used. Since we were already going through every open fileid for the uniqeness check, it doesn't matter what BerkeleyEnvironment has a database with a particular fileid. All we care about is that there is a database in any of the environments that has the same fileid of the database that is currently being opened.

Lastly, BerkeleyBatch, BerkeleyDatabase, and BerkeleyEnvironmentare moved into new fileswallet/bdb.{cpp/h}. WalletDatabaseandDatabaseBatchare inwallet/db.{cpp/h}`. This just gives better organization of the code so they aren't all mixed together.


I've started breaking this down into separate PRs. Some of the simpler stuff is moved up to the front so they can be merged first.

  • Mostly simple moveonly things: wallet: move BDB specific classes to bdb.{cpp/h} #19290
    • walletdb: Make SpliWalletFilePath non-static
    • walletdb: Add IsBDBWalletLoaded to look for BDB wallets specifically
    • walletdb: move IsWalletLoaded to walletdb.cpp
    • walletdb: moveonly: Move BerkeleyBatch Cursor and Txn funcs to cpp
    • walletdb: Move BDB specific things into bdb.{cpp/h}

These PRs are independent of each other and can be merged out of order after #19290 is merged

These PRs require the previous listed to be merged first and need to be merged in order

I will continue to break this down as things get merged.

@meshcollider
Copy link
Contributor

Concept ACK

@DrahtBot
Copy link
Contributor

DrahtBot commented May 14, 2020

The following sections might be updated with supplementary metadata relevant to reviewers and maintainers.

Conflicts

Reviewers, this pull request conflicts with the following ones:

If you consider this pull request important, please also help to review the conflicting pull requests. Ideally, start with the one that should be merged first.

@ryanofsky
Copy link
Contributor

Strong concept ACK for the effort and end result, but I'm scared it looks like the intermediate commits here are moving and changing code at the same time instead of just leaving code where it is or moving it in trivial to review --color-moved MOVEONLY commits. This was previously a hurdle for me in #16341 (comment), but maybe the problem is less severe here because it looks like the moves are happening in single commits, not in harder to reconcile add and remove commits.

I would definitely appreciate it (can't speak for other reviewers) if this PR could be updated to use MOVEONLY commits (multiple small ones throughout or a big one at the beginning or end), but I'll give this a try and maybe the review will be easier than I'm expecting 👍

@achow101 achow101 force-pushed the refactor-storage branch 2 times, most recently from cb030f6 to 0f991e7 Compare May 15, 2020 01:37
@achow101
Copy link
Member Author

but I'm scared it looks like the intermediate commits here are moving and changing code at the same time instead of just leaving code where it is or moving it in trivial to review

I only found one commit which was moving and changing behavior at the same time. This was wallet: Move cursor functions to WalletDatabase and handle internally which I have now split into walletdb: Handle cursor internally (changes behavior) and walletdb: Move cursor functions to WalletDatabase (move).

@bvbfan
Copy link
Contributor

bvbfan commented May 15, 2020

The patch changes ref counting of db files. Now Acquire is called only in Rewrite since previous batch do it. Now ref count is always 0 (except you call Rewrite) which makes whole read/write racy.

@achow101
Copy link
Member Author

The patch changes ref counting of db files. Now Acquire is called only in Rewrite since previous batch do it. Now ref count is always 0 (except you call Rewrite) which makes whole read/write racy.

Acquire is called by WalletBatch's constructor so the ref count is still incremented every time the database is being used. And Release is called by WalletBatch's destructor.

@Empact
Copy link
Contributor

Empact commented May 15, 2020

@achow101 re @ryanofsky's comment, I set out to expand and clarify 67e30b2 into several simpler move-only commits.

You can use git diff 67e30b2b42b4d710739e6fe303cbca69be68032c 125796d8e60b8e740857e427cbaad4aa23f74b52 to compare the results, as there are some changes I've left out, and slight difference, e.g. I took the approach of exposing WalletBatch::ReadKeyValue, but they don't affect the later commits.

achow101/bitcoin@refactor-storage...Empact:refactor-storage-moveonly

@achow101
Copy link
Member Author

@Empact Thanks for doing that! Unfortunately I worked out my own way of doing the separation before I saw your comment. My version was pushed to #18918.

@bvbfan
Copy link
Contributor

bvbfan commented May 17, 2020

Acquire is called by WalletBatch's constructor so the ref count is still incremented every time the database is being used. And Release is called by WalletBatch's destructor.

Got'cha, sorry about that, i've not notice it. I still have concern about touching db env in multi threading. BerkeleyDatabase should guard every env usage, for example in contructor with shared env, the easiest test is to call CreateWalletFromFile from distinct threads with same db location, it will crash at
auto inserted = this->env->m_databases.emplace(strFile, std::ref(*this));
before actual assert in next line.

@Sjors
Copy link
Member

Sjors commented May 18, 2020

Concept ACK

achow101 added 2 commits June 18, 2020 19:44
Make Rewrite actually a member of BerkeleyDatabase instead of a static
function in BerkeleyBatch
@Sjors
Copy link
Member

Sjors commented Jun 19, 2020

I get similar errors as Travis on macOS (./configure --enable-debug --with-incompatible-bdb --enable-werror):

./wallet/bdb.h:167:17: error: non-static data member 'm_file_path' of 'BerkeleyDatabase' shadows member inherited from type 'WalletDatabase' [-Werror,-Wshadow-field]
1687    std::string m_file_path;
1688                ^
1689./wallet/db.h:166:17: note: declared here
1690    std::string m_file_path;
1691                ^

And:

./wallet/walletdb.h:208:17: warning: moving a temporary object prevents copy elision [-Wpessimizing-move]
        m_batch(std::move(database.MakeBatch(pszMode, _fFlushOnClose))),
                ^

Instead of having WalletBatch construct the BerkeleyBatch, have
BerkeleyDatabase do it and return a std::unique_ptr<BerkeleyBatch>
@achow101
Copy link
Member Author

I get similar errors as Travis on macOS

Should be fixed now.

achow101 added 8 commits June 19, 2020 20:40
Instead of having Flush optionally shutdown the database and
environment, add a Close() function that does that.
Refactor mapFileUseCount increment and decrement to separate functions
AddRef and RemoveRef
Adds an Open function for the class abstraction that does nothing for
now.
Make WalletDatabase actually an abstract class and not just a typedef
for BerkeleyDatabase. Have BerkeleyDatabase inherit this class.
Instead of having BerkeleyEnvironment track the file use count, make
BerkeleyDatabase do it itself.
Instead of opening the Db handle in BerkeleyBatch, make BerkeleyDatabase
do that.
File ids were basically global anyways, it doesn't make sense to have
them be environment specific if we're going to loop through all the
environments anyways.
@achow101
Copy link
Member Author

achow101 commented Jun 20, 2020

This PR is now superseded by the PRs listed in the OP. Even though the final one, #19335, is the same diff and commits as this one, I wanted to have it be separate without the history from this PR as it is conceptually unrelated to this refactor. The refactoring into the abstract classes occurs in #19334.

All of the PRs that were dependent on this one will be rebased onto #19334

@achow101 achow101 closed this Jun 20, 2020
laanwj added a commit that referenced this pull request Jul 1, 2020
ca24edf walletdb: Handle cursor internally (Andrew Chow)

Pull request description:

  Instead of returning a Dbc (BDB cursor object) and having the caller deal with the cursor, make BerkeleyBatch handle the cursor internally.

  Split from #18971

ACKs for top commit:
  ryanofsky:
    Code review ACK ca24edf. Changes since last review: StartCursor rename, moving CloseCursor calls near returns
  promag:
    Code review ACK ca24edf.

Tree-SHA512: f029b498c7f275aedca53ce7ade7cb99c82975fd6cad17346a4990fb3bcc54e2a5309b32053bd13def9ee464d331b036ac79abb8fc4fa561170c6cfc85283447
maflcko pushed a commit that referenced this pull request Jul 5, 2020
…Database

d8e9ca6 walletdb: Move Rewrite into BerkeleyDatabase (Andrew Chow)
91d1091 walletdb: Move PeriodicFlush into WalletDatabase (Andrew Chow)
8f1bcf8 walletdb: Combine VerifyDatabaseFile and VerifyEnvironment (Andrew Chow)

Pull request description:

  The `BerkeleyBatch` class has 4 static functions that operate on `BerkeleyDatabase` or `BerkeleyEnvironment`. It doesn't make sense for these to be standalone nor for them to be static functions. So instead, move them from `BerkeleyBatch` into `BerkeleyDatabase` and make them member functions instead of static.

  `BerkeleyBatch::VerifyEnvironment` and `BerkeleyBatch::VerifyDatabaseFile` are combined into a single `BerkeleyDatabase::Verify` function that operates on that `BerkeleyDatabase` object.

  `BerkeleyBatch::Rewrite` and `BerkeleyBatch::PeriodicFlush` both took a `BerkeleyDatabase` as an argument and did stuff on it. So we just make it a member function so it doesn't need to take a database as an argument.

  Part of #18971

ACKs for top commit:
  MarcoFalke:
    re-ACK d8e9ca6 only change is test fixup 🤞
  promag:
    Code review ACK d8e9ca6, good stuff.

Tree-SHA512: 9847e55b13d98bf4e5636cc14bc3f5351d56737f7e320fafffaed128606240765599e5400382c5aecac06690f7e36265ca3e1031f3f6d8a9688f6d5cb1bacd2a
maflcko pushed a commit that referenced this pull request Jul 14, 2020
…Batch abstract class

b82f0ca walletdb: Add MakeBatch function to BerkeleyDatabase and use it (Andrew Chow)
eac9200 walletdb: Refactor DatabaseBatch abstract class from BerkeleyBatch (Andrew Chow)

Pull request description:

  In order to support alternative database systems, we need to have a generic `Batch` class. This PR adds a `DatabaseBatch` abstract class which is implemented by `BerkeleyBatch`. `DatabaseBatch` is now the class that is used by `WalletBatch` to interact with the database. To be able to get the correct type of `DatabaseBatch`, `BerkeleyDatabase` now has a `MakeBatch` function which returns a newly constructed `std::unique_ptr<DatabaseBatch>`. For `BerkeleyDatabase`, that will be `std::unique_ptr<BerkeleyBatch>`.

  The `Read`, `Write`, `Erase`, and `Exists` template functions are moved from `BerkeleyBatch`.

  Part of #18971

  Requires #19308 and #19324

ACKs for top commit:
  Sjors:
    re-utACK b82f0ca
  MarcoFalke:
    ACK b82f0ca 🌘
  meshcollider:
    LGTM, utACK b82f0ca

Tree-SHA512: 6d2d41631c0983391dbecd702e881c6775b155c90b275df97f7157e42608ed251744f9d7ce5173d02a6c5cc38d90b611880fac7fa635d3d8c4d590681f56ac6a
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 14, 2020
…atabaseBatch abstract class

b82f0ca walletdb: Add MakeBatch function to BerkeleyDatabase and use it (Andrew Chow)
eac9200 walletdb: Refactor DatabaseBatch abstract class from BerkeleyBatch (Andrew Chow)

Pull request description:

  In order to support alternative database systems, we need to have a generic `Batch` class. This PR adds a `DatabaseBatch` abstract class which is implemented by `BerkeleyBatch`. `DatabaseBatch` is now the class that is used by `WalletBatch` to interact with the database. To be able to get the correct type of `DatabaseBatch`, `BerkeleyDatabase` now has a `MakeBatch` function which returns a newly constructed `std::unique_ptr<DatabaseBatch>`. For `BerkeleyDatabase`, that will be `std::unique_ptr<BerkeleyBatch>`.

  The `Read`, `Write`, `Erase`, and `Exists` template functions are moved from `BerkeleyBatch`.

  Part of bitcoin#18971

  Requires bitcoin#19308 and bitcoin#19324

ACKs for top commit:
  Sjors:
    re-utACK b82f0ca
  MarcoFalke:
    ACK b82f0ca 🌘
  meshcollider:
    LGTM, utACK b82f0ca

Tree-SHA512: 6d2d41631c0983391dbecd702e881c6775b155c90b275df97f7157e42608ed251744f9d7ce5173d02a6c5cc38d90b611880fac7fa635d3d8c4d590681f56ac6a
meshcollider added a commit that referenced this pull request Jul 23, 2020
d416ae5 walletdb: Introduce WalletDatabase abstract class (Andrew Chow)
2179dbc walletdb: Add BerkeleyDatabase::Open dummy function (Andrew Chow)
71d28e7 walletdb: Introduce AddRef and RemoveRef functions (Andrew Chow)
27b2766 walletdb: Move BerkeleyDatabase::Flush(true) to Close() (Andrew Chow)

Pull request description:

  A `WalletDatabase` abstract class is created from `BerkeleyDatabase` and is implemented by `BerkeleyDatabase`. First, to get to the point that this is possible, 4 functions need to be added to `BerkeleyDatabase`: `AddRef`, `RemoveRef`, `Open`, and `Close`.

  First the increment and decrement of `mapFileUseCount` is refactored into separate functions `AddRef` and `RemoveRef`.

  `Open` is introduced as a dummy function. This will raise an exception so that it always fails.

  `Close` is refactored from `Flush`. The `shutdown` argument in `Flush` is removed and instead `Flush(true)` is now the `Close` function.

  Split from #18971

  Requires #19325

ACKs for top commit:
  ryanofsky:
    Code review ACK d416ae5. Only changes since last review were rebasing after base PR #19334 merge, and adding cs_db lock in BerkeleyDatabase destructor, which should avoid races accessing env->m_databases and env->m_fileids
  fjahr:
    Code review ACK d416ae5
  meshcollider:
    Code review & test run ACK d416ae5

Tree-SHA512: 98d05ec093d7446c4488e2b0914584222a331e9a2f4d5be6af98e3f6d78fdd8e75526c12f91a8a52d4820c25bce02aa02aabe92d38bee7eb2fce07d0691b7b0d
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 24, 2020
d416ae5 walletdb: Introduce WalletDatabase abstract class (Andrew Chow)
2179dbc walletdb: Add BerkeleyDatabase::Open dummy function (Andrew Chow)
71d28e7 walletdb: Introduce AddRef and RemoveRef functions (Andrew Chow)
27b2766 walletdb: Move BerkeleyDatabase::Flush(true) to Close() (Andrew Chow)

Pull request description:

  A `WalletDatabase` abstract class is created from `BerkeleyDatabase` and is implemented by `BerkeleyDatabase`. First, to get to the point that this is possible, 4 functions need to be added to `BerkeleyDatabase`: `AddRef`, `RemoveRef`, `Open`, and `Close`.

  First the increment and decrement of `mapFileUseCount` is refactored into separate functions `AddRef` and `RemoveRef`.

  `Open` is introduced as a dummy function. This will raise an exception so that it always fails.

  `Close` is refactored from `Flush`. The `shutdown` argument in `Flush` is removed and instead `Flush(true)` is now the `Close` function.

  Split from bitcoin#18971

  Requires bitcoin#19325

ACKs for top commit:
  ryanofsky:
    Code review ACK d416ae5. Only changes since last review were rebasing after base PR bitcoin#19334 merge, and adding cs_db lock in BerkeleyDatabase destructor, which should avoid races accessing env->m_databases and env->m_fileids
  fjahr:
    Code review ACK d416ae5
  meshcollider:
    Code review & test run ACK d416ae5

Tree-SHA512: 98d05ec093d7446c4488e2b0914584222a331e9a2f4d5be6af98e3f6d78fdd8e75526c12f91a8a52d4820c25bce02aa02aabe92d38bee7eb2fce07d0691b7b0d
laanwj added a commit that referenced this pull request Jul 29, 2020
…leyBatch

74507ce walletdb: Remove BerkeleyBatch friend class from BerkeleyDatabase (Andrew Chow)
00f0041 No need to check for duplicate fileids in all dbenvs (Andrew Chow)
d86efab walletdb: Move Db->open to BerkeleyDatabase::Open (Andrew Chow)
4fe4b3b walletdb: track database file use as m_refcount within BerkeleyDatabase (Andrew Chow)
65fb880 Combine BerkeleyEnvironment::Verify into BerkeleyDatabase::Verify (Andrew Chow)

Pull request description:

  `BerkeleyBatch` and `BerkeleyDatabase` are kind of messy. The goal of this is to clean up them up so that they are logically separated.

  `BerkeleyBatch` currently handles the creation of the `BerkeleyDatabase`'s `Db` handle. This is instead moved into `BerkeleyDatabase` and is called by `BerkeleyBatch`.

  Instead of having `BerkeleyEnvironment` track each database's usage, have `BerkeleyDatabase` track this usage itself with the `m_refcount` variable that is present in `WalletDatabase`.

  Lastly, instead of having each `BerkeleyEnvironment` store the fileids of the databases open in it, have a global `g_fileids` to track those fileids. We were already checking fileid uniqueness globally (by checking the fileids in every environment when opening a database) so it's cleaner to do this with a global variable.

  All of these changes allow us to make `BerkeleyBatch` and `BerkeleyDatabase` no longer be friend classes.

  The diff of this PR is currently the same as in ##18971

  Requires #19334

ACKs for top commit:
  laanwj:
    Code review ACK 74507ce
  ryanofsky:
    Code review ACK 74507ce. No changes since last review other than rebase

Tree-SHA512: 845d84ee1a470e2bf5d2e2e3d7738183d8ce43ddd06a0bbd57edecf5779b2f55d70728b1b57f5daab0f078650a8d60c3e19dc30b75b36e7aa952ce268399d5f6
sidhujag pushed a commit to syscoin/syscoin that referenced this pull request Jul 31, 2020
…d BerkeleyBatch

74507ce walletdb: Remove BerkeleyBatch friend class from BerkeleyDatabase (Andrew Chow)
00f0041 No need to check for duplicate fileids in all dbenvs (Andrew Chow)
d86efab walletdb: Move Db->open to BerkeleyDatabase::Open (Andrew Chow)
4fe4b3b walletdb: track database file use as m_refcount within BerkeleyDatabase (Andrew Chow)
65fb880 Combine BerkeleyEnvironment::Verify into BerkeleyDatabase::Verify (Andrew Chow)

Pull request description:

  `BerkeleyBatch` and `BerkeleyDatabase` are kind of messy. The goal of this is to clean up them up so that they are logically separated.

  `BerkeleyBatch` currently handles the creation of the `BerkeleyDatabase`'s `Db` handle. This is instead moved into `BerkeleyDatabase` and is called by `BerkeleyBatch`.

  Instead of having `BerkeleyEnvironment` track each database's usage, have `BerkeleyDatabase` track this usage itself with the `m_refcount` variable that is present in `WalletDatabase`.

  Lastly, instead of having each `BerkeleyEnvironment` store the fileids of the databases open in it, have a global `g_fileids` to track those fileids. We were already checking fileid uniqueness globally (by checking the fileids in every environment when opening a database) so it's cleaner to do this with a global variable.

  All of these changes allow us to make `BerkeleyBatch` and `BerkeleyDatabase` no longer be friend classes.

  The diff of this PR is currently the same as in #bitcoin#18971

  Requires bitcoin#19334

ACKs for top commit:
  laanwj:
    Code review ACK 74507ce
  ryanofsky:
    Code review ACK 74507ce. No changes since last review other than rebase

Tree-SHA512: 845d84ee1a470e2bf5d2e2e3d7738183d8ce43ddd06a0bbd57edecf5779b2f55d70728b1b57f5daab0f078650a8d60c3e19dc30b75b36e7aa952ce268399d5f6
@bitcoin bitcoin locked as resolved and limited conversation to collaborators Aug 30, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants