New file-partition.md doc describing how to partition files to ensure fast initial blockchain synchronization. #10922
Describe partitioning of datadir files between the high-frequency/low-capacity "index" files and the low-frequency/high-capacity "blocks" files. These steps are probably obvious to more adept bitcoind admins, but as a newbie I didn't see them written anywhere else.
Formatting
Typos.
I think you missed something. There are three directories that matter:
The first is high-bandwidth, low IO. The second has hardly any activity at all. The third is where all activity happens, and is critical for performance. You should not separate the blocks from the blocks/index directory, as things may get ugly if they're out of sync. You can however put the chainstate on a faster/smaller device.
Hi @sipa, thank you very much for the prompt follow up. Per your feedback, I have updated the doc to explicitly call out that chainstate folder must also stay on a fast (internal) disk. The "file-partition.md" notes I made should describe:
Again, the "chainstate" folder remains on the internal disk and never needs to move. These "file-partition.md" notes are not the direct route I took, but (if correct) might save the next person considerable delay in initial blockchain synchronization. To your point, even though the low-capacity/high-I/O-frequency LevelDB files and the high-capacity/low-I/O-frequency blockchain files are physically on separate disks, they are kept in sync via the soft links created. Maybe ultimately a more natural configuration of these files would be to have the "index" folder up a level to start with; in my opinion, the nesting is what makes moving these folders confusing. Thank you again very much for the prompt feedback; very flattering.
Per @sipa, add mention of the fact that, like the "index" folder, the "chainstate" LevelDB folder must also remain on a fast (internal) drive if reasonable synchronization time is to be expected.
"file-partition.md" has been updated to highlight the need to keep the "chainstate" folder on a fast (internal) drive, in addition to the "index" files, as per @sipa. Thanks @fanquake for adding the label. This issue also seems related to "installation" and/or "configuration"; I did not see either among the bitcoin project's label categories and thought they might prove useful as well. Thank you both for your guidance.
doc/file-partition.md
Outdated
2) Stop bitcoind, so that we can rearrange some datadir folders:
kill -QUIT `cat /Volumes/WD-Passport-Mac/bitcoin/data/bitcoind.pid`
Instead of killing the process, you should use bitcoin-cli stop.
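A minimal sketch of a clean shutdown along those lines, assuming `bitcoin-cli` is on the PATH and that bitcoind removes `bitcoind.pid` from the datadir when it exits cleanly. The function name `stop_bitcoind` and the `BITCOIN_CLI` override are hypothetical helpers for illustration, not part of Bitcoin Core:

```shell
# Ask bitcoind to shut down cleanly, then wait for its pid file to disappear.
# Assumption: bitcoind deletes ${datadir}/bitcoind.pid on a clean exit.
stop_bitcoind() {
    datadir=$1
    timeout=${2:-300}                  # seconds to wait before giving up
    cli=${BITCOIN_CLI:-bitcoin-cli}    # hypothetical override, handy for testing

    "$cli" -datadir="$datadir" stop || return 1

    waited=0
    while [ -e "$datadir/bitcoind.pid" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Usage (the path is a placeholder):
#   stop_bitcoind /path/to/datadir && echo "bitcoind has exited"
```

Waiting for the pid file to go away before touching any folders is the point of the loop: `stop` only requests the shutdown, and the databases may still be flushing for a while afterwards.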
ln -s /Volumes/WD-Passport-Mac/bitcoin/blocks /Users/coinadm/local/bitcoin/data/blocks
注意 - Nota - Note - ध्यान दें - ﻢﻠﺣﻮﻇﺓ - метка
What's with the multiple languages here?
Again, good question, thank you. To me, the "key finding" here (if any, really) is simply that the "index" folder by default is nested within the "blocks" folder, unlike the "chainstate" folder which is a sibling folder of "blocks". This slightly complicates moving the "blocks" folder off of the internal disk to an external disk; my apologies for repeating the obvious. Database administrators out there (me included) might argue that this nested folder configuration is less desirable as it complicates physical separation of high-capacity/low-frequency block files from the lower-capacity/higher-frequency index (and chainstate) LevelDB files.
When I open the hood of my car there are warning labels on the radiator cap and elsewhere, and these labels are in multiple languages to point out appropriate cautions to naive vehicle operators who may never have looked under the hood or checked a radiator. I'm taking inspiration from this and wanted to (hopefully) say "Note" in the half dozen or so most common languages by usage. Also, I want to be especially friendly in this day and age; I feel like we could all use it.
In closing, I hope it doesn't sound like I am trying to "make a mountain out of a molehill" here, it's not that at all; I just wanted to share my personal experiences in hopes of further facilitating ease of use of the system. Overall I find the system to be very easy to work with and well thought out.
Your document is still suggesting to split the blocks/ directory from the blocks/index/ directory. Please don't do that; it's dangerous (they need to be in sync), and unnecessary (the blocks/index/ directory hardly sees any I/O). You should just suggest to move the chainstate/ to a faster drive compared to blocks/.
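The layout suggested in this comment can be sketched as follows: the whole blocks/ directory, index included, moves to the larger drive, and chainstate/ stays where it is on the faster one. This is only an illustration under stated assumptions: bitcoind is already stopped, the function name `relocate_blocks` is hypothetical, and both paths are placeholders.

```shell
# Move the whole blocks/ directory (including blocks/index/) to a larger
# drive and leave a symlink behind; chainstate/ stays on the fast disk.
# Assumption: bitcoind is not running while this executes.
relocate_blocks() {
    datadir=$1
    external=$2     # e.g. a mount point on the larger, slower drive

    # Refuse if blocks/ is missing or has already been replaced by a link.
    [ -d "$datadir/blocks" ] && [ ! -L "$datadir/blocks" ] || return 1
    # Refuse if the target is not mounted or already holds a blocks/ folder.
    [ -d "$external" ] && [ ! -e "$external/blocks" ] || return 1

    mv "$datadir/blocks" "$external/blocks" || return 1
    ln -s "$external/blocks" "$datadir/blocks"
}

# Usage (both paths are placeholders):
#   relocate_blocks /path/to/datadir /path/to/external/bitcoin
```

Because blocks/ and blocks/index/ travel together, they cannot end up on different devices or drift out of sync with each other.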
| ${datadir}/blocks/index | ${datadir} | low | high |
| ${datadir}/blocks | ${EXTERAL} | high | low |
| ${datadir}/chainstate | n/a | low | high |
At issue still is the value in the last column of the 1st row above (the index file folder "i/o Frequency"; is it high or low). My experience suggests that the index folder files are indeed high frequency, which is really the impetus for this doc; if they were not high frequency I would not have looked into this re-configuration detail. I only identified this possible re-configuration pitfall (which I had mistakenly made) via bitcoind file usage reported by the following command:
lsof -p $(cat ${datadir}/bitcoind.pid) | grep ldb$
This motivated me to move the subordinate/child "index" folder back onto the internal drive and set up the soft links described here. I have not yet followed through and verified the i/o frequency demands of these index LevelDB files.
I admit the experience related here is qualitative and currently lacks supporting i/o reporting, but is *hopefully* nonetheless correct.
I first moved the entire blocks folder (including the index subfolder) from $datadir to the external disk for internal capacity reasons (basically to save space because I'm cheap), and it synced very slowly. Then I looked at open files using "lsof" as described above, and saw LevelDB index folder files open on the external USB 3.0 drive. Moving them as described here sped things up to the point that performance seemed to match that of the default configuration, with everything (including the blocks folder) on the internal drive. Basically I was just running a simplistic `du -k .` and `ls -lrt blk*.dat` every so often and watching how fast the blk*.dat files were growing in both cases. So this is qualitative, not quantitative (I apologize). The responsible thing for me to do at this point is to gather quantitative evidence to support this position.
I may look for a blockchain indexing or synchronization test that I can run twice: once with the index files on the external drive, and again with the index files on the internal drive, while capturing i/o frequency statistics as well as wall-clock time. Note that running du -k . and long listings while watching the wall clock, as described above, is all I have actually done so far.
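The du-and-wall-clock comparison described above could be made repeatable with something like the sketch below. It assumes only POSIX `du` and `awk`; the helper names `blocks_kb` and `growth_rate` are hypothetical, and the result is allocated disk space per second, not a true i/o frequency measurement.

```shell
# Total allocated size of a directory tree in KiB.
# -L follows symlinks, so a soft-linked blocks directory is measured too.
blocks_kb() {
    du -skL "$1" | awk '{ print $1 }'
}

# Integer growth rate in KiB/s between two samples taken `seconds` apart.
growth_rate() {
    before=$1; after=$2; seconds=$3
    echo $(( (after - before) / seconds ))
}

# Usage sketch: sample twice, 60 seconds apart (the path is a placeholder).
#   a=$(blocks_kb /path/to/datadir/blocks); sleep 60
#   b=$(blocks_kb /path/to/datadir/blocks)
#   echo "$(growth_rate "$a" "$b" 60) KiB/s"
```

Running the same sampling interval over the same range of block heights in each configuration would give two numbers that can be compared directly.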
A possible misperception here is that this doc was meant to be some sort of performance advice. It is not; rather, it's really just meant to be a note to help other developers/analysts who (like me) work on very low-end commodity hardware yet want to perform initial synchronization quickly, without the large-capacity internal disk space requirements. That said, the performance-minded would likely benefit from moving any high-i/o-frequency files to the fastest storage available, similar to how traditional file-based databases are tuned.
Nor are these notes meant as instructions for backing up the blockchain for portability between bitcoin development instances; as @sipa points out, the "blocks" folder needs to be kept in sync with the "index" folder, rendering the blocks folder useless by itself for backup/portability purposes.
Missing from these notes is the (reasonable?) expectation that bitcoind not be started until the external disk is mounted, and likewise that the external disk not be dismounted/ejected until bitcoind is shut down. On this point, I have not yet tried to see what happens if, after running in the configuration suggested here, the operator/developer accidentally tries to start bitcoind without the external storage plugged in; I would hope that the index folder files are not corrupted if the blocks folder is not accessible.
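One way to guard against starting with the drive unplugged is a small wrapper that checks the soft link before launching. This is a sketch only: the function name `start_bitcoind_if_mounted` and the `BITCOIND` override are hypothetical, and it assumes blocks/ already exists (it would refuse a first run on an empty datadir).

```shell
# Start bitcoind only if the (possibly symlinked) blocks directory resolves
# to a real directory, i.e. the external drive is mounted.
start_bitcoind_if_mounted() {
    datadir=$1
    daemon=${BITCOIND:-bitcoind}    # hypothetical override, handy for testing

    # [ -d ] follows symlinks, so a dangling link (drive unplugged) fails here.
    if [ ! -d "$datadir/blocks" ]; then
        echo "blocks directory not reachable; is the external drive mounted?" >&2
        return 1
    fi
    "$daemon" -datadir="$datadir" -daemon
}

# Usage (the path is a placeholder):
#   start_bitcoind_if_mounted /path/to/datadir
```

The check covers only startup; ejecting the drive while bitcoind is running still needs to be avoided by stopping the node first.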
Also, I'm still trying to understand the usage patterns of GitHub. Maybe this would have been better reported as an "issue"; I was tempted to do that, but don't personally see any issues. This is really more of a "pitfalls to avoid" type of document that I was hoping might further adoption.
I do feel like I am onto something (albeit very minor) here. I do very much appreciate all of the feedback and consideration -- thank you very much.
doc/file-partition.md
Outdated
| Folder Name | Link Name |
| ------------------------ | ------------------- |
| ${EXTERNAL}/blocks/index | ${datadir}/../index |
| ${datadir}/blocks | ${EXTERAL}/blocks |
typo: EXTERAL
Except with
@jimhashhq I think this is pretty good, certainly as a start. Can you please address the comments and squash?
Also: sometimes there's the problem with the leveldb not supporting the filesystem that the
My apologies for not getting back sooner, I was ill but am feeling better.
doc/file-partition.md
Outdated
ln -s /Users/coinadm/local/bitcoin/index /Volumes/WD-Passport-Mac/bitcoin/blocks/index
6) Replace the original index folder location with a soft link:
should be "block folder"?
Corrected, thanks!
Correct typo per @arowser, thank you.
Why close?
After a native build from source on macOS, my initial attempts to synchronize the blockchain were very slow. Upon finding the issue "Sync Taking Too Long", I found the discussion by all, and the comments by @sipa in particular, to be very useful; I reorganized the $datadir folders on my local macOS build/install and summarized the steps taken in the file-partition.md doc. These comments might find their audience more appropriately elsewhere; please feel free to suggest where. Thank you very much.
-jimhash
Note: This looks to be logged as issue: #10736