-
Notifications
You must be signed in to change notification settings - Fork 38.6k
Add NODE_BLOOM service bit and bump protocol version #2900
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
The alert tests are protocol version specific; what would be best way to work around this? I'd say just set PROTOCOL_VERSION for testing, but it's a #define rather than a mutable variable. |
|
What DoS attacks? Why not post them to the security list so we can look at how to fix them (ignoring for now the fact that bitcoin is full of DoS attacks so this would not make much difference). Anyway, I think this pull request is a bad idea. I do not wish to have to support this in bitcoinj as it would complicate things significantly to have to search out nodes that properly support bloom filtering. Basically any feature in Bitcoin can be DoS attacked, that's not a good reason to just switch things off. |
|
For instance this one posted to the email list: http://www.mail-archive.com/[email protected]/msg02490.html All of the devs, including you, are aware of a second and more serious non-public vulnerability. bitcoinj added a protocol version test to find bloom capable peers in revision 2c44a4fad7faccfe9b1392c67f60d21b25703bde; add a second test to check for the NODE_BLOOM bit. In the future DNS seeds can be updated to let you ask for peers with specific service bits set, but there's no rush as bloom filtering is left on by default for now. |
|
I asked Peter to open this pull. Service bitting bloom makes sense to me— it increases nodes control over the granularity of the services they are providing the public, should it be needed— I have no clue why I didn't ask for this to begin with. The primary risk I see with this is that it will potentially result in increased levels lazy alternative node implementers not implementing the bloom server side functionality, but at the same time at least those nodes will be identifiable and avoidable (as opposed to them simply not implementing and making your connecting life hard). So I do not believe it would be useful to protocol-version-suicide-pact the bloom functionality. If we're concerned that ignorant node operators may turn off bloom when they don't need to, we could remove the hidden switch, but I'm not especially concerned about that as bloom saves a lot of bandwidth when not being abused. Separately, this is what litecoin has implemented since they had the benefit of later knowledge when they did their protocol update that added bloom support. So anything that supports litecoin would have the service bit testing in any case. |
|
Keep in mind that if an alternative node implementation doesn't want to put in the relatively small amount of effort to implement bloom filters, what makes you think they are going to put in the much larger amount of effort required to really get Bitcoin semantics exactly right regarding convergence? Given that SPV nodes are relatively dependent on their peers - especially if users are accepting zero-conf transactions in any way - it's reasonable to only want to connect to peers running the satoshi codebase rather than some other implementation. |
|
So the argument is that because you can ask a peer to use less bandwidth, that makes dos attacks worse? You can't fix a denial of service by denying service. That's backwards, especially as I don't recall the last time I saw a DoS attacker who actually paid for his own bandwidth. The only way to fix it is to figure out who should be getting service and ensuring they get it first. So why don't you fix it by allowing nodes to prioritise inbound connections? |
|
@mikehearn As you know the DoS attacks identified are not attacks on bandwidth, but rather other resources. Those attacks are significantly easier to pull off if the attacker can reduce the amount of data the targets send back to them; bloom filters do exactly that by reducing the INV messages the attacker receives, allowing one computer to attack a very large number of nodes with very little bandwidth consumption. It would be helpful if you responded on the email list with what types of prioritization you would consider most useful for SPV clients - @johndillon attempted to start that discussion but I haven't seen you reply. Disabling bloom filters is a form of prioritization in of itself of course, as SPV nodes do not contribute back to the network by relaying data. It is much better for them to have temporary issues connecting in an attack than for relaying in general to fail, something that could lead to a dangerous fork due to convergence failure. Regardless as @gmaxwell pointed out aside from attacks, the protocol should allow for nodes to advertise whether or not they support bloom filters as a matter of good protocol design - it is after all a service. It's telling that when I mentioned that NODE_BLOOM didn't exist, @gmaxwell was surprised. |
|
ACK NODE_BLOOM and ACK the entire patch, though myself, I probably wouldn't have added an option to disable bloom filters in the reference impl A minor BIP would be nice (like to document every protocol change in a BIP, even though this is terribly minor) |
|
@jgarzik Good idea re: BIP. |
|
Note: When you bump the protocol version it breaks the alert tests. Your options are then to either generate new signed test data every time the protocol version is bumped, or rather fix the alert tests so the protocol version does not matter. litecoin-project@dbc5f6d |
|
I'm curious to see what @SergioDemianLerner thinks of this patch. My concern is that if there IS a valid DoS attack on bloom-filter nodes (the "attack", as I understand it, is supposed to cause excessive disk seeking looking for transactions, yes? "meh" -- worst case is "node gets slow" if it isn't running with big disk cache buffers or from an SSD), then adding a NODE_BLOOM bit will just make it easier for attackers to find them and attack them. |
|
We only have 64 service bits. It would make sense to only use them when nodes benefit from not implementing features. On the other hand, having a separate service bit means notes can support bloom filters, but not NODE_NETWORK... |
We've got 64 of them - I don't think we're about to run out... |
|
@gavinandresen "node gets slow" is a potentially serious problem. "making it easier for attackers to find them": Serving SPV nodes is far less important to the health of the network than ensuring relaying works and consensus is maintained. Next time I'd suggest letting people finish exploring the issue before you assume revealing the exploit publicly won't do any harm. |
|
Can we cut it out with "revealing the exploit" stuff? It's trivial to find DoS attacks on bitcoind. I did it for fun last time this came up and it took about 45 minutes. I was going to post about it (it's a memory bloat issue) and then got distracted with the RNG stuff. Why would we make bloom filtering optional? There's no benefit to be had from that at this point. If somebody wants to reimplement the protocol for some reason, they can reimplement that too. No big deal. Clients that rely on it are not dramatically less important than other nodes. They're called "end users" and they're extremely important! You can force nodes to do arbitrary amounts of disk seeking with or without Bloom filtering. Just ask for random blocks in the chain. Asking them to filter the blocks adds some CPU load, but that just multiplies resource exhaustion dimensions from 1 to 2. Recall that most average hard disks can only manage about 100 seeks per second. Connect 100 times, request one random block per second per connection and all the seeks are gone. Bandwidth usage is hardly a problem. Nodes don't remember what data elements they already served so you can just do that forever, but even if they did you could just reconnect every so often to make them forget. I think I've explained how to implement a working anti-DoS strategy several times by now. If I didn't reply to John it's only because this issue keeps coming up and I feel like I'm repeating myself. First step - make nodes understand their own resource limits. This is useful anyway because an operator may not want bitcoind to eat all available machine resources, and currently it will just if it legitimately gets busy. Second step - implement handlers that run when the node is running out of resources, that perform load shedding. For example, throwing out low priority transactions in the mempool, disconnecting peers that score badly (idle for long periods of time, use lots more memory than would be expected, etc). Coming up with good scoring functions is a big part of the art of DoS defence. This is where you'd score transactions before signature checking as well. Third step - once you force attackers to basically act just like regular network nodes, you can introduce an optional cookie mechanism so if there are suddenly thousands of clients that appear to be well behaved, clients that have a long history get priority over clients that appeared 5 minutes ago. This also helps in the case where Bitcoin "suffers" a sudden spike in popularity. Established users take priority over new users. Final step - optimise everything so the amount of load you can handle before falling over gets pushed higher and higher. The wrong strategy is the one being pursued here:
The last time this happened I described the strategy as a "death spiral" and I wasn't joking. |
|
Mike, enough with the overheated rhetoric. Occasionally nodes will have reasons not to offer the bloom filter service, while still having block data. (NODE_NETWORK) Right now the protocol doesn't give any way to say that, adding the NODE_BLOOM service bit lets you do that. Of course, if you are going to have NODE_BLOOM, it's useful to be able to disable the bloom service for testing SPV clients, hence the undocumented command line flag. The code implementing the feature also shows what should happen if a node doesn't support bloom filters: kick peers requiring that feature so they won't waste their bandwidth. It'd also be useful as a temporary emergency measure if a DoS attack is launched, but in the meantime we don't have any reason to expect users to use the flag. It is unfortunate that Gavin's revealed a particularly effective one, but that's life. In any case, finding DoS attacks may be trivial for you, but our attackers don't seem to find them on their own and seem to only launch them after they have been revealed publicly prior to a patch. If you want to talk about priority schemes, move it to the email list and reply to @johndillon's thread rather than cluttering up this pull-req with off-topic discussion. Your ideas only work against some types of attacks in any case. |
|
@wtogami I suggested awhile back to make it easy to set the alert keys locally for testing - maybe that's the way to handle testing them? You'd still have to regenerate the test cases every time the protocol version was changed, but that could be turned into a simple script using a known privkey. Might be a better idea in general: because the key is what's non-standard, you can have test cases that use times that are standard without risking re-use of the test alerts on the network. |
|
I don't have access to the DoS report of issue it's been discussed, but I can imagine. Any tool (or NODE_ bit) that gives the users the ability to mitigate an eventual DoS attack would be welcomed by the community. Bitcoin network protocol is not "fair", peers do not send the exact amount of useful information they receive. This is a true altruistic network, so I think people won't use these bits to discriminate peers. |
|
Regarding using the bit NODE_BLOOM maliciously to detect "vulnerable" nodes, I don't think this ease much an attack, since bloom enabled nodes can been detected by many different ways indirectly. (e.g. connecting to the victim with two peers, and testing if txs are filtered or not). |
|
Another way of looking at it is Gavin is worried about NODE_BLOOM making it easier to detect "vulnerable" nodes, and his solution is to make all nodes vulnerable. |
|
ACK to NODE_BLOOM and this particular patch. We will however need a solution for the broken alert tests, preferably one that does not require the alert key signer to have de facto centralized control over any protocol version bumps. |
|
@wtogami change your test to run with a throwaway key which is only used in testing mode if thats your concern. |
|
@gmaxwell Does that mean you suggest that be added as a second commit to this PR? |
|
@wtogami No, I don't care, updating the protocol version is infrequent enough that we can just fix the test after doing so. "Defacto centeral control" is not a concern there, if you are having trouble getting the test updated you just make the change I suggested, remove the test, or remove the alert key entirely. |
|
@gavinandresen I don't really follow your reasoning; assuming an attack exists (ignoring whether it does or not for now), then forcing everyone to be vulnerable is certainly worse than making it optional and advertizing it (perhaps combining it with stronger anti-DoS measures, knowing that it may reduce availability of other offered services). I do agree with @mikehearn that a decent resource-limitating implementation is necessary as a generic solution against DoS attacks. I do believe that's orthogonal to the approach taken by this change, though. The services offered by P2P nodes to the network (for free!) are quite distinct, and have different use cases, different clients, and different attack models:
In my opinion, these are sufficiently independent from each other that they should be easily isolatable. We're relying on charity of those running full nodes to provide these services, and which of these they consider important for the survival of the network may differ. Additionally, not all of these are necessarily present to the same extent in the network - there may be more demand for some than for others. This all speaks in favor of having separate services bits for them. Maybe at some point more specialized and separately-maintained software for each exists, though that's probably not for soon. On the other hand, requiring full node implementations to support SPV functionality probably benefits the popularity of Bitcoin as a payment system, and may improve its usefulness to the economy. "Surviving" up to the point where we're worried about certain attacks may be more important than dealing with those attacks in the first place. Still, ultimately this is about whether alternate full node implementations are allowed to not implement SPV services, and they may have reasons not to (perhaps because they offer a competing lightweight client model). If the choice is between them not implementing Bloom filtering (and ignore requests, or disconnect in case of such requests), and them being able to advertize not supporting it in the first place, I certainly choose the latter. I lean towards ACK, but this discussion probably belongs elsewhere (it's not entirely specific to the reference client), and certainly warrants a BIP. The implementation looks good in any case. |
|
A good example where the very different requirements of the different types of nodes matters is for anyone with a lot of bandwidth available to them, but not a lot of disk io or memory, a common scenario for servers in datacenters. The most efficient way to push blocks over the wire for such a server would be to use sendfile() to do a direct copy from the blockchain data on disk to the network interface - behind the scenes Linux will use direct zero-copy DMA transfers of the data from the disk interface to the network interface, or system memory to the network interface, using almost no CPU in the process. On the other hand that means the blocks have to be sent unfiltered, which is fine if my peer needs the whole block. This arrangement uses the resources I have available most efficiently for the sake of the network. But if I start accepting SPV clients I can't use sendfile() anymore, and even worse is that because I'm serving those SPV clients filtered blocks I'm using up far more disk IO than network bandwidth, when what I have available to me is the opposite. Interestingly the opposite case is common too: lots of home users running nodes have plenty of disk IO available, especially if they have SSD's, but very poor upload capacity. It would probably make sense for full nodes to prefer to connect to nodes without NODE_BLOOM set, especially if they are syncing a lot of archival history, leaving capacity on the NODE_BLOOM capable nodes for SPV clients that need it. |
|
Needs rebase and a test plan. But I still think this is a bad idea.... |
|
@gavinandresen rebased Current test plan and results has been as follows:
Peers connect normally, no observed changes: PASS
Non-bloom-using nodes connect normally: PASS
@sipa implementation https://github.com/sipa/bitcoin-seeder ignores NODE_BLOOM: PASS Note that I only checked that the source code for both tests for NODE_NETWORK with an and, and thus will ignore other service bits being set. We can modify the dns seeds to filter requested service bits later if required; useful later for other things like the proposed NODE_ARCHIVAL_BLOCKCHAIN_DATA-type stuff. |
|
Note that the alert tests need to be fixed because the protocol version was incremented, as discussed above; if disabled manually all other tests run fine. |
|
@mikehearn Read the patch prior to commenting about it; it automatically kicks older peers so they won't waste their bandwidth and my draft BIP says that behavior is a must. Also, the requirements of serving archival and bloomfilter-using nodes are very different and can productively be optimized differently: #2900 (comment) |
|
Yes, I read that comment. That's why I suggested serving the chain torrent if you have lots of bandwidth and not much CPU. Or heck, just serve snapshots of the chain via HTTP. Then you have a piece of software and protocol actually designed for serving large files, instead of bitcoin+p2p protocol which simply isn't. |
|
So you agree that it's backwards compatible with older SPV implementations? Any other issues? I don't see why we want to depend on a clunky system of requiring separate manual torrent/HTTP downloads, especially given ideas like partial UTXO mode to transparently bring nodes from SPV security to full over time in the background. |
|
Your patch is "backwards compatible" as long as most nodes don't use it, which is not the meaning most people would associate with the term. Worse, you implemented this by DoS banning the IP address, which fails to take into account that many users (especially on mobile) sit behind giant NAT boxes. A single user who didn't upgrade could get their entire local cell tower or even city banned from every node with your patch activated, even for clients that are upgraded. You could make it backwards compatible by not having such nodes take part in the regular P2P network at all and provide a separate DNS seed for them, so old clients would never see them at all, but then if you're going to have a separate P2P network why not use a P2P system that's actually designed for file distribution, like BitTorrent? Ignoring for now that Bitcoin-Qt doesn't have any SPV support and nobody is working on any, fancy automatic migration of SPV nodes to full nodes doesn't make sense from a user experience perspective - the requirements are just so different, such arguments don't persuade me. As it is, the maintainer of the serving side says this patch is a bad idea, the maintainer of the client side says this is a bad idea, so what do you hope to achieve by keeping this issue open? It gets quite frustrating when the people who are actually maintaining the code bases that are affected by this patch both say "this is a bad idea" and you just plough on regardless. |
|
Ah good, finally you looked at the code. Anyone have any comments re: the DoS behavior? I could potentially change it to simply close the associated connection. Service bits are meant to advertise services, and performing computations on behalf of clients who don't serve other clients sounds very much like a service to me. That said I'd be very surprised to see many nodes operators disabling bloom filters as the command-line switch to do so is undocumented, so we've got plenty of time to upgrade DNS seeds to let nodes pick service bits they want, or just require NODE_BLOOM to be set if protocol version >=70002. (DNS seeds are just for bootstrapping after all and nodes should use them infrequently; no sense putting too much effort into them vs. decentralized peer discovery) I'm sure more people would run Bitcoin-QT full-nodes if it was useful immediately, albeit with reduced SPV security. We could certainly use more full nodes. You like to say how businesses will "obviously" run full nodes - maybe we can make it easier for them by getting their nodes up and running as fast as possible. An attractive alternative is to just use a central service immediately rather than waiting after all. Note the positive comments and ACKS from gmaxwell, sipa, jgarzik, and wtogami - the reference client is maintained by a group of people, not just one guy who calls the shots. |
|
I think it may make more sense to start advertising NODE_BLOOM before we add a (even undocumented) option to disable it. That could ease the transition significantly |
|
@TheBlueMatt If the sticking point for people is the fact that there is that option, I'll remove it. |
NODE_BLOOM enabled-by-default with 0.9 combined with always-enabled with 0.8.x means most nodes will use it.
Please. No. Removing the option would mean other clients can entirely opt out of paying attention to the service bit, thereby rendering it pointless. The option being widely deployed at least makes it very easy to tell the world how to defend against a particular type of problem if it were to happen, which may be enough to discourage that problem from happening as they would know the network can bounce back very quickly without any software update. @petertodd In retrospect, the 24 hour ban was too heavy handed. Mike is right about it being too easy to ban an entire IP. |
|
The "sticking point" for me is practical: every service bit adds complication-- another possible configuration that should be tested, but probably won't be (which makes attacks more likely). I have seen zero evidence that requiring that every NODE_NETWORK node support bloom filters causes anything more than temporary denial-of-service problems for under-powered full nodes, and I see large benefits to requiring that all full nodes support SPV clients. |
|
On Mon, 2013-10-21 at 11:39 -0700, Warren Togami wrote:
|
|
Updated to use CloseSocketDisconnect() rather than a 24-hour DoS-ban. Tested with my Android wallet, which happily successfully connects then is disconnected about five times a second forever... Off-topic here, but @schildbach I'd suggest some kind of back-off algorithm, or at least waiting a second or two. It even does that on a cellular data connection, which could get rather expensive. |
|
@petertodd Yes, that's a well known regression in bitcoinj: http://code.google.com/p/bitcoinj/issues/detail?id=296 |
|
There has been no activity in this issue for more than half a year. What's the way forward here? It is clear that next time that a service is introduced on the P2P network we have to be more careful, and add a services bit (or another way) to advertise the service instead of making it mandatory at a protocol version. This makes it possible for the node owner to choose whether to run the service or not, and to disable it if problems arise. However for Bloom filtering I think this ship has sailed. All SPV clients rely on this functionality to be available with NODE_NETWORK, and it seems unreasonable to break this expectation in a new protocol version. |
|
Both Litecoin and Dogecoin as well as their forks have deployed this patch without problems. From what I've seen an heard, their SPV clients are not acutally looking for the service bit, since this patch is designed to be backwards compatible and there are only few nodes which disable bloom filters. Instead they are disconnected when trying to set the bloom filter and choose another node instead. I think giving node operators the option to influence cpu/bandwith tradeoffs based on their needs is a good idea, and experience from altcoins seems to suggest that SPV clients are not negatively impacted by this change in a notable way. |
|
Arguing for a patch on the grounds that not enough people use it to be harmful is not a great start. But I think we went around this six months ago and the conclusion was not to apply - the pull req should just have been closed back then. The right way to solve the problem (is it a problem yet?) of nodes running out of CPU is to teach them how to prioritise within their capabilities, not expect node operators to learn complicated administration tasks. If bitcoind were routinely saturating my cheap little VPS' CPU this sort of quick hack would be a lot more compelling, but there's no sign of such problems. The top complaint at the moment is bandwidth not CPU. |
|
Right, I run various nodes on cheap ARM boxes and have never seen significant CPU usage after the initial block sync (and from that, ECDSA verification is by far the most expensive operation). Bloom filtering is really efficient. If anyone can show with benchmarks/profiles that serving SPV nodes filtered blocks is actually a problem in practice, I'll gladly reconsider my opinion. Bandwidth is another matter, but disabling bloom certainly won't help with that. |
|
AFAIK, this is not about CPU usage or network bandwidth at all, but about the ability of remote attackers to DoS a node's disk bandwidth. The capability bit provides a way to opt out of one DoS attack vector. Do we need this? My feeling is "NAK" but it is really a protocol change, and thus requires a BIP and some email list discussion (the latter of which has already happened, IIRC). Even if we all agreed this is ACK-worthy, it would still need a BIP, as this is a network-wide behavior change. |
|
Wouldn't it be a good idea to have for full node implementations like btcd which do not support bloom? |
|
This discussion has been going on for a long time. A draft BIP was written, as well as discussion on the mailing list: http://sourceforge.net/p/bitcoin/mailman/message/31298904/ . If this is to be merged it at least needs a rebase. In any case there needs to be a decision, this has dragged on for too long, and no one seems to care much either way. If we don't absolutely need it it's IMO better to not do it. |
|
I'll be submitting the BIP to the bitcoin/bips repo once I get back next week. @wtogami Exactly. Obelisk nodes don't support bloom filters either; they will in the future support the more scalable and private prefix filters. Eventually we'll likely wind up with NODE_PREFIX in addition to NODE_BLOOM. |
|
@laanwj I think @gavinandresen's announcement has left the chain of command a little unclear, but I guess if you want a decision, it's up to you to make one. Asking for one just makes us all restate our positions again, it doesn't lead to a decision :) |
Lets nodes advertise that they offer bloom filter support explicitly. The protocol version bump allows SPV nodes to assume that NODE_BLOOM is set if NODE_NETWORK is set for pre-70003 nodes. Also adds an undocumented option to turn bloom filter support off for testing purposes. Nodes attempting to use bloom filters are immediately dropped so as to not waste their bandwidth.
|
Automatic sanity-testing: PASSED, see http://jenkins.bluematt.me/pull-tester/29834c0a077ddce249a8822fe2549e3dc5eb08c1 for binaries and test log. |
|
Closing, no consensus this is a good idea. |
Lets nodes advertise that they offer bloom filter support explicitly.
The protocol version bump allows SPV nodes to assume that NODE_BLOOM is
set if NODE_NETWORK is set for pre-70002 nodes.
Also adds an undocumented option to turn bloom filter support off and
immediately kick peers that attempt to use bloom filters for testing
purposes. In addition a number of DoS attacks are made significantly
easier by bloom support, so having an option makes it easy to take
immediate steps in the event of an attack.