
Support configurable consistency levels #5


Closed
otoolep opened this issue Sep 2, 2014 · 15 comments

Comments

@otoolep
Member

otoolep commented Sep 2, 2014

rqlite should support the following levels of consistency for reads:

  • Any node can service a read as long as it believes it remains in the cluster.
  • Only the leader should serve reads.
  • Before serving any read, the leader should check that it is actually still the leader. A leader may not yet have detected a network partition, since heartbeat failures are not detected instantaneously.

These levels of consistency should be chosen by a switch at the command line.
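As a rough sketch of how the three proposed levels might be modeled and selected by a command-line switch, here is a Go example using the standard `flag` package. The type and constant names, the flag name `-read-consistency`, and its accepted values are all hypothetical and not taken from rqlite.

```go
package main

import (
	"flag"
	"fmt"
)

// ReadConsistency enumerates the three read levels proposed in this issue.
// Type, constant, and flag names are hypothetical.
type ReadConsistency int

const (
	AnyNode        ReadConsistency = iota // any node that believes it is in the cluster
	LeaderOnly                            // only the node that believes it is leader
	VerifiedLeader                        // leader re-checks its leadership before each read
)

// parseConsistency maps a command-line value onto a ReadConsistency.
func parseConsistency(s string) (ReadConsistency, error) {
	switch s {
	case "any":
		return AnyNode, nil
	case "leader":
		return LeaderOnly, nil
	case "verified":
		return VerifiedLeader, nil
	}
	return 0, fmt.Errorf("unknown read consistency %q", s)
}

func main() {
	// Hypothetical flag; not an actual rqlite option.
	level := flag.String("read-consistency", "leader", "one of: any, leader, verified")
	flag.Parse()

	c, err := parseConsistency(*level)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("read consistency:", c)
}
```

Passing `-read-consistency=verified` would select the strictest of the three levels for every read the node serves.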

@otoolep
Member Author

otoolep commented Feb 25, 2016

It would probably be better if the consistency level was actually a URL parameter.
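A minimal sketch of taking the level from a URL parameter instead, using Go's `net/url`. The parameter name `level`, the accepted values, the default, and the address are illustrative assumptions for this example.

```go
package main

import (
	"fmt"
	"net/url"
)

// levelFromURL extracts a read-consistency level from a request URL.
// The parameter name "level" and the accepted values are assumptions.
func levelFromURL(rawURL, def string) (string, error) {
	u, err := url.Parse(rawURL)
	if err != nil {
		return "", err
	}
	lvl := u.Query().Get("level")
	switch lvl {
	case "":
		return def, nil // no parameter: fall back to the server default
	case "any", "leader", "verified":
		return lvl, nil
	}
	return "", fmt.Errorf("unsupported level %q", lvl)
}

func main() {
	lvl, err := levelFromURL("http://localhost:4001/db/query?level=verified&q=SELECT+1", "leader")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("level:", lvl)
}
```

A per-request parameter lets each client pick its own trade-off between latency and consistency, rather than fixing one level for the whole node at startup.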

@codahale

If you haven't already seen it, Kyle Kingsbury's article on linearizability and stale reads in Raft systems is a good read on the consequences of relaxing read consistency in Raft-based systems. A single node's local opinion of whether or not it's the leader can and will be faulty during a partition, and check-then-act strategies will suffer from race conditions during partitions. The only way to ensure linearizability is to send queries through the Raft consensus process, where they will be totally ordered along with writes. Both Consul and etcd had issues with stale reads, and both now support strongly consistent queries via URL parameters.
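To make the ordering argument concrete, the toy Go example below records reads as log entries alongside writes and applies them in log order. It is a single-process simulation with obvious simplifying assumptions (no replication, elections, or commit tracking); it only illustrates that a read placed in the log observes exactly the writes that precede it.

```go
package main

import "fmt"

// Entry is one command in a toy replicated log. A read is recorded as an
// entry just like a write, so it receives a position in the total order.
type Entry struct {
	Op    string // "set" or "get"
	Key   string
	Value string
}

// applyLog replays entries in log order against a key-value map and
// returns the result of each "get", keyed by its log index.
func applyLog(entries []Entry) map[int]string {
	state := map[string]string{}
	reads := map[int]string{}
	for i, e := range entries {
		switch e.Op {
		case "set":
			state[e.Key] = e.Value
		case "get":
			reads[i] = state[e.Key]
		}
	}
	return reads
}

func main() {
	entries := []Entry{
		{Op: "set", Key: "x", Value: "1"},
		{Op: "get", Key: "x"}, // sees only the first write
		{Op: "set", Key: "x", Value: "2"},
		{Op: "get", Key: "x"}, // sees both writes
	}
	fmt.Println(applyLog(entries))
}
```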

@otoolep
Member Author

otoolep commented Feb 27, 2016

Interesting @codahale -- I did read that article many months ago, but didn't realise the implication that queries through the log might be required to be 100% sure.

That said I did just add support for a verify URL param, which forces the system to call:

https://godoc.org/github.com/hashicorp/raft#Raft.VerifyLeader

Now that I think about it, there is probably still room for a race here. Very small, but still possible.

@aphyr

aphyr commented Feb 27, 2016

Yup. VerifyLeader is not sufficient. You need to wait for a noop op to be committed by the raft state machine, or block until some other operation commits.

@aphyr

aphyr commented Feb 27, 2016

(and since we found this issue experimentally in both etcd and Consul, I'm pretty confident you'll see it in rqlite as well)

@otoolep
Member Author

otoolep commented Feb 27, 2016

I'm sure I will. Thanks @aphyr

I'll fix up verify to do it right.

@zmedico
Contributor

zmedico commented Feb 27, 2016

A single node's local opinion of whether or not it's the leader can and will be faulty during a partition, and check-then-act strategies will suffer from race conditions during partitions.

Sure, but isn't any check-then-act strategy doomed to race conditions anyway, if the whole check-then-act operation is not atomic relative to the raft state machine?

For purposes of discussion, it's useful to have a practical example of how to accomplish a given task while avoiding a race. A transaction is a practical means of achieving an operation that is atomic relative to the Raft state machine. For example, it's possible to use a transaction to implement a compare-and-swap operation with rqlite. Simply begin the transaction with an operation that is guaranteed to fail in the event of a race, such as inserting a row into a table where the insert is guaranteed to fail (for example, on a uniqueness constraint) if a competing transaction is processed first. The transaction will roll back if this first insert fails, guaranteeing either a successful atomic operation or a rollback.

VerifyLeader is not sufficient. You need to wait for a noop op to be committed by the raft state machine, or block until some other operation commits.

For a read operation, the result is potentially stale as soon as the data is received by the client. So I don't see any practical utility in having readers block on the Raft state machine, unless they get to hold a lock on the state machine until they close their connection. Obviously, write transactions like the one in my example must block on the Raft state machine.

@otoolep
Member Author

otoolep commented Feb 27, 2016

For a read operation, the result is potentially stale as soon as the data is received by the client.

Agreed. Once it hits the client, it could always be out-of-date regardless.

What I thought my verify change had done was ensure 100% that the query was performed while the node was the leader. Of course, a little thought showed this not to be the case. I will add an option so that the client can be sure the node was the leader when the query executed (the query will go through the log as a no-op), though the data could still be out of date by the time the client gets it. But this level of consistency may be useful to some people.

@aphyr

aphyr commented Feb 27, 2016

@zmedico
Contributor

zmedico commented Feb 27, 2016

But this level of consistency may be useful to some people.

Maybe so, but there's an extra level of consistency available if the reader gets to hold a lock on the raft state until it closes its connection. Does that sound interesting @aphyr?

@zmedico
Contributor

zmedico commented Feb 28, 2016

You don't have to hold a lock to provide linearizability.

That's true. I was just thinking of practical uses that go beyond linearizability. I'm not so sure that read locks are really desirable anyway.

otoolep added a commit that referenced this issue Apr 5, 2016
@otoolep
Member Author

otoolep commented Apr 5, 2016

rqlite now supports 3 different levels of read consistency -- none, soft, and hard. The first just goes to the local SQLite file, soft does a local leader check before reading the local SQLite file, and hard sends the query request through the raft consensus mechanism. I think this addresses this issue, let me know if I am mistaken.

otoolep closed this as completed Apr 5, 2016
@otoolep
Member Author

otoolep commented Apr 5, 2016

soft is the default.

@otoolep
Member Author

otoolep commented Apr 5, 2016

Actually, moved to "weak" and "strong".
