|
Speaking of S3 becoming strongly consistent, Swift was strongly consistent for objects in practice from the very beginning. All the "in practice" assumes a reasonably healthy cluster. It's very easy, really. When your client puts an object into a cluster and receives a 201, it means that a quorum of back-end nodes stored that object. Therefore, for you to get a stale version, you need to find a proxy that is willing to fetch you an old copy. That can only happen if the proxy has no access to any of the back-end nodes with the new object.
Somewhat unfortunately for the lovers of consistency, we made a decision that makes the observation of eventual consistency easier some 10 years ago. We allowed a half of an even replication factor to satisfy quorum. So, if you have a distributed cluster with factor of 4, a client can write an object into 2 nearest nodes, and receive a success. That opens a window for another client to read an old object from the other nodes.
Oh, well. Original Swift defaulted for odd replication factors, such as 3 and 5. They provided a bit of resistance to intercontinental scenarios, at the cost of the client knowing immediately if a partition is occurring. But a number of operators insisted that they preferred the observable eventual consistency. Remember that the replication factor is configurable, so there's no harm, right?
Alas, being flexible that way helps private clusters only. Because Swift generally hides the artitecture of the cluster from clients, they cannot know if they can rely on consistency of a random public cluster.
Either way, S3 does something that Swift cannot do here: the consistency of bucket listings. These things start lagging in Swift at a drop of a hat. If the container servers fail to reply in milliseconds, storage nodes push the job onto updaters and proceed. The delay in listigs is often observable in Swift, which is one reason why people doing POSIX overlays often have their own manifests.
I'm quite impressed by S3 doing this, given their scale especially. Curious, too. I wish I could hack on their system a little bit. But it's all proprietary, alas.
|