ZooKeeper replication for users, roles, row policies, quotas and profiles. #27426
Conversation
The integration tests (thread) say there is a potential deadlock. To fix that it seems to be enough just to call
This error message doesn't seem to be very helpful. Maybe it's better to add a new parameter throw_exception to tryInsert(), so that when it's true, tryInsert() throws an exception instead of returning false. That is, call tryInsert() with throw_exception=false multiple times, and with throw_exception=true on the last attempt.
Good point. I couldn't make the message much more detailed than the original ZooKeeper error, and the error is supposed to be rare anyway, so I just let the original exception bubble up on the last try.
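The retry shape discussed above can be sketched as follows. This is a hypothetical illustration, not the actual ClickHouse signatures: `tryInsert` is a stand-in whose failure behavior is hard-coded so the control flow is visible.

```cpp
#include <stdexcept>

// Hypothetical stand-in for the fallible insert discussed above.
// When throw_exception is true, a failure propagates as an exception
// instead of being reported as `false`.
bool tryInsert(int attempt, bool throw_exception)
{
    bool ok = (attempt >= 2);  // pretend the first two attempts fail
    if (!ok && throw_exception)
        throw std::runtime_error("insert failed on the last attempt");
    return ok;
}

// Call tryInsert() with throw_exception=false on every attempt except
// the last one, so the final failure surfaces the original exception.
bool insertWithRetries(int max_retries)
{
    for (int i = 0; i < max_retries; ++i)
    {
        bool is_last = (i == max_retries - 1);
        if (tryInsert(i, /*throw_exception=*/is_last))
            return true;
    }
    return false;
}
```

The point of the pattern is that intermediate failures stay cheap booleans, while the final failure carries the full error context from the underlying call.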
Probably we should call refreshEntityNoLock() for existing_entity_uuid too?
It's not necessary: refreshEntityNoLock() on the new entity calls setEntityNoLock(), which detects the conflicting existing entity and removes it from local storage.
It doesn't seem that the mutex of ReplicatedAccessStorage actually needs to be locked until this point.
That's true. I moved things around so there is now a clear separation between updating ZooKeeper first, and then locking and updating the local state.
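The separation described here can be sketched like this (hypothetical names, with the ZooKeeper write stubbed out): the network I/O happens before the mutex is taken, so the lock only guards the cheap in-memory update.

```cpp
#include <map>
#include <mutex>
#include <string>

// Minimal sketch of "update ZooKeeper, then lock and update local state".
// StorageSketch and its members are illustrative, not the real
// ReplicatedAccessStorage interface.
struct StorageSketch
{
    std::mutex mutex;
    std::map<std::string, std::string> local_entities;

    void insertEntity(const std::string & id, const std::string & sql)
    {
        // Step 1: the (potentially slow) ZooKeeper write runs without
        // holding the storage mutex, so readers are never blocked on it.
        writeToZooKeeper(id, sql);

        // Step 2: only the fast local update happens under the lock.
        std::lock_guard<std::mutex> lock{mutex};
        local_entities[id] = sql;
    }

    // Stub standing in for the real ZooKeeper call.
    void writeToZooKeeper(const std::string &, const std::string &) {}
};
```

Keeping network calls outside the critical section is what removes the deadlock risk mentioned earlier in the thread.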
Usually it's better to do initialization in startup() than in a worker thread: if initialization fails, it may be because of a wrong configuration, so we should display an error and stop.
I moved it to the main thread to get the early failure, but kept it here too, to handle the "reset and recover after a major ZooKeeper failure" case.
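The two call sites described here can be sketched as follows (all names are hypothetical): the same idempotent initialization runs once in startup(), where a thrown exception stops the server with a clear error, and again from the worker thread whenever a ZooKeeper failure has reset the state.

```cpp
// Illustrative sketch only; not the real ReplicatedAccessStorage code.
struct ReplicationSketch
{
    bool initialized = false;

    // Idempotent setup, standing in for the ZooKeeper initialization.
    void initialize() { initialized = true; }

    void startup()
    {
        // Early failure: if initialize() throws here (e.g. bad config),
        // startup aborts and the error is shown to the operator.
        initialize();
    }

    void workerThreadIteration()
    {
        // Recovery path: after a major ZooKeeper failure the state is
        // reset, so the worker re-runs the same initialization.
        if (!initialized)
            initialize();
    }
};
```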
Just an idea: maybe we should not be watching, fetching from ZooKeeper, and parsing every entity. Instead we could make entries_by_id contain only the entities that are somehow interesting to us. For example, readImpl() could search entries_by_id first; if the entity is not found there, get it from ZooKeeper, start watching it, and finally update entries_by_id. What do you think?
I think it would make the subscription/notification system a bit too unpredictable if we don't have all entities available.
Operations like findAllImpl or existsImpl could become quite challenging too.
I also like that the current approach is essentially as fast as MemoryStorage for all read-only operations (and these operations keep working when ZooKeeper is temporarily down).
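For comparison, the lazy alternative proposed above amounts to a read-through cache. This sketch uses hypothetical names and a stubbed ZooKeeper fetch; it is not what the PR implements, just an illustration of the rejected idea.

```cpp
#include <map>
#include <optional>
#include <string>

// Hypothetical read-through cache: entries_by_id holds only entities
// that were actually requested, fetching from ZooKeeper on a miss.
class LazyEntityCache
{
public:
    std::optional<std::string> read(const std::string & id)
    {
        // 1. Serve from the local map when possible.
        auto it = entries_by_id.find(id);
        if (it != entries_by_id.end())
            return it->second;

        // 2. On a miss: fetch from ZooKeeper, start watching, cache.
        if (auto fetched = fetchFromZooKeeper(id))
        {
            startWatching(id);             // keep the entry up to date
            entries_by_id[id] = *fetched;  // populate the cache
            return fetched;
        }
        return std::nullopt;  // entity does not exist anywhere
    }

private:
    std::map<std::string, std::string> entries_by_id;

    // Stubs standing in for real ZooKeeper calls.
    std::optional<std::string> fetchFromZooKeeper(const std::string & id)
    {
        if (id == "known")
            return std::string{"CREATE USER known"};
        return std::nullopt;
    }
    void startWatching(const std::string &) {}
};
```

As the reply notes, operations that need the full entity set (findAllImpl, existsImpl, subscriptions) are awkward under this scheme, which is why the PR keeps all entities replicated locally.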
@kmichel-aiven Very nice implementation, thank you!
This stores Access Entities in ZooKeeper and replicates them across an entire cluster.
This can be enabled by using the following configuration:
<user_directories>
<replicated>
<zookeeper_path>/clickhouse/access/</zookeeper_path>
</replicated>
</user_directories>
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Detailed description / Documentation draft:
This allows storing access entities (users, roles, row policies, quotas and settings profiles) in ZooKeeper to enable replication across an entire cluster.
On a cluster already configured with ZooKeeper, you can enable replication using the following configuration:
The SQL for each access entity is stored at {zookeeper_path}/uuid/{entity_uuid}. In addition, to enforce name uniqueness, each entity UUID is stored at {zookeeper_path}/{type_code}/{name}, where the type code is one of U, R, S, P, Q. This is conceptually similar to the experimental Replicated database engine.
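As an illustration of this layout, with zookeeper_path set to /clickhouse/access/ and a hypothetical user named alice (the UUID below is made up for the example), the nodes would look like:

/clickhouse/access/uuid/ad2db5a6-...   (node data: the CREATE USER alice ... statement)
/clickhouse/access/U/alice             (node data: the UUID above)

The second node is what lets the storage reject a second entity of the same type with the same name.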
Unlike the ON CLUSTER option, users able to manage other users through SQL cannot opt out of the replication.