ZooKeeper replication for users, roles, row policies, quotas and profiles.#27426

Merged
vitlibar merged 2 commits into ClickHouse:master from aiven:kmichel-replicated-access-storage
Aug 18, 2021
Conversation

@kmichel-aiven (Contributor) commented Aug 8, 2021

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Add replicated storage of user, roles, row policies, quotas and settings profiles through ZooKeeper (experimental).

Detailed description / Documentation draft:
This allows storing access entities (users, roles, row policies, quotas and settings profiles) in ZooKeeper to enable replication across an entire cluster.

On a cluster already configured with ZooKeeper, you can enable replication using the following configuration:

    <user_directories>
        <replicated>
            <zookeeper_path>/clickhouse/access/</zookeeper_path>
        </replicated>
    </user_directories> 

The SQL for each access entity is stored at {zookeeper_path}/uuid/{entity_uuid}.

In addition, to enforce name uniqueness, each entity's UUID is stored at {zookeeper_path}/{type_code}/{name}, where the type code is one of U, R, S, P, Q.
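As an illustration, the node layout above can be sketched as a small path-building helper. The mapping of type codes to entity kinds is inferred from the initials U, R, S, P, Q and is an assumption, as are the example names:

```python
# Illustrative sketch of the ZooKeeper node layout described above.
# The type-code mapping is inferred from the initials (assumption):
# users, roles, settings profiles, row policies, quotas.
TYPE_CODES = {
    "user": "U",
    "role": "R",
    "settings_profile": "S",
    "row_policy": "P",
    "quota": "Q",
}

def entity_sql_path(zookeeper_path: str, entity_uuid: str) -> str:
    """Node holding the SQL definition of one access entity."""
    return f"{zookeeper_path.rstrip('/')}/uuid/{entity_uuid}"

def entity_name_path(zookeeper_path: str, entity_type: str, name: str) -> str:
    """Node enforcing name uniqueness; it stores the entity UUID."""
    return f"{zookeeper_path.rstrip('/')}/{TYPE_CODES[entity_type]}/{name}"
```

For example, a user named alice would occupy one node under uuid/ for its SQL and one node under U/ for uniqueness.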

This is conceptually similar to the experimental Replicated database engine.

Unlike the ON CLUSTER option, users who can manage other users through SQL cannot opt out of the replication.



@robot-clickhouse robot-clickhouse added the doc-alert and pr-feature (Pull request with new product feature) labels Aug 8, 2021
@kmichel-aiven kmichel-aiven force-pushed the kmichel-replicated-access-storage branch 3 times, most recently from 42931c0 to 58e9635 Compare August 8, 2021 21:28
@vitlibar vitlibar self-assigned this Aug 10, 2021
@vitlibar (Member) commented Aug 13, 2021

Integration tests (thread) say there is a potential deadlock.

To fix that, it seems to be enough to call get_zookeeper() before you lock the mutex of ReplicatedAccessStorage.
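The suggested lock ordering can be sketched as follows. This is a toy Python model with hypothetical names, not the real C++ class:

```python
import threading

class ReplicatedAccessStorageSketch:
    """Toy model of the suggested lock ordering (illustrative only)."""

    def __init__(self, get_zookeeper):
        self.get_zookeeper = get_zookeeper
        self.mutex = threading.Lock()
        self.calls = []  # records the order of operations, for illustration

    def insert(self, entity):
        # Acquire the ZooKeeper session BEFORE taking the storage mutex.
        # get_zookeeper() may itself block on another lock, so calling it
        # while holding self.mutex risks the deadlock the tests reported.
        zookeeper = self.get_zookeeper()
        self.calls.append("get_zookeeper")
        with self.mutex:
            self.calls.append("locked_mutex")
            # ... update state using `zookeeper` and `entity` ...
```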

@vitlibar (Member)

This error message doesn't seem to be very helpful. Maybe it's better to add a new parameter throw_exception to tryInsert(), so that if it's true tryInsert() throws an exception instead of returning false. I mean: call tryInsert() with throw_exception=false multiple times, and the last time with throw_exception=true.

@kmichel-aiven (Contributor, author)

Good point. I couldn't make the message much more detailed than the original ZooKeeper error, and the error is supposed to be rare anyway, so I just let the original exception bubble up on the last try.
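The agreed behaviour can be sketched as a retry loop. This is an illustrative Python model of the discussed tryInsert()/throw_exception contract, with hypothetical names:

```python
class ZooKeeperError(Exception):
    """Stands in for the original ZooKeeper exception."""

def insert_with_retries(try_insert, max_retries=10):
    """Call try_insert() several times; on the last attempt ask it to
    throw, so the original ZooKeeper error reaches the user instead of
    a generic failure. Names and retry count are illustrative."""
    for attempt in range(max_retries):
        last_attempt = (attempt == max_retries - 1)
        # With throw_exception=True, try_insert raises on failure
        # instead of returning False.
        if try_insert(throw_exception=last_attempt):
            return
```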

@vitlibar (Member) commented Aug 13, 2021

Probably we should call refreshEntityNoLock() for existing_entity_uuid too?

@kmichel-aiven (Contributor, author) commented Aug 17, 2021

It's not necessary, because refreshEntityNoLock() on the new entity calls setEntityNoLock(), which detects the conflicting existing entity and removes it from local storage.
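The conflict handling described here can be sketched roughly as follows. This is a toy Python model of the local cache, not the real C++ structures:

```python
class LocalCacheSketch:
    """Toy model of the local entity cache (illustrative only)."""

    def __init__(self):
        self.entries_by_id = {}  # uuid -> entity (here just the name)
        self.ids_by_name = {}    # name -> uuid

    def set_entity(self, uuid, name):
        # If a different UUID already holds this name, the freshly
        # replicated entity wins and the stale conflicting entry is
        # dropped from local storage.
        existing = self.ids_by_name.get(name)
        if existing is not None and existing != uuid:
            self.entries_by_id.pop(existing, None)
        self.ids_by_name[name] = uuid
        self.entries_by_id[uuid] = name
```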

@vitlibar (Member) commented Aug 13, 2021

It doesn't seem the mutex of ReplicatedAccessStorage actually needs to be locked until this point.

@kmichel-aiven (Contributor, author)

That's true. I moved things around, so now there is a clear separation between updating ZooKeeper first, and then locking and updating the local state.
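The separation can be sketched as a two-phase operation. Again a toy Python model with illustrative names, where a plain dict stands in for ZooKeeper:

```python
import threading

class TwoPhaseRemoveSketch:
    """Toy model: mutate ZooKeeper with no lock held, then take the
    mutex only for the quick local update (illustrative only)."""

    def __init__(self, zookeeper):
        self.zookeeper = zookeeper  # a dict standing in for ZooKeeper
        self.local = {}
        self.mutex = threading.Lock()

    def remove(self, uuid):
        # Phase 1: no lock held while talking to ZooKeeper, so a slow
        # or blocking ZooKeeper call cannot stall other readers.
        self.zookeeper.pop(uuid, None)
        # Phase 2: the mutex only guards the fast local-state update.
        with self.mutex:
            self.local.pop(uuid, None)
```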

@vitlibar (Member) commented Aug 13, 2021

Usually it's better to do initialization in startup() than in a worker thread: if initialization fails, it may be because of a wrong configuration, so we display an error and stop.

@kmichel-aiven (Contributor, author)

I moved it to the main thread to add the early failure, but kept it here too, to handle the "reset and recover after a major ZooKeeper failure" case.
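The resulting split can be sketched like this, with hypothetical names: initialization runs once in startup() so a bad configuration fails fast, while the worker can also (re)initialize to recover after an outage.

```python
import threading

class StartupSketch:
    """Toy model: fail fast in startup(), but let the worker thread
    re-initialize after a ZooKeeper failure (illustrative only)."""

    def __init__(self, initialize):
        self.initialize = initialize
        self.initialized = False

    def startup(self):
        # Runs in the main thread: a wrong configuration raises here,
        # so the server can log the error and stop immediately.
        self.initialize()
        self.initialized = True
        threading.Thread(target=self.run_worker, daemon=True).start()

    def run_worker(self):
        # The worker also initializes when needed, covering the
        # "reset and recover after major ZooKeeper failure" case.
        if not self.initialized:
            self.initialize()
            self.initialized = True
        # ... process the replication queue ...
```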

@vitlibar (Member) commented Aug 13, 2021

Just an idea: maybe we should not watch, fetch and parse every entity from ZooKeeper. Instead we could make entries_by_id contain only the entities that are actually interesting to us. For example, readImpl() could search entries_by_id first; if the entity is not found, get it from ZooKeeper, start watching it, and finally update entries_by_id. What do you think?

@kmichel-aiven (Contributor, author) commented Aug 17, 2021

I think it would make the subscription/notification system a bit too unpredictable if we didn't have all entities available.
Operations like findAllImpl or existsImpl could become quite challenging too.
I also like that the current proposal is essentially as fast as MemoryStorage for all read-only operations (and these operations keep working when ZooKeeper is temporarily down).

@vitlibar (Member)

@kmichel-aiven Very nice implementation, thank you!

@kmichel-aiven kmichel-aiven force-pushed the kmichel-replicated-access-storage branch from 58e9635 to e55c0a4 Compare August 17, 2021 14:55
This stores Access Entities in ZooKeeper and replicates them across an entire cluster.

This can be enabled by using the following configuration:

    <user_directories>
        <replicated>
            <zookeeper_path>/clickhouse/access/</zookeeper_path>
        </replicated>
    </user_directories>
