Database-level access control#2309
Conversation
|
I would like to first discuss this new feature in the weekly meeting. Thanks |
|
Just discussed in the weekly meeting. It seems like we are all aligned to add more database features. We don't want folks to use database instead of true multi-tenancy, like running multiple containers, but there are still plenty of workloads that could benefit from the having access control on databases for namespacing. So we'll move forward with this feature. |
hpatro
left a comment
There was a problem hiding this comment.
Thanks for working on this @JoBeR007.
High level comment:
- I would prefer us to come up with a symbol for DB, let's say
^and use that instead ofdb=as prefix to have the same experience as other Then it would look like,
All database access: ACL SETUSER alice ^*
Selective database access: ACL SETUSER bob ^1 ^2
- With DB I find providing both ALLOW and DENY would be helpful. Possibly extend it to other resources in a separate PR.
Restrict admin db access via (-^?):
Allow all except db 0: ACL SETUSER alice -^0
- For
FLUSHALLandFLUSHDBthe command can get accepted as ASYNC and while it's getting executed the permission could change. I think the correct behavior is to still process the command completely even if the permission has changed at a later point. (The PR has the correct behavior).
|
Thanks for review @hpatro! |
|
@dvkashapov Just to be transparent with you. This PR was opened at a time when we were working on the 9.0 release, and so we weren't going to merge this feature in to it. Now that the RC candidates are mostly done we can start reviewing this again for the next release. I think we'll have more time now to review it and make progress.
This is inconsistent with the rest of the ACL system. Today everything is positive grants. Lots of people think that if you do
From the end user perspective, the FLUSH always happens synchronously. The data is just freed async. I think this behavior is fine.
I think this is less readable than the original proposal, but it is more consistent. I don't know how strongly I feel.
|
- Implemented database permissions using `db=<id>`, `db!=<id>`, `alldbs`, `resetdbs` ACL rules - Added SELECTOR_FLAG_ALLDBS and SELECTOR_FLAG_DBLIST_NEGATED flags - Updated SELECT, MOVE, SWAPDB, FLUSHDB commands with CMD_CROSS_DB flag, FLUSHALL - CMD_ALL_DBS - Extended ACL checks with database access verification - Added ACL_DENIED_DB error type for database permission violations - Maintained backward compatibility with default access to all databases Signed-off-by: Daniil Kashapov <[email protected]>
Signed-off-by: Daniil Kashapov <[email protected]>
Signed-off-by: Daniil Kashapov <[email protected]>
|
@madolson @hpatro Hello! It would be awesome to merge this in 9.1! |
|
In the discussion, it came up that we are missing the following other database features:
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## unstable #2309 +/- ##
============================================
- Coverage 73.84% 73.79% -0.05%
============================================
Files 125 125
Lines 68898 69345 +447
============================================
+ Hits 50878 51174 +296
- Misses 18020 18171 +151
🚀 New features to boost your workflow:
|
|
The use cases we need to have clear syntax for.
@hpatro Will followup to try to help clarify the most consistent way to implement the syntax. There is one more open question, do we want to make the default secure by default. For Redis 7, we changed the default permissions of an ACL user from allchannels to nochannels. Do we want to do the same thing here? |
|
Awesome! This is really descriptive and answers a lot of questions. Proposed syntax is good, I'm OK with it! Answering your first message:
We can make an assumption that
I will take a look at MONITOR implementation, maybe we can make it per-db with default of all databases.
Yes, FLUSHALL needs all database access.
DBLIST would be useful when we will have named databases, yes.
I don't know much about channels right now, so I can't say how hard it would be to make them per db.
In terms of ACL for named databases, if we decide to make them like an alias for some dbid, then implementation won't change much. We would only need to resolve name to dbid.
This will be good for security but it will be a breaking change for all existing ACL's for all users, maybe this is too much. Important question: We're going with per-user permissions, not per-selector like currently in PR? |
|
We were just trying to think through in the weekly meeting what other gaps exists around databases support. @dvkashapov doesn't need to be worked upon regarding this change.
It's still per-selector, maybe the example suggested above brought in some ambiguity. |
|
I started an issue for dealing with ACL issues for search module. #2764 Seems like this work intersects with that, in particular, the current search code would not enforce these ACL capabilities and would cause security issues. |
But then db+=1,2,3 would only apply to root selector, and will not be useful for user with >1 selector, same for db-=1,2,3 case. Do we want to give users ability to edit individual selector? Then we would need to identify them with some id's and add ACL LIST-SELECTORS user. |
That has been the case with all other restrictions (commands / categories / channels). We allow the operation even if one selector succeeds. |
|
OK, then I suggest we focus on |
Signed-off-by: Daniil Kashapov <[email protected]>
hpatro
left a comment
There was a problem hiding this comment.
LGTM.
@dvkashapov could you benchmark with certain db permissions added to the default user and measure the performance change for SET/GET commands? I assume it will be a minimal drop in rps.
| } | ||
| } | ||
|
|
||
| sds getAclErrorMessage(int acl_res, user *user, struct serverCommand *cmd, sds errored_val, int verbose) { |
There was a problem hiding this comment.
No, it was to avoid a breaking change at some point. The dry-run needed to give some more information, but we didn't want to change the response.
Signed-off-by: Daniil Kashapov <[email protected]>
Signed-off-by: Daniil Kashapov <[email protected]>
Signed-off-by: Daniil Kashapov <[email protected]>
|
I appreciate the time and input from everyone who reviewed this, thank you! I addressed every review suggestions, PTAL. @hpatro @madolson while merging changes from unstable I discovered one problem: Lua now uses only ValkeyModule API and we wouldn't be able to do extraction of dbid from client, previously in script_lua.c we did something like this: Can we do that to preserve correct ACL checks inside scripts?
Definitely, I will do the benchmarking early next week! |
madolson
left a comment
There was a problem hiding this comment.
The code looks really good. I looked more deeply into a few things and have some minor suggestions, other than that it looks good!
My initial reaction is that seems fine. |
Signed-off-by: Daniil Kashapov <[email protected]>
Signed-off-by: Daniil Kashapov <[email protected]>
Signed-off-by: Daniil Kashapov <[email protected]>
Signed-off-by: Daniil Kashapov <[email protected]>
Benchmarks!To exaggerate the situation I used small data sizes, Configs and benchmarks
databases 128
save ""
appendonly no
rdbcompression no
activedefrag no
maxclients 1000
io-threads (1 or 9)
protected-mode no
hz 10
maxmemory 3gb
Results DB LEVEL ACL with 128 explicit databases listDB LEVEL ACL (user with explicit db list of 128 database IDs)
SET IO 1 P 1 D 16
Summary:
throughput summary: 62701.67 requests per second
latency summary (msec):
avg min p50 p95 p99 max
6.105 5.480 6.055 6.199 7.439 16.863
GET IO 1 P 1 D 16
Summary:
throughput summary: 63302.68 requests per second
latency summary (msec):
avg min p50 p95 p99 max
6.046 5.008 5.999 6.135 6.231 12.527
SET IO 1 P 10 D 16
Summary:
throughput summary: 472118.09 requests per second
latency summary (msec):
avg min p50 p95 p99 max
8.111 7.264 8.063 8.215 8.303 16.735
GET IO 1 P 10 D 16
Summary:
throughput summary: 518410.44 requests per second
latency summary (msec):
avg min p50 p95 p99 max
7.386 6.608 7.343 7.503 7.599 15.359
SET IO 9 P 1 D 16
Summary:
throughput summary: 467143.03 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.798 0.024 0.535 4.263 4.943 33.919
GET IO 9 P 1 D 16
Summary:
throughput summary: 464477.75 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.802 0.016 0.535 4.319 4.999 34.303
SET IO 9 P 10 D 16
Summary:
throughput summary: 1345698.00 requests per second
latency summary (msec):
avg min p50 p95 p99 max
2.839 0.032 1.999 6.231 10.783 39.551
GET IO 9 P 10 D 16
Summary:
throughput summary: 1975641.75 requests per second
latency summary (msec):
avg min p50 p95 p99 max
1.928 0.040 1.399 5.511 7.519 38.943 Results UNSTABLE with default userUNSTABLE (user with default permissions)
SET IO 1 P 1 D 16
Summary:
throughput summary: 61326.57 requests per second
latency summary (msec):
avg min p50 p95 p99 max
6.244 5.336 6.199 6.375 7.623 17.503
GET IO 1 P 1 D 16
Summary:
throughput summary: 65044.25 requests per second
latency summary (msec):
avg min p50 p95 p99 max
5.884 5.240 5.855 5.983 6.063 12.199
SET IO 1 P 10 D 16
Summary:
throughput summary: 476723.19 requests per second
latency summary (msec):
avg min p50 p95 p99 max
8.034 7.008 7.983 8.111 8.207 17.647
GET IO 1 P 10 D 16
Summary:
throughput summary: 521056.66 requests per second
latency summary (msec):
avg min p50 p95 p99 max
7.348 6.552 7.303 7.471 7.567 15.175
SET IO 9 P 1 D 16
Summary:
throughput summary: 470560.84 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.792 0.024 0.527 4.263 4.911 32.671
GET IO 9 P 1 D 16
Summary:
throughput summary: 452110.12 requests per second
latency summary (msec):
avg min p50 p95 p99 max
0.824 0.024 0.527 4.383 6.799 37.951
SET IO 9 P 10 D 16
Summary:
throughput summary: 1340728.38 requests per second
latency summary (msec):
avg min p50 p95 p99 max
2.849 0.048 1.983 6.231 10.895 39.007
GET IO 9 P 10 D 16
Summary:
throughput summary: 2006724.50 requests per second
latency summary (msec):
avg min p50 p95 p99 max
1.898 0.040 1.391 5.487 6.975 35.583
|
|
Benchmarking seems good to me. @madolson / @zuiderkwast I think we are good to merge this out. Any final thoughts ? |
The dbnum issue still seems open? |
Like I mentioned in the thread and offline, we can stick with the current approach and can relax the validation if users ask for it. |
|
Discussed the upper bound dbnum issue further with @madolson. It makes devs lives easier to keep it open. This allows ACL rules to be used across clusters with different dbnum value. Hence, we should remove the restriction for ACL rule to follow server.dbnum and keep these two decoupled. |
Signed-off-by: Daniil Kashapov <[email protected]>
|
Replaced check with |
|
@dvkashapov Could you submit a PR to update the ACL topic page https://valkey.io/topics/acl with this change? |
|
Thanks everyone for making this happen!
Yes, definitely. |
## API changes and user behavior: - [x] Default behavior for database access. Default is `alldbs` permissions. ### Database Permissions (`db=`) - [x] Accessing particular database ``` > ACL SETUSER test1 on +@ALL ~* resetdbs db=0,1 nopass "user test1 on nopass sanitize-payload ~* resetchannels db=0,1 +@ALL" ``` - [x] (Same behavior without usage of `resetdbs`) ``` > ACL SETUSER test1 on +@ALL ~* db=0,1 nopass "user test1 on nopass sanitize-payload ~* resetchannels db=0,1 +@ALL" ``` - [x] Multiple selector can be provided ``` > ACL SETUSER test1 on nopass (db=0,1 +@Write +select ~*) (db=2,3 +@READ +select ~*) "user test1 on nopass sanitize-payload resetchannels alldbs -@ALL (~* resetchannels db=0,1 -@ALL +@Write +select) (~* resetchannels db=2,3 -@ALL +@READ +select)" ``` - [x] Restricting special commands which access databases as part of the command. The user needs to have access to both the commands and db(s) part of the command to run these commands. 1. SWAPDB 2. SELECT 3. MOVE - (Select command would have went through for the source database). Have access for the target database. 4. COPY - [x] Restricting special commands which doesn't specify database number, however, accesses multiple databases. The user needs to have access to both the commands and all databases (`alldbs`) to run these commands. 1. FLUSHALL - Access all databases 2. CLUSTER commands that access all databases: - CANCELSLOTMIGRATIONS - MIGRATESLOTS - [x] New connection establishment behavior New client connection gets established to DB 0 by default. Authentication and authorisation are decoupled and the user can connect/authenticate and further perform `SELECT` or other operation that do not access keyspace. (Do we want to extend HELLO?) Alternative suggestion by @madolson: Extend `HELLO` command to pass the dbid to which the user should get connected after authentication if they have right set of permission. I think it will become a long poll for adoption. - [x] Observability Extend `ACL LOG` to log user which received denied permission error while accessing a database. - [x] Module API * Introduce module API `int VM_ACLCheckPermissions(ValkeyModuleUser *user, ValkeyModuleString **argv, int argc, int dbid, ValkeyModuleACLLogEntryReason *denial_reason);` * Stop support of `VM_ACLCheckCommandPermissions()`. Resolves: valkey-io#1336 --------- Signed-off-by: Daniil Kashapov <[email protected]>
## API changes and user behavior: - [x] Default behavior for database access. Default is `alldbs` permissions. ### Database Permissions (`db=`) - [x] Accessing particular database ``` > ACL SETUSER test1 on +@ALL ~* resetdbs db=0,1 nopass "user test1 on nopass sanitize-payload ~* resetchannels db=0,1 +@ALL" ``` - [x] (Same behavior without usage of `resetdbs`) ``` > ACL SETUSER test1 on +@ALL ~* db=0,1 nopass "user test1 on nopass sanitize-payload ~* resetchannels db=0,1 +@ALL" ``` - [x] Multiple selector can be provided ``` > ACL SETUSER test1 on nopass (db=0,1 +@Write +select ~*) (db=2,3 +@READ +select ~*) "user test1 on nopass sanitize-payload resetchannels alldbs -@ALL (~* resetchannels db=0,1 -@ALL +@Write +select) (~* resetchannels db=2,3 -@ALL +@READ +select)" ``` - [x] Restricting special commands which access databases as part of the command. The user needs to have access to both the commands and db(s) part of the command to run these commands. 1. SWAPDB 2. SELECT 3. MOVE - (Select command would have went through for the source database). Have access for the target database. 4. COPY - [x] Restricting special commands which doesn't specify database number, however, accesses multiple databases. The user needs to have access to both the commands and all databases (`alldbs`) to run these commands. 1. FLUSHALL - Access all databases 2. CLUSTER commands that access all databases: - CANCELSLOTMIGRATIONS - MIGRATESLOTS - [x] New connection establishment behavior New client connection gets established to DB 0 by default. Authentication and authorisation are decoupled and the user can connect/authenticate and further perform `SELECT` or other operation that do not access keyspace. (Do we want to extend HELLO?) Alternative suggestion by @madolson: Extend `HELLO` command to pass the dbid to which the user should get connected after authentication if they have right set of permission. I think it will become a long poll for adoption. - [x] Observability Extend `ACL LOG` to log user which received denied permission error while accessing a database. - [x] Module API * Introduce module API `int VM_ACLCheckPermissions(ValkeyModuleUser *user, ValkeyModuleString **argv, int argc, int dbid, ValkeyModuleACLLogEntryReason *denial_reason);` * Stop support of `VM_ACLCheckCommandPermissions()`. Resolves: valkey-io#1336 --------- Signed-off-by: Daniil Kashapov <[email protected]>
At one point in development there was assumption that intset of database IDs would have IDs only in `[0, server.dbnum-1]` range, but [here](#2309 (comment)) we decided to change upper bound to `INT32_MAX` for ACL compatibility reasons between nodes. Because of that change, assumption is not true anymore and we should explicitly check each database list to contain access to full `[0, server.dbnum-1]` range. Also added test for that. Signed-off-by: Daniil Kashapov <[email protected]>
At one point in development there was assumption that intset of database IDs would have IDs only in `[0, server.dbnum-1]` range, but [here](valkey-io#2309 (comment)) we decided to change upper bound to `INT32_MAX` for ACL compatibility reasons between nodes. Because of that change, assumption is not true anymore and we should explicitly check each database list to contain access to full `[0, server.dbnum-1]` range. Also added test for that. Signed-off-by: Daniil Kashapov <[email protected]>
API changes and user behavior:
Default is
alldbspermissions.Database Permissions (
db=)resetdbs)The user needs to have access to both the commands and db(s) part of the command to run these commands.
The user needs to have access to both the commands and all databases (
alldbs) to run these commands.New client connection gets established to DB 0 by default. Authentication and authorisation are decoupled and the user can connect/authenticate and further perform
SELECTor other operation that do not access keyspace.(Do we want to extend HELLO?) Alternative suggestion by @madolson: Extend
HELLOcommand to pass the dbid to which the user should get connected after authentication if they have right set of permission. I think it will become a long poll for adoption.Observability
Extend
ACL LOGto log user which received denied permission error while accessing a database.Module API
int VM_ACLCheckPermissions(ValkeyModuleUser *user, ValkeyModuleString **argv, int argc, int dbid, ValkeyModuleACLLogEntryReason *denial_reason);VM_ACLCheckCommandPermissions().Resolves: #1336