Conversation
|
This is from AWS, and I'll add some more context about the intention. We wanted to make sure that user data wasn't getting stored into log files, which has use cases around data sovereignty and security. The idea is that we should be able to produce a clean log file that can be stored and moved and only contain server information. Some thoughts about this:
|
I agree. We should start by coming up with a list of those places and what could fall under this definition (e.g. client names, file names, |
Just adding on ACL users and any command arguments or partial query information (we dump that in a couple of places) since we can't know their content. @oranagra Do you have any thoughts about providing some way to sanitize user data out of log files? |
src/networking.c
Outdated
There was a problem hiding this comment.
Why should we hide the client name?
There was a problem hiding this comment.
client name can hold sensitive information about the client purpose/ what the client does
There was a problem hiding this comment.
regarding client-list, I hide clients names because it is printed inside crash reports.
The client names will not be hidden when calling client-list directly.
ok, now i understand that change better (had difficulties wrapping my head around the code).
i think the solution of using isBugReportStart and shouldHideClientInfo is ugly, and instead, it would be much nicer to just add a hide_user_info argument to catClientInfoString and getAllClientsInfoString.
the problem with that is that catClientInfoString is widely used, and we don't wanna modify all the calls.
the solution to that is to add an alias function, i.e. catClientInfoStringGeneric will take the argument and the existing catClientInfoString will just be a wrapper.
|
I agree regarding the extended effort needed in this case. I think we should provide some sort of mask-pii mode for redis so it will avoid reporting internal data (key names, values, client names, addresses etc... as they might hold some info regarding customer application) but we will also need to provide the customer way to get that data in case he needs it (like turning off the pii masking per connection?) |
Customer will be able to watch those details by disabling Independent parser to run over logs and omit client parts is a complex alternative. It gets strings as input while in the config case we have the full logical context. |
you mean that admin would be able to do multi; config set hide-client-info no; client list; config set hide-client-info yes; exec? |
|
Not a fan of the idea, but It could be an acl user flag. But I still don't understand why we want to filter client list. Unlike the crash log, it's just a command, like KEYS or GET, and can be blocked by acl |
It is true that it can be blocked, but in cases were Redis is provided as a service sometimes clients would like to prevent operators from accessing private data while the operators would still need some tools in order to debug/manage the service .commands like slowlog, info client list etc... can be used by both operator and clients while having this flag can help reduce the potential pii leakage |
|
@ranshid I think we need to consider the use case you describe as a separate feature. Preventing PII from getting into log files is one thing. Removing PII from the output of administrative commands is another thing - it takes us to multi-level admin privileges, which is a deeper rabbit hole. |
I agree that some deeper thought should be placed to this, I was only pointing out some use cases which we encountered in AWS. As said it is possible to use some external tool to remove PII from logs and command outputs, but in such cases customers cannot realy validate the efficiency of such tool so having a redis built-in mechanism can help build trust with external users. |
src/config.c
Outdated
There was a problem hiding this comment.
i think the name of this config is not clear enough.
if it keeps it's current purpose, it should say "log".
alternatively we need to design where this is going before making any changes.
There was a problem hiding this comment.
p.s. documentation in redis.conf is missing.
src/debug.c
Outdated
src/networking.c
Outdated
There was a problem hiding this comment.
hide_user_info is misleading, it could mean that we hide the ACL user name.
actually, i'll argue that again that i don't see why CLIENT SETNAME is PII, and if it is, then the ACL user name could be too.
There was a problem hiding this comment.
Ack, client name will not be redacted
src/networking.c
Outdated
There was a problem hiding this comment.
maybe it should say --redacted-- instead of remain empty?
src/debug.c
Outdated
There was a problem hiding this comment.
if we hide the arguments, let's at least print the argc instead, and maybe a bold "redacted" message.
|
we discussed this in a core-team meeting and concluded that we would like to proceed only after we prepare some detailed list of everything we could want to redact (maybe host names and other things). |
|
Ack, I will create a draft list for discussion |
|
Those are the points I found which we might spill to logs some PII.
Please let me know if I missed something |
|
I don't think SLOWLOG is part of it, if it did then MONITOR and CLIENT LIST are too. |
|
I agree regarding |
|
it doesn't log the stack "trace". it logs the stack contents (could contain variables) |
I would be inclined to say function names count and file names do not. Typically file names are administrator, and we can ask them to set it to something not related to user data. |
|
Updated list
|
|
Hide replication backlog & stack frame and fix comments |
e848b0c to
2617412
Compare
|
@naglera Are you abandoning this effort? |
|
No, horrible git accident, I started this commit from unstable, and it closed automatically when I pulled from upstream. By the way, @madolson, are you also approving the list? #11747 (comment) |
|
@naglera Yeah, that looked good to me :) |
fix spaces Co-authored-by: debing.sun <[email protected]>
fix spaces Co-authored-by: debing.sun <[email protected]>
fix log Co-authored-by: debing.sun <[email protected]>
|
Hi @sundb, I'm not sure if we actually use |
| if (j >= clientArgsToLog(c)){ | ||
| serverLog(LL_WARNING|LL_RAW,"client->argv[%d]: *redacted*\n", j); |
There was a problem hiding this comment.
| if (j >= clientArgsToLog(c)){ | |
| serverLog(LL_WARNING|LL_RAW,"client->argv[%d]: *redacted*\n", j); | |
| if (j >= clientArgsToLog(c)) { | |
| serverLog(LL_WARNING|LL_RAW,"client->argv[%d]: *redacted*", j); |
| serverLog(LL_WARNING|LL_RAW,"argc: '%d'\n", cc->argc); | ||
| for (j = 0; j < cc->argc; j++) { | ||
| if (j >= clientArgsToLog(cc)){ | ||
| serverLog(LL_WARNING|LL_RAW,"argv[%d]: *redacted*\n", j); |
There was a problem hiding this comment.
| serverLog(LL_WARNING|LL_RAW,"argv[%d]: *redacted*\n", j); | |
| serverLog(LL_WARNING|LL_RAW,"argv[%d]: *redacted*", j); |
|
|
This PR is based on the commits from PR #11747. In the event of an assertion failure, hide command arguments from the operator. In some cases, private client information can be voluntarily exposed when a redis instance crashes due to an assertion failure. This commit prevent וnintentional client info exposure. Operators can still access the hidden data, but they must actively request it. Any of the client info commands remains the unchanged. ### Config Add a new config `hide-user-data-from-log` to turn this feature on and off, default off. --------- Co-authored-by: naglera <[email protected]> Co-authored-by: naglera <[email protected]>
|
Close via #13400 |
This PR is based on the commits from PR redis#11747. In the event of an assertion failure, hide command arguments from the operator. In some cases, private client information can be voluntarily exposed when a redis instance crashes due to an assertion failure. This commit prevent וnintentional client info exposure. Operators can still access the hidden data, but they must actively request it. Any of the client info commands remains the unchanged. ### Config Add a new config `hide-user-data-from-log` to turn this feature on and off, default off. --------- Co-authored-by: naglera <[email protected]> Co-authored-by: naglera <[email protected]>
This PR continues the work from [#13400](#13400), following the discussion in [#11747](#11747 (comment)), to further ensure sensitive user data is not exposed in logs when hide_user_data_from_log is enabled. - Introduce redactLogCstr() helper for safe, centralized log redaction. - Update ACL and networking log messages to use redacted values where appropriate. - Prevent leaking raw query buffer contents.
In the event of an assertion failure, hide command arguments from the operator.
In some cases, private client information can be voluntarily exposed when a redis instance crashes due to an assertion failure.
This commit prevent וnintentional client info exposure.
Operators can still access the hidden data, but they must actively request it.
Any of the client info commands remains the unchanged.