Query masking rules#5710
Conversation
|
I'll finish the PR at the beginning July. I would be appreciate for CR in the meanwhile. |
|
|
||
| if (!ends_with_backslash && (ends_with_semicolon || has_vertical_output_suffix || (!config().has("multiline") && !hasDataInSTDIN()))) | ||
| { | ||
| // TODO: should we do sensitive data masking on client too? History file can be source of secret leaks. |
There was a problem hiding this comment.
It's questionable. Because the history file is stored in user home directory that is not accessible by other users. But it's Ok if we will apply masking.
| uint64_t getMatchesCount() const { return matches_count; } | ||
| }; | ||
|
|
||
| std::vector<std::unique_ptr<MaskingRule>> all_masking_rules; |
There was a problem hiding this comment.
Is the MaskingRule struct non-moveable?
Otherwise you don't need unique_ptr.
There was a problem hiding this comment.
RE2 object containing precompiled regexp is non-movable. :/ and because if that whole structure isn't movable too.
| # and finally querylog | ||
| $CLICKHOUSE_CLIENT \ | ||
| --query="select * from system.query_log where event_time>now() - 10 and query like '%TOPSECRET%';" | ||
|
|
There was a problem hiding this comment.
Also please don't forget to check server's text log manually for HTTP and TCP interfaces.
|
Mostly Ok, only a few changes remain. |
|
TODO: |
|
Discussing in Telegram chat... |
|
|
||
| if (auto masker = context.getSensitiveDataMasker()) | ||
| { | ||
| auto matches = masker->wipeSensitiveData(res); |
There was a problem hiding this comment.
not that happens in 2 places - once here (for whole query - uses for system.query_log. system.processes and for logging), and second - in loggers. ProfileEvents are incremented only here.
|
More potential sources of leaks: system.mutations (query), system.zookeeper, system.tables (create table statements with passwords). Also (theoretically) last exception filed in dictionaries / merges / replication_queue? Something else? |
|
@filimonov Can this achieve #6240 ? Maybe I should turn off #6240 If this can, because this is more generic : ) BTW: If so, how does it match the password? Do I need to configure the password in the configuration file ? |
|
Currently that masking is applied only on logs and when putting query to query_log and process list, but can easely be extended to more places. And yes, you need to put password, or some good enough regexp to match it in config file. |
|
|
||
| ## query_masking_rules | ||
|
|
||
| Regexp-based rules, which will be applied to queries before storing them in server logs, |
There was a problem hiding this comment.
From the code it looks like the masking is applied not only to the query but also to the entire text of the log message. This should be documented as well.
Besides the query text, what are other sources of sensitive data that might end up in the logs? Probably we should have a list here.
There was a problem hiding this comment.
- Query fragments (for example "condition secret = 1 moved to prewhere")
- small data fragments can be dumped when input format can't be parsed (for example CSV)
- filenames (which can contain secrets, for example for Distributed table)
- urls & connection settings
- external error messages (smth like 'can't establish connection to DSN xyz://db?user=www&password')
etc.
| const RE2 regexp; | ||
| const re2::StringPiece replacement; | ||
|
|
||
| mutable std::atomic<std::uint64_t> matches_count = 0; |
There was a problem hiding this comment.
Can we remove this? QueryMaskingRulesMatch should be sufficient to check whether the rules are working.
There was a problem hiding this comment.
Is was used in unit tests, can hide that in #ifdef
|
|
||
| void Context::setSensitiveDataMasker(std::unique_ptr<SensitiveDataMasker> sensitive_data_masker) | ||
| { | ||
| if (sensitive_data_masker->rulesCount() > 0) |
| SensitiveDataMasker(const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix); | ||
| ~SensitiveDataMasker(); | ||
| void addMaskingRule(const std::string & name, const std::string & regexp_string, const std::string & replacement_string); | ||
| int wipeSensitiveData(std::string & data) const; |
There was a problem hiding this comment.
Why int? Missing comment.
I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en
For changelog. Remove if this is non-significant change.
Category (leave one):
Short description (up to few sentences):
Possibility to remove sensitive data from query_log, server logs, process list with regexp-based rules.
Detailed description (optional):
resolves #5527