Skip to content

Query masking rules#5710

Merged
alexey-milovidov merged 9 commits intoClickHouse:masterfrom
filimonov:query_masking
Sep 3, 2019
Merged

Query masking rules#5710
alexey-milovidov merged 9 commits intoClickHouse:masterfrom
filimonov:query_masking

Conversation

@filimonov
Copy link
Copy Markdown
Contributor

I hereby agree to the terms of the CLA available at: https://yandex.ru/legal/cla/?lang=en

For changelog. Remove if this is non-significant change.

Category (leave one):

  • New Feature

Short description (up to few sentences):
Possibility to remove sensitive data from query_log, server logs, process list with regexp-based rules.

Detailed description (optional):
resolves #5527

@filimonov
Copy link
Copy Markdown
Contributor Author

filimonov commented Jun 21, 2019

I'll finish the PR at the beginning July. I would be appreciate for CR in the meanwhile.


if (!ends_with_backslash && (ends_with_semicolon || has_vertical_output_suffix || (!config().has("multiline") && !hasDataInSTDIN())))
{
// TODO: should we do sensitive data masking on client too? History file can be source of secret leaks.
Copy link
Copy Markdown
Member

@alexey-milovidov alexey-milovidov Jun 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's questionable. Because the history file is stored in user home directory that is not accessible by other users. But it's Ok if we will apply masking.

uint64_t getMatchesCount() const { return matches_count; }
};

std::vector<std::unique_ptr<MaskingRule>> all_masking_rules;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the MaskingRule struct non-moveable?
Otherwise you don't need unique_ptr.

Copy link
Copy Markdown
Contributor Author

@filimonov filimonov Jun 28, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE2 object containing precompiled regexp is non-movable. :/ and because if that whole structure isn't movable too.

# and finally querylog
$CLICKHOUSE_CLIENT \
--query="select * from system.query_log where event_time>now() - 10 and query like '%TOPSECRET%';"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please don't forget to check server's text log manually for HTTP and TCP interfaces.

@alexey-milovidov
Copy link
Copy Markdown
Member

Mostly Ok, only a few changes remain.

@alexey-milovidov
Copy link
Copy Markdown
Member

TODO:
It must work without query_masking_rules section in config (e.g. config file from old server).
Make method wipeSensitiveData const; use const QueryMaskingRules in Context; return by const reference (or pointer?). Initialize eagerly, because it cannot be changed after startup, avoid getLock call.

@alexey-milovidov
Copy link
Copy Markdown
Member

Discussing in Telegram chat...

@filimonov filimonov changed the title Query masking rules [wip] Query masking rules Jul 16, 2019

if (auto masker = context.getSensitiveDataMasker())
{
auto matches = masker->wipeSensitiveData(res);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not that happens in 2 places - once here (for whole query - uses for system.query_log. system.processes and for logging), and second - in loggers. ProfileEvents are incremented only here.

@filimonov
Copy link
Copy Markdown
Contributor Author

More potential sources of leaks: system.mutations (query), system.zookeeper, system.tables (create table statements with passwords). Also (theoretically) last exception filed in dictionaries / merges / replication_queue?

Something else?

@filimonov filimonov changed the title [wip] Query masking rules Query masking rules Jul 19, 2019
@zhang2014
Copy link
Copy Markdown
Contributor

zhang2014 commented Aug 6, 2019

@filimonov Can this achieve #6240 ? Maybe I should turn off #6240 If this can, because this is more generic : )

BTW: If so, how does it match the password? Do I need to configure the password in the configuration file ?

@filimonov
Copy link
Copy Markdown
Contributor Author

Currently that masking is applied only on logs and when putting query to query_log and process list, but can easely be extended to more places.

And yes, you need to put password, or some good enough regexp to match it in config file.


## query_masking_rules

Regexp-based rules, which will be applied to queries before storing them in server logs,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From the code it looks like the masking is applied not only to the query but also to the entire text of the log message. This should be documented as well.

Besides the query text, what are other sources of sensitive data that might end up in the logs? Probably we should have a list here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Query fragments (for example "condition secret = 1 moved to prewhere")
  2. small data fragments can be dumped when input format can't be parsed (for example CSV)
  3. filenames (which can contain secrets, for example for Distributed table)
  4. urls & connection settings
  5. external error messages (smth like 'can't establish connection to DSN xyz://db?user=www&password')
    etc.

const RE2 regexp;
const re2::StringPiece replacement;

mutable std::atomic<std::uint64_t> matches_count = 0;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove this? QueryMaskingRulesMatch should be sufficient to check whether the rules are working.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is was used in unit tests, can hide that in #ifdef


void Context::setSensitiveDataMasker(std::unique_ptr<SensitiveDataMasker> sensitive_data_masker)
{
if (sensitive_data_masker->rulesCount() > 0)
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No check for nullptr.

SensitiveDataMasker(const Poco::Util::AbstractConfiguration & config, const std::string & config_prefix);
~SensitiveDataMasker();
void addMaskingRule(const std::string & name, const std::string & regexp_string, const std::string & replacement_string);
int wipeSensitiveData(std::string & data) const;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why int? Missing comment.

@alexey-milovidov alexey-milovidov merged commit cdf98cc into ClickHouse:master Sep 3, 2019
@alexey-milovidov
Copy link
Copy Markdown
Member

9959e8d
843f830
ece7a05
6176041

abyss7 added a commit to abyss7/ClickHouse that referenced this pull request Sep 4, 2019
@KochetovNicolai KochetovNicolai added the pr-feature Pull request with new product feature label Sep 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-feature Pull request with new product feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add cleansing rules to prevent SQL queries from leaking sensitive data into log

5 participants