Add cleansing rules to prevent SQL queries from leaking sensitive data into log #5527
Description
Use case
Users who want to protect sensitive data in ClickHouse need a way to prevent that data from accidentally leaking through logs. One way this can happen is through queries logged in the clickhouse-system.log or the system.query_log table. If messages from either of these are exported to Splunk or similar systems, sensitive data such as credit card numbers used in query expressions may leak out and be exposed to people who should not see it.
Describe the solution you'd like
I would like to add a feature that applies cleansing rules to any query that ClickHouse logs. It would work as follows.
- Add a new property in /etc/clickhouse-server.xml to define regular expressions to seek in queries when logging. Here's an example to match US social security numbers.
<query_cleansing_rules>
<regexp>[0-9]{3}-[0-9]{2}-[0-9]{4}</regexp>
</query_cleansing_rules>
- Whenever a query is logged, run the foregoing regular expressions on the query string. Any matching substring would be replaced by a standard value like '********'.
Implementing this should require only simple changes in a couple of locations (notably executeQuery.cpp and perhaps BaseDaemon.cpp).
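The rule application described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `QueryCleanser` and its `cleanse` method are hypothetical names, not existing ClickHouse code, and `std::regex` is used purely to keep the example self-contained (a real implementation would probably use whatever regex library the server already links against). The patterns are assumed to have been read from the proposed `<query_cleansing_rules>` config section already.

```cpp
#include <regex>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper (not part of ClickHouse): holds the compiled
// cleansing rules and masks every match in a query string before logging.
class QueryCleanser
{
public:
    // `patterns` are assumed to come from <query_cleansing_rules>/<regexp>.
    explicit QueryCleanser(const std::vector<std::string> & patterns,
                           std::string replacement_ = "********")
        : replacement(std::move(replacement_))
    {
        rules.reserve(patterns.size());
        // Compile each rule once at startup rather than once per logged query.
        for (const auto & pattern : patterns)
            rules.emplace_back(pattern, std::regex::ECMAScript | std::regex::optimize);
    }

    // Returns a copy of `query` with every rule match replaced by the mask.
    std::string cleanse(std::string query) const
    {
        for (const auto & rule : rules)
            query = std::regex_replace(query, rule, replacement);
        return query;
    }

private:
    std::vector<std::regex> rules;
    std::string replacement;
};
```

With the social security number rule from the example config, `cleanse` would be called on the query text just before it is written to the log file or to system.query_log. Compiling the expressions once in the constructor keeps the per-query cost down to the matching itself.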
Describe alternatives you've considered
I looked for a more general way to do this, for example by changing settings in the Poco logging configuration. However, there does not appear to be a centralized way to do this, and Poco is not used for logging to system.query_log in any case. Running regular expressions over every log message, rather than only over logged queries, would also impact server performance.
Additional context
This feature would be very useful for achieving PCI-DSS compliance (the standard for handling credit card data).
Outstanding Questions
Are there other ways data can leak out into the logs?