Prevent logging sensitive data in debug logs#256
Prevent logging sensitive data in debug logs#256sveinse merged 6 commits intocustom-components:masterfrom
Conversation
* All debug output doesn't print sensistive data in logs * Sensitive data are replaced by redaction strings * Redaction database can be printed to logs * Move Redactor to redact.py
steinmn
left a comment
There was a problem hiding this comment.
Redacting 0.0 definitely needs to be fixed, the other two are open for discussion.
Going forward (so not necessarily in this PR/release), would it make sense to gather all of the "access library"-code in a custom_components/zaptec/zaptec_api-folder? As a precursor to having it as a separate python package, I think it would be helpful to start keeping it a bit more separate and try to disentangle the ha-parts and api-parts.
custom_components/zaptec/redact.py
Outdated
| return self.redacts[obj] | ||
|
|
||
| # Check if new redaction is needed | ||
| if key and key in self.REDACT_KEYS and (not obj or obj not in self.NEVER_REDACT): |
There was a problem hiding this comment.
What is the purpose of not obj here?
In my setup, latitude and longitude has not been set, leading them to become 0.0. I think that since not 0.0 is true/truthy, that overrides the NEVER_REDACT. At least latitude is the first instance I found of <--Redact #4-->, and all subsequent zero-floats in my debug logs are also redacted.
There was a problem hiding this comment.
Yes, the or is incorrect here: It should read
if key and key in self.REDACT_KEYS and objand obj not in self.NEVER_REDACT:There was a problem hiding this comment.
Fixed, please retest. I didn't have any 0.0 items in my setup, so I missed this part.
This leads me into another topic we should also add on our issue list: Making a test suite. That should avoid mistakes like this. I've started a few times on the framework for pytest, but haven't really been able to commit fully. It does require some work getting up.
There was a problem hiding this comment.
Looks good
Yeah, tests are good to have, and getting the framework for it up and running is often the biggest hurdle.
custom_components/zaptec/__init__.py
Outdated
| # Dump the redaction database | ||
| if REDACT_LOGS and REDACT_DUMP_ON_STARTUP: | ||
| _LOGGER.debug("------ DO NOT PUBLISH THE FOLLOWING DATA ------") | ||
| _LOGGER.debug("| Redaction database:") | ||
| for line in manager.zaptec.redact.dumps().splitlines(): | ||
| _LOGGER.debug("| %s", line) | ||
| _LOGGER.debug("------ DO NOT PUBLISH THE ABOVE ^^^ DATA ------") |
There was a problem hiding this comment.
This is in an async method, can we trust that there won't be other lines inserted inbetween these? I think generating the full string first and then logging it in a single debug-call would be preferable.
There was a problem hiding this comment.
Yes, this is an async function, and these are not async, so they should be atomic. Unless HA uses multithreaded asyncio, which is very exotic, I believe this will be logged together. I know HA does a lot of unorthodox trickeries, so I'll ask around.
Logging with newlines are frowned upon. However, they don't break up TB to ensure it stays together, so perhaps this is a single log entry.
There was a problem hiding this comment.
I changed this to log into one operation.
|
I think this PR is probably a necessary evil in terms of complexity. There's only so many reminders you can give users to be careful, most people will still ignore/forget it (and I include myself in "most people" here). I think it would still be good to update the readme in https://github.com/custom-components/zaptec?tab=readme-ov-file#debugging with something like "while the integration strives to redact all sensitive data from the debug logs, you should always doublecheck that there is nothing sensitive in them before sharing them with strangers on the internet". Some instructions on how to use the functions in DEVELOPMENT.md would also be nice. |
Agree on the 0.0. That was a mistake.
Yes, I'm slowly working towards that. There are other things that should be changed on the HA site too, such as splitting up Edit: Created #257 |
|
@steinmn There might be other alternatives to how this is solved in this PR. One of it is to make a special |
Yeah, let's not tempt fate with that. I think the code is ok as it is now, just do a revision of the readme and maybe add some instructions to development.md |
|
@steinmn I updated the documentation, both for readme as well as development. Please review. |
steinmn
left a comment
There was a problem hiding this comment.
Just a couple of typos, LGTM
|
@steinmn Thank you for the spelling corrections. I probably need to get myself a contextual spell checker in VS Code 😁 |
|
Thank you once again @steinm for helping out. |
|
My vote is to include them👍 |
This is a proposal to fix #255. The PR adds a system for redaction of sensitive data. This is the same system previously used for the diagnostics download, which is now repurposed for all logging from api.py. The overall idea is that it should be safe to share the debug logs -- with the exception of the redaction database which is clearly marked in the log.
A log entry typically looks like this with all sensitive data redacted:
The redacted stings are found in
"<--Text-->"brackets. At the end of setup ifREDACT_DUMP_ON_STARTUPis set, will print the following in the logs.The setting
REDACT_LOGS = False(in const.py) can be used to disable redaction completely. I've chosen to setREDACT_DUMP_ON_STARTUP = Trueso there is a chance to extract the actual values from the logs if need be. But users must make sure they don't share that part of the logs.The downside to this solution, is that it adds even more complexity to the library and that we can never guarantee 100% that any sensitive information will not fall through the filter.
Is this PR a good idea, or just added complexity?