Skip to content

Comments

Prevent logging sensitive data in debug logs#256

Merged
sveinse merged 6 commits intocustom-components:masterfrom
sveinse:feature-add-log-redaction
Jul 29, 2025
Merged

Prevent logging sensitive data in debug logs#256
sveinse merged 6 commits intocustom-components:masterfrom
sveinse:feature-add-log-redaction

Conversation

@sveinse
Copy link
Collaborator

@sveinse sveinse commented Jul 28, 2025

This is a proposal to fix #255. The PR adds a system for redaction of sensitive data. This is the same system previously used for the diagnostics download, which is now repurposed for all logging from api.py. The overall idea is that it should be safe to share the debug logs -- with the exception of the redaction database which is clearly marked in the log.

A log entry typically looks like this with all sensitive data redacted:

@@@  REQUEST GET to 'https://api.zaptec.com/api/installation'
@@@  RESPONSE 200 length 1118
   Installation <--Inst[8a106b]-->  (adding)
>>>  Adding   Installation.id (Id)  =  <str> '<--Inst[8a106b]-->'
>>>  Adding   Installation[8a106b].name (Name)  =  <str> '<--Redact #2-->'
>>>  Adding   Installation[8a106b].address (Address)  =  <str> '<--Redact #3-->'

The redacted stings are found in "<--Text-->" brackets. At the end of setup if REDACT_DUMP_ON_STARTUP is set, will print the following in the logs.

------  DO NOT PUBLISH THE FOLLOWING DATA  ------
|  Redaction database:
|  {'<--Inst[8a106b]-->': '...the actual value....',
|   '<--Redact #2-->': '...the actual value....',
|   '<--Redact #3-->': '...the actual value....'}
------  DO NOT PUBLISH THE ABOVE ^^^ DATA  ------

The setting REDACT_LOGS = False (in const.py) can be used to disable redaction completely. I've chosen to set REDACT_DUMP_ON_STARTUP = True so there is a chance to extract the actual values from the logs if need be. But users must make sure they don't share that part of the logs.

The downside to this solution, is that it adds even more complexity to the library and that we can never guarantee 100% that any sensitive information will not fall through the filter.

Is this PR a good idea, or just added complexity?

sveinse added 2 commits July 28, 2025 21:24
* All debug output doesn't print sensistive data in logs
* Sensitive data are replaced by redaction strings
* Redaction database can be printed to logs
* Move Redactor to redact.py
@sveinse sveinse changed the title Feature add log redaction Prevent logging sensitive data in debug logs Jul 28, 2025
Copy link
Contributor

@steinmn steinmn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Redacting 0.0 definitely needs to be fixed, the other two are open for discussion.

Going forward (so not necessarily in this PR/release), would it make sense to gather all of the "access library"-code in a custom_components/zaptec/zaptec_api-folder? As a precursor to having it as a separate python package, I think it would be helpful to start keeping it a bit more separate and try to disentangle the ha-parts and api-parts.

return self.redacts[obj]

# Check if new redaction is needed
if key and key in self.REDACT_KEYS and (not obj or obj not in self.NEVER_REDACT):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the purpose of not obj here?

In my setup, latitude and longitude has not been set, leading them to become 0.0. I think that since not 0.0 is true/truthy, that overrides the NEVER_REDACT. At least latitude is the first instance I found of <--Redact #4-->, and all subsequent zero-floats in my debug logs are also redacted.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the or is incorrect here: It should read

if key and key in self.REDACT_KEYS and objand obj not in self.NEVER_REDACT:

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, please retest. I didn't have any 0.0 items in my setup, so I missed this part.

This leads me into another topic we should also add on our issue list: Making a test suite. That should avoid mistakes like this. I've started a few times on the framework for pytest, but haven't really been able to commit fully. It does require some work getting up.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

Yeah, tests are good to have, and getting the framework for it up and running is often the biggest hurdle.

Comment on lines 230 to 236
# Dump the redaction database
if REDACT_LOGS and REDACT_DUMP_ON_STARTUP:
_LOGGER.debug("------ DO NOT PUBLISH THE FOLLOWING DATA ------")
_LOGGER.debug("| Redaction database:")
for line in manager.zaptec.redact.dumps().splitlines():
_LOGGER.debug("| %s", line)
_LOGGER.debug("------ DO NOT PUBLISH THE ABOVE ^^^ DATA ------")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is in an async method, can we trust that there won't be other lines inserted inbetween these? I think generating the full string first and then logging it in a single debug-call would be preferable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is an async function, and these are not async, so they should be atomic. Unless HA uses multithreaded asyncio, which is very exotic, I believe this will be logged together. I know HA does a lot of unorthodox trickeries, so I'll ask around.

Logging with newlines are frowned upon. However, they don't break up TB to ensure it stays together, so perhaps this is a single log entry.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to log into one operation.

@steinmn
Copy link
Contributor

steinmn commented Jul 29, 2025

I think this PR is probably a necessary evil in terms of complexity. There's only so many reminders you can give users to be careful, most people will still ignore/forget it (and I include myself in "most people" here).

I think it would still be good to update the readme in https://github.com/custom-components/zaptec?tab=readme-ov-file#debugging with something like "while the integration strives to redact all sensitive data from the debug logs, you should always doublecheck that there is nothing sensitive in them before sharing them with strangers on the internet".

Some instructions on how to use the functions in DEVELOPMENT.md would also be nice.

@sveinse
Copy link
Collaborator Author

sveinse commented Jul 29, 2025

Redacting 0.0 definitely needs to be fixed, the other two are open for discussion.

Agree on the 0.0. That was a mistake.

Going forward (so not necessarily in this PR/release), would it make sense to gather all of the "access library"-code in a custom_components/zaptec/zaptec_api-folder? As a precursor to having it as a separate python package, I think it would be helpful to start keeping it a bit more separate and try to disentangle the ha-parts and api-parts.

Yes, I'm slowly working towards that. There are other things that should be changed on the HA site too, such as splitting up __init__.py into entity.py and coordinator.py. I'm thinking moving everything related to api.py into zaptec/. If we fork this off into a separate package it will be named "python-zaptec" or "zaptec" or some-such and having that directory name leads to from zaptec import Zaptec. But I don't want to embark on this on this side of the final v0.8 release. It will create a lot of diff noise (so git blame will be hard to follow). So this should be forked off to a issue of its own.

Edit: Created #257

@sveinse
Copy link
Collaborator Author

sveinse commented Jul 29, 2025

@steinmn There might be other alternatives to how this is solved in this PR. One of it is to make a special _LOGGER object which installs a redaction filter. This way most function in api.py and co and log as usual. The pro is that this make this a less intrusive method code wise. The con is that this approach will be much less surgical as one can only redact on substrings of whole line of log, losing the context of it. The risk of ending up with redacting common words like "0.0" is much higher.

@steinmn
Copy link
Contributor

steinmn commented Jul 29, 2025

The risk of ending up with redacting common words like "0.0" is much higher.

Yeah, let's not tempt fate with that.

I think the code is ok as it is now, just do a revision of the readme and maybe add some instructions to development.md

@sveinse
Copy link
Collaborator Author

sveinse commented Jul 29, 2025

@steinmn I updated the documentation, both for readme as well as development. Please review.

Copy link
Contributor

@steinmn steinmn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple of typos, LGTM

@sveinse
Copy link
Collaborator Author

sveinse commented Jul 29, 2025

@steinmn Thank you for the spelling corrections. I probably need to get myself a contextual spell checker in VS Code 😁

@sveinse sveinse merged commit 7fb566e into custom-components:master Jul 29, 2025
3 checks passed
@sveinse
Copy link
Collaborator Author

sveinse commented Jul 29, 2025

Thank you once again @steinm for helping out.

@sveinse
Copy link
Collaborator Author

sveinse commented Jul 30, 2025

@steinmn do we want this and #254 in or out of v0.8.0 ?

@steinmn
Copy link
Contributor

steinmn commented Jul 30, 2025

My vote is to include them👍

@sveinse sveinse deleted the feature-add-log-redaction branch August 3, 2025 13:14
@sveinse sveinse added this to the v0.8.0 milestone Aug 8, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Prevent sharing of sensitive data in debug logs

2 participants