Prevent logging sensitive data in debug logs by sveinse · Pull Request #256 · custom-components/zaptec

sveinse · 2025-07-28T21:39:49Z

This is a proposal to fix #255. The PR adds a system for redaction of sensitive data. This is the same system previously used for the diagnostics download, which is now repurposed for all logging from api.py. The overall idea is that it should be safe to share the debug logs -- with the exception of the redaction database which is clearly marked in the log.

A log entry typically looks like this with all sensitive data redacted:

@@@  REQUEST GET to 'https://api.zaptec.com/api/installation'
@@@  RESPONSE 200 length 1118
   Installation <--Inst[8a106b]-->  (adding)
>>>  Adding   Installation.id (Id)  =  <str> '<--Inst[8a106b]-->'
>>>  Adding   Installation[8a106b].name (Name)  =  <str> '<--Redact #2-->'
>>>  Adding   Installation[8a106b].address (Address)  =  <str> '<--Redact #3-->'

The redacted stings are found in "<--Text-->" brackets. At the end of setup if REDACT_DUMP_ON_STARTUP is set, will print the following in the logs.

------  DO NOT PUBLISH THE FOLLOWING DATA  ------
|  Redaction database:
|  {'<--Inst[8a106b]-->': '...the actual value....',
|   '<--Redact #2-->': '...the actual value....',
|   '<--Redact #3-->': '...the actual value....'}
------  DO NOT PUBLISH THE ABOVE ^^^ DATA  ------

The setting REDACT_LOGS = False (in const.py) can be used to disable redaction completely. I've chosen to set REDACT_DUMP_ON_STARTUP = True so there is a chance to extract the actual values from the logs if need be. But users must make sure they don't share that part of the logs.

The downside to this solution, is that it adds even more complexity to the library and that we can never guarantee 100% that any sensitive information will not fall through the filter.

Is this PR a good idea, or just added complexity?

* All debug output doesn't print sensistive data in logs * Sensitive data are replaced by redaction strings * Redaction database can be printed to logs * Move Redactor to redact.py

steinmn

Redacting 0.0 definitely needs to be fixed, the other two are open for discussion.

Going forward (so not necessarily in this PR/release), would it make sense to gather all of the "access library"-code in a custom_components/zaptec/zaptec_api-folder? As a precursor to having it as a separate python package, I think it would be helpful to start keeping it a bit more separate and try to disentangle the ha-parts and api-parts.

steinmn · 2025-07-29T15:04:03Z

custom_components/zaptec/redact.py

+            return self.redacts[obj]
+
+        # Check if new redaction is needed
+        if key and key in self.REDACT_KEYS and (not obj or obj not in self.NEVER_REDACT):


What is the purpose of not obj here?

In my setup, latitude and longitude has not been set, leading them to become 0.0. I think that since not 0.0 is true/truthy, that overrides the NEVER_REDACT. At least latitude is the first instance I found of <--Redact #4-->, and all subsequent zero-floats in my debug logs are also redacted.

Yes, the or is incorrect here: It should read

if key and key in self.REDACT_KEYS and objand obj not in self.NEVER_REDACT:

Fixed, please retest. I didn't have any 0.0 items in my setup, so I missed this part.

This leads me into another topic we should also add on our issue list: Making a test suite. That should avoid mistakes like this. I've started a few times on the framework for pytest, but haven't really been able to commit fully. It does require some work getting up.

Looks good

Yeah, tests are good to have, and getting the framework for it up and running is often the biggest hurdle.

custom_components/zaptec/redact.py

steinmn · 2025-07-29T15:15:33Z

custom_components/zaptec/__init__.py

+    # Dump the redaction database
+    if REDACT_LOGS and REDACT_DUMP_ON_STARTUP:
+        _LOGGER.debug("------  DO NOT PUBLISH THE FOLLOWING DATA  ------")
+        _LOGGER.debug("|  Redaction database:")
+        for line in manager.zaptec.redact.dumps().splitlines():
+            _LOGGER.debug("|  %s", line)
+        _LOGGER.debug("------  DO NOT PUBLISH THE ABOVE ^^^ DATA  ------")


This is in an async method, can we trust that there won't be other lines inserted inbetween these? I think generating the full string first and then logging it in a single debug-call would be preferable.

Yes, this is an async function, and these are not async, so they should be atomic. Unless HA uses multithreaded asyncio, which is very exotic, I believe this will be logged together. I know HA does a lot of unorthodox trickeries, so I'll ask around.

Logging with newlines are frowned upon. However, they don't break up TB to ensure it stays together, so perhaps this is a single log entry.

I changed this to log into one operation.

steinmn · 2025-07-29T15:32:04Z

I think this PR is probably a necessary evil in terms of complexity. There's only so many reminders you can give users to be careful, most people will still ignore/forget it (and I include myself in "most people" here).

I think it would still be good to update the readme in https://github.com/custom-components/zaptec?tab=readme-ov-file#debugging with something like "while the integration strives to redact all sensitive data from the debug logs, you should always doublecheck that there is nothing sensitive in them before sharing them with strangers on the internet".

Some instructions on how to use the functions in DEVELOPMENT.md would also be nice.

sveinse · 2025-07-29T16:16:11Z

Redacting 0.0 definitely needs to be fixed, the other two are open for discussion.

Agree on the 0.0. That was a mistake.

Going forward (so not necessarily in this PR/release), would it make sense to gather all of the "access library"-code in a custom_components/zaptec/zaptec_api-folder? As a precursor to having it as a separate python package, I think it would be helpful to start keeping it a bit more separate and try to disentangle the ha-parts and api-parts.

Yes, I'm slowly working towards that. There are other things that should be changed on the HA site too, such as splitting up __init__.py into entity.py and coordinator.py. I'm thinking moving everything related to api.py into zaptec/. If we fork this off into a separate package it will be named "python-zaptec" or "zaptec" or some-such and having that directory name leads to from zaptec import Zaptec. But I don't want to embark on this on this side of the final v0.8 release. It will create a lot of diff noise (so git blame will be hard to follow). So this should be forked off to a issue of its own.

Edit: Created #257

sveinse · 2025-07-29T16:20:34Z

@steinmn There might be other alternatives to how this is solved in this PR. One of it is to make a special _LOGGER object which installs a redaction filter. This way most function in api.py and co and log as usual. The pro is that this make this a less intrusive method code wise. The con is that this approach will be much less surgical as one can only redact on substrings of whole line of log, losing the context of it. The risk of ending up with redacting common words like "0.0" is much higher.

steinmn · 2025-07-29T17:04:14Z

The risk of ending up with redacting common words like "0.0" is much higher.

Yeah, let's not tempt fate with that.

I think the code is ok as it is now, just do a revision of the readme and maybe add some instructions to development.md

sveinse · 2025-07-29T19:14:10Z

@steinmn I updated the documentation, both for readme as well as development. Please review.

steinmn

Just a couple of typos, LGTM

README.md

DEVELOPMENT.md

sveinse · 2025-07-29T19:27:34Z

@steinmn Thank you for the spelling corrections. I probably need to get myself a contextual spell checker in VS Code 😁

sveinse · 2025-07-29T20:23:49Z

Thank you once again @steinm for helping out.

sveinse · 2025-07-30T17:00:25Z

@steinmn do we want this and #254 in or out of v0.8.0 ?

steinmn · 2025-07-30T17:03:58Z

My vote is to include them👍

sveinse added 2 commits July 28, 2025 21:24

Implement sensitive data redation in debug log

69c781e

* All debug output doesn't print sensistive data in logs * Sensitive data are replaced by redaction strings * Redaction database can be printed to logs * Move Redactor to redact.py

Missing import

e77d0f3

sveinse changed the title ~~Feature add log redaction~~ Prevent logging sensitive data in debug logs Jul 28, 2025

steinmn suggested changes Jul 29, 2025

View reviewed changes

Fix redaction of falsey object and log database in one log entry

396052e

Update documentation

f868d1d

steinmn approved these changes Jul 29, 2025

View reviewed changes

README.md Outdated Show resolved Hide resolved

README.md Outdated Show resolved Hide resolved

DEVELOPMENT.md Outdated Show resolved Hide resolved

Updated spelling mistakes

9b07cdc

Fixup use of redact and repr

cce4c39

sveinse merged commit 7fb566e into custom-components:master Jul 29, 2025
3 checks passed

sveinse deleted the feature-add-log-redaction branch August 3, 2025 13:14

sveinse added this to the v0.8.0 milestone Aug 8, 2025

Comments

Conversation

sveinse commented Jul 28, 2025

Uh oh!

steinmn left a comment

Choose a reason for hiding this comment

Uh oh!

steinmn Jul 29, 2025

Choose a reason for hiding this comment

Uh oh!

sveinse Jul 29, 2025

Choose a reason for hiding this comment

Uh oh!

sveinse Jul 29, 2025

Choose a reason for hiding this comment

Uh oh!

steinmn Jul 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

steinmn Jul 29, 2025

Choose a reason for hiding this comment

Uh oh!

sveinse Jul 29, 2025

Choose a reason for hiding this comment

Uh oh!

sveinse Jul 29, 2025

Choose a reason for hiding this comment

Uh oh!

steinmn commented Jul 29, 2025

Uh oh!

sveinse commented Jul 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

sveinse commented Jul 29, 2025

Uh oh!

steinmn commented Jul 29, 2025

Uh oh!

sveinse commented Jul 29, 2025

Uh oh!

steinmn left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

sveinse commented Jul 29, 2025

Uh oh!

Uh oh!

sveinse commented Jul 29, 2025

Uh oh!

sveinse commented Jul 30, 2025

Uh oh!

steinmn commented Jul 30, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

sveinse commented Jul 29, 2025 •

edited

Loading