
Conversation

@poettering
Member

No description provided.

@poettering
Member Author

Force pushed a new version with @dnicolodi's point fixed.

@vcaputo
Member

vcaputo commented Jun 8, 2020

I only skimmed the changes and nothing looked obviously terrible. I don't like suggesting rotations if they can just be done automagically when appropriate, but I didn't spend enough time grokking the changes to appreciate why this might make the most sense.

@poettering
Member Author

Force pushed a new version; the only changes are the suggested ones. PTAL.

if (!!(o->object.flags & OBJECT_COMPRESSED_XZ) +
    !!(o->object.flags & OBJECT_COMPRESSED_LZ4) +
    !!(o->object.flags & OBJECT_COMPRESSED_ZSTD) > 1) {
        error(p, "Objected with double compression");
Contributor


Could use o->object.flags & OBJECT_COMPRESSION_MASK && !isPow2(o->object.flags & OBJECT_COMPRESSION_MASK).
I believe you have such a function or macro already; if not:

bool isPow2(unsigned uVal) { return (uVal & ~(uVal - 1)) == uVal; } /* uVal & ~(uVal - 1) isolates the lowest set bit; note this also returns true for 0, hence the mask check in front */

Member Author


Is that really more readable though? Not convinced...

Contributor


No contest if you add a function/macro for it.

static bool has_one_bit_set(unsigned long uVal) {
	/* uVal & ~(uVal - 1) isolates the lowest set bit; equal to uVal means exactly one bit is set */
	return uVal && (uVal & ~(uVal - 1)) == uVal;
}

/* guard with the mask so that uncompressed objects (no bits set) don't trigger the error */
if ((o->object.flags & OBJECT_COMPRESSION_MASK) &&
    !has_one_bit_set(o->object.flags & OBJECT_COMPRESSION_MASK))
	error...

Member Author


I mean, we could use __builtin_popcount() for this too if we wanted, but there as well I'm not sure it's that much more readable.
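For illustration, a minimal sketch of the popcount variant (GCC/clang builtin; this assumes OBJECT_COMPRESSION_MASK, as suggested above, covers exactly the three compression bits):

/* Sketch: count the compression bits directly; more than one bit set means
 * the object claims multiple compression formats at once. */
if (__builtin_popcount(o->object.flags & OBJECT_COMPRESSION_MASK) > 1)
        error(p, "Objected with double compression");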

poettering added a commit to poettering/systemd that referenced this pull request Jun 24, 2020
let's simplify the checks for ZSTD/LZ4/XZ

As suggested:

systemd#16096 (comment)
@poettering
Member Author

Force pushed a new version, with (most) points addressed, see above. PTAL.

@poettering
Member Author

Force pushed a new version with the suggested changes.

@vuzdemav
Contributor

vuzdemav commented Jun 25, 2020

I'm trying to wrap my head around how effective this mitigation is.
It works fine in the case of systemd-journal-remote, so for that alone it seems worth merging.
OTOH, the local mitigation does not seem to be effective.
A local user who has access to journal files can read the hash key off the file and produce output that generates hash table collisions.
In the previous version of the PR, the max chain length was proportional to the table size, so the attacker would be able to slow journald down significantly while it does O(n²) iteration on those chains.
In the latest version of the PR, the max chain length is fixed to a constant, so the attacker can force journald to rotate files. This also creates load, and in addition allows the attacker to flush out log entries.
Note that rate limiting only helps a little bit here, because we allow many data fields in a single message. So in principle it should be fairly easy to generate a large number of collisions with just one or a few journal entries.

It seems that the only real solution would be to not write the hash key at all until the file is closed. Any readers would have to read the whole file and figure out where objects are stored on their own. This also means that the key should not be based on the file ID, but be separate. Before the file is archived, the actual hash table could be written. To preserve operation across journald restarts, the "private" hash key and table could be stored in a sidecar file with no read permissions.

Good point.

I have another suggestion: just drop the check for hash chain length completely. Even with access to the file ID used to key the hash, an attacker would still have to redo the work of generating new colliding messages for every file, rather than relying on a precomputed table. This consumes more resources than journald uses for inserting the colliding messages into the hash table.

@vuzdemav
Contributor

> Note that rate limiting only helps a little bit here, because we allow many data fields in a single message.

The field hash table is much smaller than the data hash table, which is desirable in the normal use case. Is there an attack vector here where a user logs many different fields to fill up the field hash table and trigger early rotation? (No need for any hash collision, just maxing out the hash table fill fraction.)

@poettering
Member Author

> I'm trying to wrap my head around how effective this mitigation is.
> It works fine in the case of systemd-journal-remote, so for that alone it seems worth merging.
> OTOH, the local mitigation does not seem to be effective.
> A local user who has access to journal files can read the hash key off the file and produce output that generates hash table collisions.
> In the previous version of the PR, the max chain length was proportional to the table size, so the attacker would be able to slow journald down significantly while it does O(n²) iteration on those chains.
> In the latest version of the PR, the max chain length is fixed to a constant, so the attacker can force journald to rotate files. This also creates load, and in addition allows the attacker to flush out log entries.
> Note that rate limiting only helps a little bit here, because we allow many data fields in a single message. So in principle it should be fairly easy to generate a large number of collisions with just one or a few journal entries.

So I had been thinking about this, and I think we can address it later; it should be independent of this PR. Let me underline this: a remote DoS is really important to fix; a local DoS sucks and is something to fix too, but there are so many others (think dbus) that I don't think it's as pressing.

Specifically, what I'd propose in the long run is that we split the IO part of journald into per-UID processes. I.e. journald collects all incoming messages as before, augments them with metadata, does ratelimiting, yadda yadda, but then, instead of calling journal_file_append() directly to write the stuff we got into a file, we do so out-of-process, from a per-client-UID process that we either fork off ourselves or that systemd forks off for us, which gets the ready-made data and just writes it to disk. This process would drop privs entirely (possibly even setresuid() to the client's UID), and there would be one instance for each client UID that recently wrote stuff. This would mean metadata collection (the stuff we need privs for) and the IO stuff would be split. It would also mean that clients that manage to trigger hash table collisions only slow down their own logging, but nothing else, i.e. they can shoot themselves in the foot without affecting anything else.

Such a change wouldn't even affect our global ordering guarantees, since the separations between journal files and between processes would be the same here, and all ordering guarantees (and the absence thereof) would be maintained 1:1.

And it wouldn't be a complex patch, I figure; writing the iovec[] that makes up the entry to a socket instead of calling journal_file_append() is easy...
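For illustration, a rough sketch of that hand-off under the assumptions above (the writer socket fd and the helper name are hypothetical, not existing journald code):

#include <errno.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical sketch: hand a ready-made journal entry (its iovec[] array)
 * to a per-UID writer process over a connected socket, instead of calling
 * journal_file_append() in-process. */
static int forward_entry_to_writer(int writer_fd, struct iovec *iov, size_t n_iov) {
        struct msghdr mh = {
                .msg_iov = iov,
                .msg_iovlen = n_iov,
        };

        if (sendmsg(writer_fd, &mh, MSG_NOSIGNAL) < 0)
                return -errno;

        return 0;
}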

> It seems that the only real solution would be to not write the hash key at all until the file is closed. Any readers would have to read the whole file and figure out where objects are stored on their own. This also means that the key should not be based on the file ID, but be separate. Before the file is archived, the actual hash table could be written. To preserve operation across journald restarts, the "private" hash key and table could be stored in a sidecar file with no read permissions.

Well, but that just means all access goes from O(1) to O(n), so you'd basically make the effect of the DoS the default. Who is helped by that?

I think as long as people can only fuck up their own user, the ability to trigger hash collisions is not a problem.

But all that is independent of this patch. The remote DoS should definitely be dealt with independently, and any protection against slowing down/DoSing other users can then come on top of that.

Note that right now we actually have a pretty effective DoS protection in place already: per-user ratelimiting. Hence what I propose above isn't even necessary to make local behaviour safe. It's still a wise thing to do, because ratelimiting is an optional feature, and even with very high limits we should try to not fall over.

@vuzdemav
Contributor

> OTOH, the local mitigation does not seem to be effective.

Another point here is that "local" log messages do not always come from local users. One example is the SSH server, which logs failed login attempts with a message that can be partially controlled by an attacker (with sufficient control to generate hash table collisions). However, the remote attacker has no way of reading the file ID.

The object flags field is a bitmask, hence don't sloppily define
_OBJECT_COMPRESSED_MAX as one more than the previous flag. That worked OK
as long as we only had two flags, but will fall apart as soon as we have
three. Let's fix this.

(It's kinda sloppy how the string table is built here, as it will be
quite sparse as soon as we have more enum entries, but let's keep it for
now.)
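For illustration, a sketch of the bitmask layout this implies (the concrete bit values and the mask constant are assumptions here, not a quote of the actual header):

/* Sketch: each compression flag gets its own bit; the mask is the OR of all
 * flags, not "previous flag + 1". */
enum {
        OBJECT_COMPRESSED_XZ    = 1 << 0,
        OBJECT_COMPRESSED_LZ4   = 1 << 1,
        OBJECT_COMPRESSED_ZSTD  = 1 << 2,
        OBJECT_COMPRESSION_MASK = OBJECT_COMPRESSED_XZ | OBJECT_COMPRESSED_LZ4 | OBJECT_COMPRESSED_ZSTD,
};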
We keep converting back and forth even though we never need the value in LE.
Hence, let's stop doing those conversions.
Let's clean this up a bit, following our usual nomenclature of naming
return parameters ret_xyz.

This is mostly a bit of renaming, but there are also some minor other
changes: if we return a pointer to an mmap'ed object plus its offset, in
almost all cases we are happy if either parameter is NULL in case the
caller is not interested in it. Let's fix the remaining case to do this
too, to minimize surprises.
Let's prefix this with "jenkins_", since it wraps the Jenkins hash. We
want to add support for other hash functions to journald soon, hence we
better be clear about what this is. In particular as all other symbols
defined by lookup3.h are actually prefixed "jenkins_".
This adds a new (incompatible) feature to journal files: if enabled, the
hash function used for the hash tables is no longer the Jenkins hash with a
zero key, but siphash keyed by the file uuid that is included in the
file header anyway. This should make our hash tables more robust against
collision attacks, as long as the attacker has no read access to the
journal files. We switch from Jenkins to siphash simply because it's
better known and we are standardizing on it for the rest of our codebase.

This is hardening to make collision attacks harder for clients
that can forge log messages but have no read access to the logs. It has
no effect on clients that do have read access.
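A rough sketch of the keyed-hash idea using systemd's siphash24() helper (the wrapper name and the way the file ID is passed in are illustrative, not the actual journal code):

#include <stddef.h>
#include <stdint.h>
#include "siphash24.h"

/* Illustrative: hash data keyed by the 128-bit file ID stored in the file
 * header, instead of with a fixed zero key. An attacker who cannot read
 * the file cannot predict which inputs collide. */
static uint64_t keyed_data_hash(const uint8_t file_id[16], const void *data, size_t sz) {
        return siphash24(data, sz, file_id);
}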
Even with the new keyed hash table journal feature: if an attacker
manages to get access to the journal file ID, they can synthesize records
that result in hash collisions. Let's rotate automatically when we
notice that, so that a new journal file ID is generated, our performance
is restored, and the attacker has to guess a new file ID before being
able to trigger the issue again.

That said, untrusted peers should never get access to journal files in
the first place...
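A minimal sketch of what such a trigger could look like (the depth constant, chain counter, and rotation flag are hypothetical names, not the actual implementation):

/* Hypothetical sketch: an over-long hash chain suggests a collision attack;
 * request rotation so that a fresh file ID (and hence a fresh hash key) is
 * picked. */
#define CHAIN_DEPTH_MAX 100U

if (chain_depth > CHAIN_DEPTH_MAX) {
        log_debug("Too long hash chain of depth %zu, suggesting rotation.", chain_depth);
        f->rotate_suggested = true;
}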
Just an import, with no textual changes (some URLs fixed, however).
let's simplify the checks for ZSTD/LZ4/XZ

As suggested:

systemd#16096 (comment)
@poettering
Member Author

Force pushed a new version with @keszybz's points addressed, no other fixes...

(And no fix that adds per-UID journald writer processes, as discussed; will work on that eventually though. Added to the TODO list in this PR for now.)

@keszybz
Member

keszybz commented Jun 25, 2020

> So I had been thinking about this, and I think we can address it later; it should be independent of this PR.
> Let me underline this: a remote DoS is really important to fix; a local DoS sucks and is something to fix too

Agreed.

> Well, but that just means all access goes from O(1) to O(n), so you'd basically make the effect of the DoS the default. Who is helped by that?

journald stays O(1). Clients which follow the journal might get slower (but don't have to, if they do some hashing on their own side).

> Note that right now we actually have a pretty effective DoS protection in place already: per-user ratelimiting. Hence what I propose above isn't even necessary to make local behaviour safe. It's still a wise thing to do, because ratelimiting is an optional feature, and even with very high limits we should try to not fall over.

Yep.

@keszybz
Member

keszybz commented Jun 25, 2020

I did a quick re-review of the changed parts and everything LGTM.

@keszybz added and then removed the good-to-merge/waiting-for-ci label (👍 PR is good to merge, but CI hasn't passed at time of review. Please merge if you see CI has passed) Jun 25, 2020
@keszybz merged commit 12d14b7 into systemd:master Jun 26, 2020
Yamakuzure pushed a commit to elogind/elogind that referenced this pull request Aug 31, 2020
let's simplify the checks for ZSTD/LZ4/XZ

As suggested:

systemd/systemd#16096 (comment)