Skip to content

Conversation

@t-bast
Copy link
Member

@t-bast t-bast commented Apr 16, 2025

Our previous architecture for channel key management was designed when we wanted the ability to use an external signer that would protect the private keys used by eclair: this meant we couldn't directly deal with private keys and forced us to have a more complex, indirect architecture which made it harder to understand how key derivation is done for commitment keys and forced us to partially derive some of the keys in many different places, which was very error-prone (when should we provide a payment_key instead of a delayed_payment_key ?).

Since then, we've spent years working on lightning-specific HSMs before VLS was even a thing, and ended up concluding that this was a dead end and it made much more sense to put the whole lightning node inside a secure enclave: https://acinq.co/blog/securing-a-100M-lightning-node

But we haven't refactored our key management code since then. Now that we're working on Taproot, which adds more complexity to transactions, it's a good time to re-work that part of the code and make it simpler. This refactoring took a while and required a few iterations to get right: I think the result is much better than what we previously had:

  • all of the key derivation is contained in ChannelKeys, which makes it easier to grasp
  • each channel has its own ChannelKeys instance, containing only what it needs
  • from a ChannelKeys instance and a per-commitment point, we can derive a set of CommitmentKeys that allow creating local/remote commitment and HTLC transactions
  • our force-close logic (in Helpers.scala) does not need to derive keys manually and can instead rely on a CommitmentKeys instance
  • script and witness creation also take a CommitmentKeys instance instead of specific public keys, which was extremely error-prone because we relied on naming instead of types to ensure that the right public key was provided

Importantly, this is only a first step, that will allow us to also greatly simplify our whole TransactionWithInputInfo architecture, which was also starting to show its age.

I really, really recommend reviewing this PR commit-by-commit: I spent a lot of time ensuring that each commit is a small logical step towards the end result, which makes it easier to review. Each commit message contains a description of what we're doing and why we're doing it: make sure you start by reading the commit message before reviewing the code!

t-bast added 9 commits April 16, 2025 12:45
We introduce dedicated classes to hold the keys used for either a local
commit transaction or a remote commit transaction. These keys directly
map to the keys described in Bolt 3. The goal is to generate those keys
only once instead of what we're currently doing, where we duplicate a
lot of code to partially generate keys in various places, which is very
error-prone: incorrectly using the wrong public or private key can lead
to loss of funds! It also removes the burden of inverting local/remote
in many places when we're spending a remote commitment: this only has
to happen once, in the `publicKeys` field of `RemoteCommitmentKeys`.

In this first commit, we only use this to generate commit txs, htlc txs
and claim-htlc txs. We generate these classes next to where the keys
were previously derived, to make it easier to review the diff. In the
next commit, we will create helper functions to generate those keys
based on the `Commitment` object.
Add helpers to create `LocalCommitmentKeys` and `RemoteCommitmentKeys`
from commitment parameters.
Add helpers to create `LocalCommitmentKeys` and `RemoteCommitmentKeys`
from a `Commitment` instance and start relying on this constructor.
We now create all commitment-related transactions using commitment keys
instead of explicitly using a subset of the public keys used.

Note how this highlights that many tests in `TransactionsSpec.scala`
were actually using incorrect keys, and thus not really testing the
correct behavior. This is a very good proof that making this simpler
is a good choice, because that kind of issues can also happen in non
test code (and has happened in the past already).
Instead of taking as argument individual keys (that can be incorrectly
provided), our functions to create witnesses and scripts now use the
commitment keys classes.
We were previously going through the `ChannelKeyManager` to sign txs
just to end up calling `TransactionWithInputInfo.sign` after applying
some key derivation. This was confusing and is now unnecessary: we can
directly call `TransactionWithInputInfo.sign` since the new commitment
keys directly contain the correctly tweaked keys.

The `ChannelKeyManager`'s responsibility is now simply to derive the
keys and ensure that different channels use different derivation paths.
Unfortunately, we still need to call it in a few different places to
obtain the channel key path and then derive some of our keys. This can
be improvied: we will go further in the following commits to simplify
this.
The `ChannelKeyManager` previously had two distinct roles:

- generating random funding key paths for new channels
- generating channel keys from a funding key path

This was confusing and forced all the code that wanted to derive channel
keys to always first get the channel key path from the key manager and
then call the function it was interested in with this channel key path
as argument.

We remove this confusion by having the `ChannelKeyManager` only generate
funding key paths for new channels and initialize `ChannelKeys` based on
a given funding key path. The new `ChannelKeys` object provides direct
access to derivation functions for all the keys necessary. Each channel
has its own `ChannelKeys` object that only lets it derive keys for its
own usage.

Reviewers should start by reviewing the changes to `ChannelKeyManager`
and `LocalChannelKeyManager`: the rest is mostly mechanical changes to
use this new architecture.

We also refactor a bit the functions in `Commitments.scala` to derive
commitment keys as early as possible and then provide them as arguments
to the child functions.
This commit trivially:

- moves `ChannelKeys` to its own file
- moves `CommitmentKeys` in the `keymanager` package

It doesn't contain any business logic.
We move the commitment key derivation logic inside `ChannelKeys`, which
makes it easier to understand the various types of derivation involved
in channel keys.

We also explicitly pattern match on `commitmentFormat` when deciding
whether the payment key needs to be tweaked for each commitment or not,
which makes it more future-proof. It is a bit confusing because we did
not introduce a `commitmentFormat` for `option_static_remotekey`, which
applies either to anchor outputs or the default commitment format when
the corresponding channel feature is turned on.
@t-bast t-bast requested review from pm47 and sstone April 16, 2025 13:51
@sstone
Copy link
Member

sstone commented Apr 22, 2025

That's a huge improvement! One of the benefits of "hiding" our private keys in a key manager is that they it would have been very hard to mistakenly save them to a database for example.

Our new security model assumes that secrets are "injected" into the eclair runtime,, and that everything outside the secure runtime, including our database, logs, ... is not safe and could be hacked and we still would not loose funds.

=> it's worth adding a comment to the new channel keys classes to warn developers that they should never be persisted either directly or through a class that uses them.

t-bast added 2 commits April 23, 2025 10:24
- Warn that channel keys must not be persisted or logged
- Restore xpub tests from previous version
We rename `xxxSecret -> xxxPoint` curve points that are only used when
tweaked and never appear directly as public keys inside scripts. This
better matches the BOLTs and helps distinguish keys used for signing.
Copy link
Member

@sstone sstone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is way better than what we had!

This is actually impossible, because we have an implicit link between
the commitment format and whether there is a static wallet point. We
unfortunately can't easily express that in types...
@t-bast t-bast merged commit ecd4634 into master Apr 24, 2025
1 check passed
@t-bast t-bast deleted the refactor-commitment-keys branch April 24, 2025 13:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants