Skip to content

Problem: Tendermint chain migration plan is incomplete.#68

Merged
kansi merged 14 commits intobigchaindb:masterfrom
ldmberman:bep-42-handle-tendermint-hard-fork
Sep 19, 2018
Merged

Problem: Tendermint chain migration plan is incomplete.#68
kansi merged 14 commits intobigchaindb:masterfrom
ldmberman:bep-42-handle-tendermint-hard-fork

Conversation

@ldmberman
Copy link
Copy Markdown
Contributor

Solution: Draft a migration plan. Address the problem of switching Tendermint chains. Describe how to replay the chain after a migration. Mention the implications on HTTP API.

@ldmberman ldmberman requested review from kansi and z-bowen August 14, 2018 15:00
Comment thread 42/README.md Outdated
Comment thread 42/README.md
votes_recieved=<Sum_of_votes_recieved>
votes_allocated=<Sum_of_votes_allocated_in_election>
network_size=<Total_network_power>
```
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is possible to definitely state that the migration concluded or not. We can just state if the migration has concluded or not.

Copy link
Copy Markdown
Contributor Author

@ldmberman ldmberman Aug 15, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Here I wanted it to look similar to how it looks for upsert-validator. We can add an additional election_concluded= line for both, what do you think?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes 👍

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking about conclusion (also related to this comment), we need to display election status, one of started, concluded, failed.

Let's agree on the names for these statuses and document them here and in BEP-21.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of started I would suggest ongoing? i.e. once an election transaction has been committed its status could be described as ongoing

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I will add an election_status= line but there is no need to enumerate the statuses here. You can just do it in BEP-21.

Comment thread 42/README.md

A CLI command can be used to instruct the node to stop building blocks at a chosen height.

To make sure no blocks are committed after the agreed height, we propose an election process:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the chosen height is reached before the election concludes, then I guess the election should be shut down or rendered obsolete somehow. The spec should say exactly what should happen.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Following up on that, when an election concludes before the proposed migration height h and there are no new transactions in the system such that blockchain can reach height h then what is supposed to happen?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good points! I will document these cases.

Comment thread 42/README.md Outdated

#### 2. Replay the blockchain

In order to replay the chain up to the migration height, we need to either keep the old Tendermint version running or skip replaying the corresponding part of the chain = archive the chain. We consider the former to be a huge hassle so we conceive an archiving scheme further in this chapter.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The last sentence ends ambiguously with, "...so we conceive an archiving scheme further in this chapter." Conceive? Chapter?

Maybe, "...so we describe an archiving scheme elsewhere in this BEP."

Comment thread 42/README.md Outdated

Validators have to install and launch the new version of Tendermint. They need to prepare a new `genesis.json`. The new genesis file has to contain the validator set, the application hash (both at the migration height), and the identifier of the new Tendermint chain (`chain_id`).

The application hash is calculated by computing the Merkle tree root of the current UTXO.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not how the application hash is calculated today, but I guess that's an idea for how to calculate it in the future?

Maybe the details of how to compute the application hash should be left out of this BEP, because that's one of the things that can change over time. Vanshdeep started another BEP (in a pull request) to specify how to compute the application hash, in the future.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This way of computing hash was described here prior to this patch. We do need to know how to compute the hash to be able to migrate.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You definitely need to know how to compute the application hash to migrate, but the method of computing it can change from one chain to the next.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This BEP should advise on a reasonable way to hash to use for the implementation. I think, the proposed way is more reasonable than the existing one. I will note it down that it might change.

Comment thread 42/README.md Outdated

#### 4. Join the network after a migration

When a validator joins the network after a migration, it does not suffice for him to know `genesis.json` - he also needs the archive of the old chain (heights from 0 to the latest migration height).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There might be more than one migration, so the initial chain might go from height 0 to 33277, the second chain might go to height 88234, the third chain...

Comment thread 42/README.md Outdated

To make sure no blocks are committed after the agreed height, we propose an election process:

- The initiator creates an election, which includes a block height.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the purpose of adding specific block height to add a kind of timeout for the election process? So that if the election concludes before or at the given block height the network halts and prepares to migrate or else the election is rendered obsolete?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will update this part - there is no need to ask for a height there.

Comment thread 42/README.md Outdated
$ bigchaindb migration apply <dump>
```

After the dump is applied, Tendermint can be started.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what happens if tendermint is started before the migration dump is applied? it will still sync with the current state of the new chain. I supposes tendermint can be started before or after the migration dump is applied but the bigchaindb server can only be started after the dump has been applied

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both need to be started after the dump is applied because without the dump you are not able to validate incoming transaction properly. I will try to make it explicit in the BEP.

Comment thread 42/README.md Outdated
Validators have to install and launch the new version of Tendermint. They need to prepare a new `genesis.json`. The new genesis file has to contain the validator set, the application hash (both at the migration height), and the identifier of the new Tendermint chain (`chain_id`).

The application hash is calculated by computing the Merkle tree root of the current UTXO.
The application hash is calculated by computing the Merkle tree root of the current UTXO. Note that hash calculation is a subject to design in a separate BEP and can be changed in the future.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know understand why you kept a specific method of computing the application hash in this BEP.

Moreover, we don't calculate the application hash like that today, and I'm not convinced we'll compute it that way in the future. There's a lot more to the current BigchainDB state than the UTXO set. The full state includes all asset.data and metadata information, for example.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used UTXO root since it can be used to do some useful verifications in the future, unlike the current hashing.

It's not indeed important at the moment because we will instruct the users to take the hash and the dump from the same validator - we do not anyway offer tools to verify the dump-hash integrity yet. So I can as well just recommend to use the already implemented way.

Comment thread 42/README.md Outdated
When a validator joins the network after a migration, it does not suffice for him to know `genesis.json` - he also needs the archive of the old chain.
There might be more than one migration, so the initial chain might go from height 0 to 33277, the second chain might go to height 88234, the third chain can go from 88234 up to the recent height. In this case the validator has to get an archive containing archived blocks from height 0 to height 88234.

Such a validator needs to take a snapshot of the old chain from an existing member of the network. The snapshot can be created via a CLI command:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the first iteration of this implementation, we can just recommend the user to use mongodump and mongorestore for achiving and loading data.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, we need to be explicit about the importance of keeping/publishing the dump together with genesis.json.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Tendermint stores more information than MongoDB stores, such as all the block headers. Shouldn't that be "dumped" as well?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After a migration, Tendermint starts from scratch so no - there is nothing to dump.

Comment thread 42/README.md Outdated

After the snapshot is applied, Tendermint can be started.

Note that although the new node can join the network and work to some extent without applying the snapshot, it is not able to properly validate transactions so applying the snapshot is a must.
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What effect will this have on the network? I suppose this node be treated as byzantine node?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it will. Without sufficient data, it may claim invalid transactions to be valid.

@ttmc
Copy link
Copy Markdown
Contributor

ttmc commented Aug 16, 2018

I'm still a bit unclear about how the initial Tendermint state in a new chain gets set to the final state of the previous chain, but I suppose those details will get resolved once we start implementing (and we can revise this BEP to reflect what we do).

I will approve this PR.

@ldmberman
Copy link
Copy Markdown
Contributor Author

I'm still a bit unclear about how the initial Tendermint state in a new chain gets set to the final state of the previous chain,

The initial Tendermint state is described by genesis.json. chain_id, the app hash, and the validator set is everything that matters. Does it help?

Comment thread 42/README.md Outdated

#### Migration election specs

We introduce two transaction specs, with operation types [`TENDERMINT_MIGRATION_ELECTION`][spec_tendermint_migration_election] and [`TENDERMINT_MIGRATION_ELECTION_VOTE`][spec_tendermint_migration_election_vote], for the purpose of implementing migration elections. `TENDERMINT_MIGRATION_ELECTION` is an extension of `CREATE`; `TENDERMINT_MIGRATION_ELECTION_VOTE` is an extension of `TRANSFER`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wouldn't call those two things "specs." They're two new operation types, to be added to the existing BigchainDB Transactions Spec v2 (BEP-13).

@ttmc
Copy link
Copy Markdown
Contributor

ttmc commented Aug 17, 2018

The initial Tendermint state is described by genesis.json. chain_id, the app hash, and the validator set is everything that matters. Does it help?

Yes, thanks.

title: Tendermint Migration Election Schema - Updgrade Tendermint after a breaking change in the chain
required:
- operation
- asset
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The asset field is not defined?

Solution: Draft a migration plan. Address the problem of switching Tendermint chains. Describe how to replay the chain after a migration. Mention the implications on HTTP API.
Also, include election status into the command output doc.
Add some clarifications.
Rename snapshot -> archive. Use MongoDB tools. Document how to manage and validate archives.
@ldmberman ldmberman force-pushed the bep-42-handle-tendermint-hard-fork branch from 97a49d3 to c8f8170 Compare September 5, 2018 16:28
@kansi
Copy link
Copy Markdown
Contributor

kansi commented Sep 6, 2018

Why is there is no transaction spec for creating a migration proposal?

@ldmberman
Copy link
Copy Markdown
Contributor Author

Why is there is no transaction spec for creating a migration proposal?

It matches CREATE, except for the new operation.

@kansi
Copy link
Copy Markdown
Contributor

kansi commented Sep 6, 2018

But then what is the value of the asset field?

@ldmberman
Copy link
Copy Markdown
Contributor Author

But then what is the value of the asset field?

Follows the same constraints as in CREATE then.

Comment thread 42/README.md Outdated

BigchainDB operators need a convenient way to do the following:

1. Stop building blocks when the old Tendermint chain is at a jointly chosen height.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line makes it seem like the operators have to agree on a migration height (e.g. 4592) ahead of time, but later on, you make it clear that the migration height is just whatever the current height happens to be when the election is concluded. Maybe change this line to read:

  1. Stop building blocks for the old Tendermint chain at the same height as all the other operators.

Comment thread 42/README.md Outdated
```
votes_recieved=<Sum_of_votes_recieved>
votes_allocated=<Sum_of_votes_allocated_in_election>
network_size=<Total_network_power>
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this line correct? Network size and total network power don't seem like the same thing to me.

Comment thread 42/README.md

Therefore, after the migration election is concluded, each validator has to create a chain archive and keep it together with `genesis.json`. If `genesis.json` is published somewhere, the archive should be published alongside.

To generate the archive, validators can use the `mongodump` command (comes together with `mongod`):
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We now know that the blockchain stored by MongoDB and the blockchain stored by Tendermint are different because Tendermint also stores invalid transactions. Does this matter? Don't the invalid transactions go into calculating some block header hashes on the Tendermint side?

In other words, shouldn't we be archiving Tendermint's LevelDB? Maybe just archive both MongoDB and LevelDB?

(We're considering replacing LevelDB with one MongoDB for everyone. If we do that, this will all become much easier.)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, we do not care about Tendermint storage here by design. After the migration, Tendermint starts from scratch while BigchainDB simply keeps the old data for querying but not for replay.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see, it makes sense now. Maybe you could write that in BEP-42 somewhere.

@kansi
Copy link
Copy Markdown
Contributor

kansi commented Sep 13, 2018

The transaction spec for chain migration should also be included in this PR

Edit: It seems that chain migration spec is exactly the same as the base election transaction spec. So

Comment thread 42/README.md Outdated
To offer a convenient way to get the data, we propose a CLI command:

```
$ bigchaindb status
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be updated to bigchaindb election status <election_id> and the expected output (apart from status=) should be defined.

Comment thread 42/README.md Outdated

#### Migration election specs

We introduce a new [transaction operation](./tendermint_migration_election.yaml), `TENDERMINT_MIGRATION_ELECTION`, for the purpose of implementing migration elections. `TENDERMINT_MIGRATION_ELECTION` follows the base election spec documented in [the TEP](../18).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

./tendermint_migration_election.yaml is not accessible

Comment thread 42/README.md Outdated

Note that new validators joining a permissioned network inherently have to trust the place they are getting data from - there is no generic way to assess the validity of `genesis.json` and the archive upon joining the network.

At the moment, there are no tools to verify the integrity between `genesis.json` and the archive. Such tools are subject to work on separately. In the future, one might need less trust in particular parties by downloading `genesison.json` and the archive from two different places and verifying the integrity. At the moment it is strongly recommended to download them from a single most trusted place.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

genesison.json -> genesis.json

Comment thread 42/README.md Outdated
The initiator executes:

```
$ bigchaindb election chain-migration new --private-key /home/user/.tendermint/config/priv_validator.json
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

according to the current description and implementation the command is bigchaindb election new chain-migration ...

Comment thread 42/README.md
@kansi kansi merged commit fb8f5f9 into bigchaindb:master Sep 19, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants