-
Notifications
You must be signed in to change notification settings - Fork 27
Refactor BEP 30 implementation #301
Description
Context
In a "standard" torrent, you have a list of pieces from the original contents. And each piece has a hash. This way, peers can validate each piece after downloading them.
The BEP 30 proposed changing the pieces field in the info dictionary with a new root hash field containing the Merkle tree's root hash of a Merkle tree with all the content pieces.
Current Implementation
We use this struct for the Torrent info dictionary:
pub struct TorrentInfoDictionary {
pub name: String,
#[serde(default)]
pub pieces: Option<ByteBuf>,
#[serde(rename = "piece length")]
pub piece_length: i64,
#[serde(default)]
pub md5sum: Option<String>,
#[serde(default)]
pub length: Option<i64>,
#[serde(default)]
pub files: Option<Vec<TorrentFile>>,
#[serde(default)]
pub private: Option<u8>,
#[serde(default)]
pub path: Option<Vec<String>>,
#[serde(default)]
#[serde(rename = "root hash")]
pub root_hash: Option<String>, // <- the "root hash" defined in BEP 30
#[serde(default)]
pub source: Option<String>,
}In the database, the root hash value is stored in the table (for SQLite):
CREATE TABLE "torrust_torrents" (
"torrent_id" INTEGER NOT NULL,
"uploader_id" INTEGER NOT NULL,
"category_id" INTEGER,
"info_hash" TEXT NOT NULL UNIQUE,
"size" INTEGER NOT NULL,
"name" TEXT NOT NULL,
"pieces" TEXT NOT NULL,
"piece_length" INTEGER NOT NULL,
"private" BOOLEAN DEFAULT NULL,
"root_hash" INT NOT NULL DEFAULT 0, # <- a boolean (0 or 1)
"date_uploaded" TEXT NOT NULL,
"source" TEXT DEFAULT NULL,
"comment" TEXT,
FOREIGN KEY("uploader_id") REFERENCES "torrust_users"("user_id") ON DELETE CASCADE,
FOREIGN KEY("category_id") REFERENCES "torrust_categories"("category_id") ON DELETE SET NULL,
PRIMARY KEY("torrent_id" AUTOINCREMENT)
);And for MySQL:
...
root_hash BOOLEAN NOT NULL DEFAULT FALSE,
...But there are a couple of misleading things:
- Notice the
torrust_torrents::root_hashis a boolean field. It's true (1) if the torrent implements the BEP 30 and false otherwise. - The actual field to store the "root hash" is the torrust_torrents::pieces". That means the
piecescolumn can store both the standardpiecesvalue and theroot hashvalue for BEP-30 torrents.
That leads to some conditional blocks like these:
When you insert a new torrent into the database:
// torrent file can only hold a pieces key or a root hash key: http://www.bittorrent.org/beps/bep_0030.html
let (pieces, root_hash): (String, bool) = if let Some(pieces) = &torrent.info.pieces {
(from_bytes(pieces.as_ref()), false)
} else {
let root_hash = torrent.info.root_hash.as_ref().ok_or(database::Error::Error)?;
(root_hash.to_string(), true)
};When you hydrate the torrent from the database:
// a torrent file has a root hash or a pieces key, but not both.
if root_hash > 0 {
// If `root_hash` is true the `pieces` field contains the `root hash`
info_dict.root_hash = Some(pieces.to_owned());
} else {
let buffer = into_bytes(pieces).expect("variable `torrent_info.pieces` is not a valid hex string");
info_dict.pieces = Some(ByteBuf::from(buffer));
}Sometimes root_hash is a boolean, sometimes an integer (with two values 0 or 1), and sometimes a string with the hash in hex format.
Proposal
I do not know the reason why the pieces field is reused (@WarmBeer do you know it?), but I would:
- Rename the
root_hashcolumn tois_bep_30. - Create a new text field
root_hash.
BEP 30 introduction
BEP 30 (BitTorrent Enhancement Proposal 30) details the Merkle hash torrent extension. This is where the "root hash" concept comes into play in the BitTorrent protocol. The main idea behind this enhancement is to replace the flat list of piece hashes with a tree of hashes, specifically a Merkle tree. This allows for a more efficient and flexible way of verifying data.
Here's a simplified overview of BEP 30:
Merkle Trees
Merkle trees are binary trees of hashes. The leaves (bottom-most nodes) are hashes of the data blocks, and each non-leaf node is a hash of its two children. The root of this tree is particularly important, as it provides a hash that represents the integrity of all data blocks. If any piece of data changes, the root hash will change.
Advantages
- Dynamic Piece Size: The traditional flat list of hashes requires all pieces to be the same size. With Merkle trees, you can verify any continuous range of data regardless of size.
- Efficient Data Integrity: You don't need to download the entire list of piece hashes. If you're downloading a specific piece, you only need the hashes along the path from the piece to the root.
- Streaming: This structure is more conducive to streaming, as you can start verifying and using parts of the data without needing all the hashes first.
- Root Hash: The Merkle root hash, which represents the entire data set, replaces the traditional flat list of piece hashes in the torrent file. This hash is what's included in the torrent metadata.
Info Dictionary Changes
In the torrent file's "info" dictionary, the "pieces" field is replaced with the "root hash" field, containing the Merkle tree's root hash. The "piece length" field now specifies the size of leaf data blocks rather than traditional pieces.
Fetching The Tree
Since the torrent only contains the root hash, you'd need to fetch the rest of the tree from peers. When you request a piece of data, the peer also sends the required hashes from the Merkle tree to verify that data.
Backward Compatibility
BEP 30 was designed to be backward compatible. Torrents using Merkle trees can be shared with clients that don't support the extension, though there are some specifics to be handled to ensure this.
The move towards Merkle trees makes the BitTorrent protocol more adaptable and efficient, especially in scenarios where dynamic piece sizes or streaming capabilities are beneficial.
Questions
I do not know how to generate a torrent that implements BEP 30. I do not know any client supporting BEP 30. We could encode the torrent manually, but it would be better to user a trusted implementation for this issue.
Metadata
Metadata
Assignees
Labels
Type
Projects
Status