This library takes Kindle Scribe notebook files (essentially KFX documents) and converts them into epub documents. Templates and pen strokes are converted to SVG elements.
This is a rewrite of another tool that does something similar but also handles non-notebook KFX documents. I was only interested in notebook parsing, so I wrote only what was necessary to figure out the structure of the original project and to replicate the behavior of the notebook execution path. Pretty much the moment I got it working, it accomplished everything I needed it to, so I didn't bother cleaning up the dead code, removing the commented-out diagnostics, or designing any sort of reasonable API. Maybe some day I'll get around to it, but since I no longer daily-drive the Scribe, that's unlikely in the short term. I'm mostly making this public for posterity.
Nope, and there are no plans for it. This is purely a fast path for notebook files. Other converters exist, and those should be preferred if you have non-notebook files.
~4-7x faster than the most well-known option.
Reasonably accurate but not perfect. The highlighter has minor defects at line ends, and the pencil has moderate defects, since it's hard to replicate the exact scatter-pointing they use. Line widths may also not be a 1:1 match.
The logic necessary to implement them "properly" would dip into a ton of other areas that I don't want to touch. Since there are a finite number of templates, and they haven't been changed much (or at all) since the Scribe was released several years ago, I just include pre-converted SVG copies in the library itself that are inserted when necessary.
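As a hedged sketch of what "pre-converted SVG copies in the library itself" can look like (the template IDs and SVG contents below are placeholders, not the library's actual values; a real crate would likely use `include_str!` on the converted files):

```rust
// Hypothetical sketch: templates ship as pre-converted SVG baked into the
// binary. Inline literals are used here so the sketch is self-contained;
// in practice these would probably be include_str!("lined.svg") etc.
fn template_svg(template_id: &str) -> Option<&'static str> {
    match template_id {
        // IDs and contents are illustrative placeholders.
        "lined" => Some(r#"<svg xmlns="http://www.w3.org/2000/svg"><!-- ruled lines --></svg>"#),
        "grid" => Some(r#"<svg xmlns="http://www.w3.org/2000/svg"><!-- grid --></svg>"#),
        _ => None,
    }
}

fn main() {
    assert!(template_svg("lined").unwrap().starts_with("<svg"));
    assert!(template_svg("unknown").is_none());
    println!("ok");
}
```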
Comparable to any other KFX->SVG conversion. Mine sometimes ends up a bit smaller because I don't include templates that aren't directly referenced by at least one page. From what I can tell, if you set a page's template and then switch to a different one, the original template remains part of the file even though nothing references it.
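The "only keep templates referenced by at least one page" filtering can be sketched like this (the `Page` type and field names are illustrative, not the crate's API):

```rust
use std::collections::HashSet;

// Illustrative page type; the real crate's structures differ.
struct Page {
    template_id: String,
}

// Keep only the templates that at least one page actually references,
// dropping templates that linger in the KFX after the user switched away.
fn referenced_templates<'a>(pages: &[Page], all_templates: &[&'a str]) -> Vec<&'a str> {
    let used: HashSet<&str> = pages.iter().map(|p| p.template_id.as_str()).collect();
    all_templates
        .iter()
        .copied()
        .filter(|t| used.contains(t))
        .collect()
}

fn main() {
    let pages = vec![Page { template_id: "grid".into() }];
    // "lined" is still present in the KFX but unreferenced, so it's dropped.
    let kept = referenced_templates(&pages, &["grid", "lined"]);
    assert_eq!(kept, vec!["grid"]);
    println!("{:?}", kept);
}
```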
Compared to the raw KFX file, the epub usually ends up smaller by a good amount. On a 12MB test file, the resulting epub was ~7.6MB (roughly a third smaller); for a 597KB KFX, the resulting epub was 61KB (roughly 90% smaller). This has more to do with how much dead space is in each KFX file than with the storage format of the raw stroke data.
Amazon's official library has SymbolTable::add_symbol and SymbolTable::add_symbol_for_text as pub(crate). I changed those to pub, as I need them to reconstruct the Ion data properly. I also took the liberty of replacing the standard-library HashMap with rustc_hash. They already had rustc_hash as a dependency, but for whatever reason didn't use it everywhere. There might also be some minor alterations to make handling some lifetimes easier, but I can't remember.
In short, this macro ensures that comparisons between strings of 8 bytes or less are done entirely in registers. Most of the IDs this library works with are 4-byte strings of the form $123, so this ensures the compiler doesn't fuck up trying to optimize it.
Here's an example from the notebook decoding:
```rust
pub fn fragment_name_symbol(ftype: &str) -> &'static str {
    match ftype {
        "$157" => "$173",
        "$164" => "$175",
        "$259" => "$176",
        "$260" => "$174",
        "$266" => "$180",
        "$391" => "$239",
        "$393" => "$240",
        "$608" => "$598",
        _ => "",
    }
}
```

LLVM is surprisingly conservative about this optimization, especially given Rust's memory-safety guarantees and the very simple control flow. While it will (usually) compare against integers, and (usually) organize those comparisons into a binary search, it won't load ftype into a register. It reads memory for every single comparison:
```asm
cmp rdx, 4
jne .LBB511_1
cmp dword ptr [rcx], 909521444
mov edx, 4
je .LBB511_3
cmp dword ptr [rcx], 875966756
je .LBB511_6
cmp dword ptr [rcx], 825832228
je .LBB511_8
cmp dword ptr [rcx], 859386660
je .LBB511_10
cmp dword ptr [rcx], 808858148
je .LBB511_12
cmp dword ptr [rcx], 942683684
je .LBB511_14
cmp dword ptr [rcx], 959787556
je .LBB511_16
xor edx, edx
cmp dword ptr [rcx], 926232868
...
```

I know that it'll still be in the L1 cache, but leaving the registers at all in this situation is silly.
With the macro, the string is read into a register immediately, and all comparisons are done via the register:
```rust
pub fn fragment_name_symbol(ftype: &str) -> &'static str {
    if ftype.len() != 4 {
        return "";
    }
    match b2r!(ftype) {
        b2r!("$157") => "$173",
        b2r!("$164") => "$175",
        b2r!("$259") => "$176",
        b2r!("$260") => "$174",
        b2r!("$266") => "$180",
        b2r!("$391") => "$239",
        b2r!("$393") => "$240",
        b2r!("$608") => "$598",
        _ => "",
    }
}
```

```asm
cmp rdx, 4
jne .LBB511_1
mov ecx, dword ptr [rcx]
bswap ecx
xor edx, edx
cmp ecx, 607270453
jg .LBB511_10
...
```

It's a micro-optimization for sure, but I've seen this compiled even worse. A few compiler versions ago (around 2023-2024), it sometimes wouldn't even treat the static strings as integers, which was wild: it would do memory-to-memory comparisons for every single string.
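The macro body itself isn't shown here, but judging from the bswap in the optimized assembly, its effect is to pack the string's bytes big-endian into an integer so a single register compare replaces a byte-by-byte memory compare. A hypothetical reconstruction (not the crate's actual macro):

```rust
// Hypothetical sketch of what a "bytes to register" helper likely does: pack
// up to 8 string bytes into a u64, big-endian (matching the bswap above).
// Equal strings pack to equal integers, so one register comparison suffices.
const fn b2r(s: &str) -> u64 {
    let bytes = s.as_bytes();
    let mut packed: u64 = 0;
    let mut i = 0;
    while i < bytes.len() && i < 8 {
        packed = (packed << 8) | bytes[i] as u64;
        i += 1;
    }
    packed
}

fn main() {
    // "$157" packs to 0x24313537 ('$'=0x24, '1'=0x31, '5'=0x35, '7'=0x37).
    assert_eq!(b2r("$157"), 0x24313537);
    assert_ne!(b2r("$157"), b2r("$164"));
    println!("{:#x}", b2r("$157"));
}
```

Being a const fn, the packed values for the literal arms can be computed at compile time, which is what lets the whole match collapse into integer comparisons.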