This library takes Kindle Scribe notebook files (essentially KFX documents) and converts them into epub documents. Templates and pen strokes are converted to SVG elements.
This is a rewrite of another tool that does something similar but also handles non-notebook KFX documents. I was only interested in notebook parsing, so I wrote only what was necessary to figure out the structure of the original project and to replicate the behavior of the notebook execution path. Pretty much the moment I got it working, it accomplished everything I needed it to, so I didn't bother cleaning up the dead code, removing the commented-out diagnostics, or designing any sort of reasonable API. Maybe some day I'll get around to it, but since I no longer daily-drive the Scribe, that's unlikely in the short term. I'm mostly making this public for posterity.
Nope, and there are no plans for it. This is purely a fast path for notebook files. Other converters exist, and those should be preferred if you have non-notebook files.
~4-7x faster than the most well-known option.
Reasonably accurate but not perfect. The highlighter has minor defects at line ends, and the pencil has moderate defects, since it's hard to replicate the exact scatter-pointing they use. Line widths may also not be a 1:1 match.
The logic necessary to implement them "properly" would dip into a ton of other areas that I don't want to touch. Since there are a finite number of templates, and they haven't been changed much (or at all) since the Scribe was released several years ago, I just include pre-converted SVG copies in the library itself that are inserted when necessary.
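As a hedged sketch of what "pre-converted SVG copies in the library itself" can look like (the template IDs and SVG contents below are placeholders, not the library's actual values; a real crate would likely use `include_str!` on the converted files):

```rust
// Hypothetical sketch: templates ship as pre-converted SVG baked into the
// binary. Inline literals are used here so the sketch is self-contained;
// in practice these would probably be include_str!("lined.svg") etc.
fn template_svg(template_id: &str) -> Option<&'static str> {
    match template_id {
        // IDs and contents are illustrative placeholders.
        "lined" => Some(r#"<svg xmlns="http://www.w3.org/2000/svg"><!-- ruled lines --></svg>"#),
        "grid" => Some(r#"<svg xmlns="http://www.w3.org/2000/svg"><!-- grid --></svg>"#),
        _ => None,
    }
}

fn main() {
    assert!(template_svg("lined").unwrap().starts_with("<svg"));
    assert!(template_svg("unknown").is_none());
    println!("ok");
}
```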
Comparable to any other KFX->SVG conversion. Mine sometimes ends up a bit smaller because I don't include templates that aren't directly referenced by at least one page. From what I can tell, if you set a page's template and then switch to a different one, the original template remains part of the file even though nothing references it.
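The "only keep templates referenced by at least one page" filtering can be sketched like this (the `Page` type and field names are illustrative, not the crate's API):

```rust
use std::collections::HashSet;

// Illustrative page type; the real crate's structures differ.
struct Page {
    template_id: String,
}

// Keep only the templates that at least one page actually references,
// dropping templates that linger in the KFX after the user switched away.
fn referenced_templates<'a>(pages: &[Page], all_templates: &[&'a str]) -> Vec<&'a str> {
    let used: HashSet<&str> = pages.iter().map(|p| p.template_id.as_str()).collect();
    all_templates
        .iter()
        .copied()
        .filter(|t| used.contains(t))
        .collect()
}

fn main() {
    let pages = vec![Page { template_id: "grid".into() }];
    // "lined" is still present in the KFX but unreferenced, so it's dropped.
    let kept = referenced_templates(&pages, &["grid", "lined"]);
    assert_eq!(kept, vec!["grid"]);
    println!("{:?}", kept);
}
```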
Compared to the raw KFX file, the epub usually ends up smaller by a good amount. On a 12MB test file, the resulting epub was ~7.6MB (roughly a third smaller); for a 597KB KFX, the resulting epub was 61KB (roughly 90% smaller). This has more to do with how much dead space is in each KFX file than with the storage format of the raw stroke data.
Amazon's official library has SymbolTable::add_symbol and SymbolTable::add_symbol_for_text as pub(crate). I changed those to pub, as I need them to reconstruct the Ion data properly. I also took the liberty of replacing the standard-library HashMap with rustc_hash. They already had rustc_hash as a dependency, but for whatever reason didn't use it everywhere. There might also be some minor alterations to make handling some lifetimes easier, but I can't remember.
In short, this macro ensures that comparisons between strings of 8 bytes or less are done entirely in registers. Most of the IDs this library works with are 4-byte strings of the form $123, so this ensures the compiler doesn't fuck up trying to optimize it.
Here's an example from the notebook decoding:
```rust
pub fn fragment_name_symbol(ftype: &str) -> &'static str {
    match ftype {
        "$157" => "$173",
        "$164" => "$175",
        "$259" => "$176",
        "$260" => "$174",
        "$266" => "$180",
        "$391" => "$239",
        "$393" => "$240",
        "$608" => "$598",
        _ => "",
    }
}
```

LLVM is surprisingly conservative about this optimization, especially given Rust's memory-safety guarantees and the very simple control flow. While it will (usually) compare against integers, and (usually) organize those comparisons into a binary search, it won't load ftype into a register. It reads memory for every single comparison:
```asm
cmp rdx, 4
jne .LBB511_1
cmp dword ptr [rcx], 909521444
mov edx, 4
je .LBB511_3
cmp dword ptr [rcx], 875966756
je .LBB511_6
cmp dword ptr [rcx], 825832228
je .LBB511_8
cmp dword ptr [rcx], 859386660
je .LBB511_10
cmp dword ptr [rcx], 808858148
je .LBB511_12
cmp dword ptr [rcx], 942683684
je .LBB511_14
cmp dword ptr [rcx], 959787556
je .LBB511_16
xor edx, edx
cmp dword ptr [rcx], 926232868
...
```

I know that it'll still be in the L1 cache, but leaving the registers at all in this situation is silly.
With the macro, the string is read into a register immediately, and all comparisons are done via the register:
```rust
pub fn fragment_name_symbol(ftype: &str) -> &'static str {
    if ftype.len() != 4 {
        return "";
    }
    match b2r!(ftype) {
        b2r!("$157") => "$173",
        b2r!("$164") => "$175",
        b2r!("$259") => "$176",
        b2r!("$260") => "$174",
        b2r!("$266") => "$180",
        b2r!("$391") => "$239",
        b2r!("$393") => "$240",
        b2r!("$608") => "$598",
        _ => "",
    }
}
```

```asm
cmp rdx, 4
jne .LBB511_1
mov ecx, dword ptr [rcx]
bswap ecx
xor edx, edx
cmp ecx, 607270453
jg .LBB511_10
...
```

It's a micro-optimization for sure, but I've seen this compiled even worse. A few compiler versions ago (around 2023-2024), it sometimes wouldn't even treat the static strings as integers, which was wild: it would do memory-to-memory comparisons for every single string.
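The macro body itself isn't shown here, but judging from the bswap in the optimized assembly, its effect is to pack the string's bytes big-endian into an integer so a single register compare replaces a byte-by-byte memory compare. A hypothetical reconstruction (not the crate's actual macro):

```rust
// Hypothetical sketch of what a "bytes to register" helper likely does: pack
// up to 8 string bytes into a u64, big-endian (matching the bswap above).
// Equal strings pack to equal integers, so one register comparison suffices.
const fn b2r(s: &str) -> u64 {
    let bytes = s.as_bytes();
    let mut packed: u64 = 0;
    let mut i = 0;
    while i < bytes.len() && i < 8 {
        packed = (packed << 8) | bytes[i] as u64;
        i += 1;
    }
    packed
}

fn main() {
    // "$157" packs to 0x24313537 ('$'=0x24, '1'=0x31, '5'=0x35, '7'=0x37).
    assert_eq!(b2r("$157"), 0x24313537);
    assert_ne!(b2r("$157"), b2r("$164"));
    println!("{:#x}", b2r("$157"));
}
```

Being a const fn, the packed values for the literal arms can be computed at compile time, which is what lets the whole match collapse into integer comparisons.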