clean

Strip superfluous metadata from notebooks

To avoid pointless conflicts when working with Jupyter notebooks (caused by differing execution counts or cell metadata), clean the notebooks before committing anything. This is done automatically if you install the git hooks with nbdev_install_hooks. The following functions are used to do that. Cleaning also adds cell ids if missing (required by nbformat 4.5+).

Trust

nbdev_trust


def nbdev_trust(
    fname:str=None, # A notebook name or glob to trust
    force_all:bool=False, # Also trust notebooks that haven't changed
):

Trust notebooks matching fname.

Clean

clean_nb


def clean_nb(
    nb, # The notebook to clean
    clear_all:bool=False, # Remove all cell metadata and cell outputs?
    allowed_metadata_keys:list=None, # Preserve the list of keys in the main notebook metadata
    allowed_cell_metadata_keys:list=None, # Preserve the list of keys in cell level metadata
    clean_ids:bool=True, # Remove ids from plaintext reprs?
):

Clean nb from superfluous metadata

Jupyter adds a trailing newline to images in cell outputs; vscode-jupyter does not.
Notebooks should be brought to a common style to avoid unnecessary diffs:

test_nb = read_nb('../../tests/image.ipynb')
assert test_nb.cells[0].outputs[0].data['image/png'][-1] == "\n" # Make sure it was not converted by accident
clean_nb(test_nb)
assert test_nb.cells[0].outputs[0].data['image/png'][-1] != "\n"
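
The normalization checked above can be illustrated on a plain dictionary. This is only a minimal sketch, not nbdev's actual implementation: the helper name strip_image_newline is hypothetical, and the base64 payload is a made-up placeholder.

```python
def strip_image_newline(cell):
    """Remove trailing newlines from `image/png` payloads in a cell's outputs.

    Hypothetical helper for illustration; works on a plain dict cell.
    """
    for output in cell.get('outputs', []):
        data = output.get('data', {})
        png = data.get('image/png')
        if isinstance(png, str):
            # Jupyter-style payloads end with "\n"; drop it for a common style
            data['image/png'] = png.rstrip('\n')
    return cell

# A made-up cell with a Jupyter-style (newline-terminated) image payload
cell = {
    'cell_type': 'code',
    'outputs': [{'output_type': 'display_data',
                 'data': {'image/png': 'iVBORw0KGgo=\n'}}],
}
strip_image_newline(cell)
print(repr(cell['outputs'][0]['data']['image/png']))
```

Applying the helper twice gives the same result as applying it once, so notebooks saved from either editor converge to one form.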

The test notebook has metadata in the main metadata section as well as cell-level metadata in the second cell:

test_nb = read_nb('../../tests/metadata.ipynb')

assert {'meta', 'jekyll', 'my_extra_key', 'my_removed_key'} <= test_nb.metadata.keys()
assert {'meta', 'hide_input', 'my_extra_cell_key', 'my_removed_cell_key'} == test_nb.cells[1].metadata.keys()

After cleaning the notebook, all extra metadata is removed; only some keys are allowed by default:

clean_nb(test_nb)

assert {'jekyll', 'kernelspec'} == test_nb.metadata.keys()
assert {'hide_input'} == test_nb.cells[1].metadata.keys()

We can preserve some additional keys at the notebook or cell levels:

test_nb = read_nb('../../tests/metadata.ipynb')
clean_nb(test_nb, allowed_metadata_keys={'my_extra_key'}, allowed_cell_metadata_keys={'my_extra_cell_key'})

assert {'jekyll', 'kernelspec', 'my_extra_key'} == test_nb.metadata.keys()
assert {'hide_input', 'my_extra_cell_key'} == test_nb.cells[1].metadata.keys()
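
The allow-list behaviour shown in these tests can be sketched with plain dictionaries. The helper keep_allowed is hypothetical, and the default key set below is chosen only to mirror the results above; it is an assumption, not nbdev's full default list.

```python
def keep_allowed(metadata, default_keys, extra_keys=None):
    """Return a copy of `metadata` containing only allowed keys.

    Hypothetical helper: `default_keys` stands in for the built-in
    allow-list, `extra_keys` for user-supplied additions.
    """
    allowed = set(default_keys) | set(extra_keys or ())
    return {k: v for k, v in metadata.items() if k in allowed}

# Assumed defaults, picked to match the assertions in the tests above
default_nb_keys = {'jekyll', 'kernelspec'}

nb_meta = {'meta': 1, 'jekyll': {}, 'kernelspec': {},
           'my_extra_key': 1, 'my_removed_key': 1}

cleaned = keep_allowed(nb_meta, default_nb_keys)
cleaned_extra = keep_allowed(nb_meta, default_nb_keys, {'my_extra_key'})
print(sorted(cleaned), sorted(cleaned_extra))
```

The user-supplied keys extend the defaults rather than replace them, which is why 'jekyll' and 'kernelspec' survive in both cases.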

Passing clear_all=True removes everything from the cell metadata:

test_nb = read_nb('../../tests/metadata.ipynb')
clean_nb(test_nb, clear_all=True)

assert {'jekyll', 'kernelspec'} == test_nb.metadata.keys()
test_eq(test_nb.cells[1].metadata, {})

Passing clean_ids=True removes ids (such as memory addresses) from plaintext repr outputs. These ids change on each run, so leaving them in often leads to git merge conflicts. For example:

<PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x7FB4F8979690>

becomes:

<PIL.PngImagePlugin.PngImageFile image mode=L size=28x28>
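
A transformation like this one can be sketched with a regular expression. The pattern and the helper name strip_repr_ids are assumptions made for illustration; the expression nbdev actually uses may differ.

```python
import re

# Assumed pattern: a space, "at", and a hexadecimal memory address
_ADDR_RE = re.compile(r' at 0x[0-9a-fA-F]+')

def strip_repr_ids(text):
    """Remove ' at 0x...' memory addresses from a plaintext repr."""
    return _ADDR_RE.sub('', text)

before = '<PIL.PngImagePlugin.PngImageFile image mode=L size=28x28 at 0x7FB4F8979690>'
after = strip_repr_ids(before)
print(after)
```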

Cell IDs, on the other hand, are always added if missing:

test_cell = {'source': 'x=1', 'cell_type': 'code', 'metadata': {}}
_clean_cell(test_cell, False, set(), True)
test_cell['id']

'183ecbfa'
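
The idea of adding an id only when one is missing can be sketched as follows. How nbdev generates the id is not shown here, so the random 8-character hex string from uuid is an assumption, as is the helper name ensure_cell_id.

```python
import uuid

def ensure_cell_id(cell):
    """Add an 8-character hex id to `cell` if it has none (hypothetical helper)."""
    if 'id' not in cell:
        # Assumption: a random id; existing ids are never overwritten
        cell['id'] = uuid.uuid4().hex[:8]
    return cell

new_cell = ensure_cell_id({'source': 'x=1', 'cell_type': 'code', 'metadata': {}})
kept_cell = ensure_cell_id({'source': 'y=2', 'cell_type': 'code', 'metadata': {}, 'id': 'abc123'})
print(len(new_cell['id']), kept_cell['id'])
```

Leaving existing ids alone matters for diffs: an id that changed on every clean would itself become a source of conflicts.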

process_write


def process_write(
    warn_msg, proc_nb, f_in, f_out:NoneType=None, disp:bool=False
):

nbdev_clean


def nbdev_clean(
    fname:str=None, # A notebook name or glob to clean
    clear_all:bool=False, # Remove all cell metadata and cell outputs?
    disp:bool=False, # Print the cleaned outputs
    stdin:bool=False, # Read notebook from input stream
):

Clean all notebooks in fname to avoid merge conflicts

By default (fname left as None), all the notebooks in config.nbs_path are cleaned. You can opt in to fully cleaning the notebooks, removing all metadata and cell outputs, by passing clear_all=True.

If you want to keep some keys in the main notebook metadata, you can set allowed_metadata_keys in settings.ini. Similarly, for cell-level metadata use allowed_cell_metadata_keys. For example, to preserve both k1 and k2 at both the notebook and cell level, add the following to settings.ini:

...
allowed_metadata_keys = k1 k2
allowed_cell_metadata_keys = k1 k2
...
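
To make the format concrete, here is a sketch of how such space-separated values could be read into sets with the standard library. It assumes the keys live in a [DEFAULT] section and does not reproduce how nbdev itself loads its configuration.

```python
import configparser

# Inline stand-in for settings.ini; the [DEFAULT] section is an assumption
ini_text = """
[DEFAULT]
allowed_metadata_keys = k1 k2
allowed_cell_metadata_keys = k1 k2
"""

parser = configparser.ConfigParser()
parser.read_string(ini_text)
cfg = parser['DEFAULT']

# Values are whitespace-separated key names; a missing option yields an empty set
allowed_metadata_keys = set(cfg.get('allowed_metadata_keys', fallback='').split())
allowed_cell_metadata_keys = set(cfg.get('allowed_cell_metadata_keys', fallback='').split())
print(sorted(allowed_metadata_keys), sorted(allowed_cell_metadata_keys))
```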

clean_jupyter


def clean_jupyter(
    path, model, **kwargs
):

Clean Jupyter model pre save to path

This cleans notebooks on-save to avoid unnecessary merge conflicts. The easiest way to install it for both Jupyter Notebook and Lab is by running nbdev_install_hooks. It works by implementing a pre_save_hook from Jupyter’s file save hook API.
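
For readers unfamiliar with the mechanism, a generic pre-save hook might look like the sketch below. It is not clean_jupyter itself: the function strip_execution_counts is a hypothetical, much simpler stand-in that only resets execution counts, and the registration line is shown as a comment because the config object c exists only inside a Jupyter configuration file.

```python
def strip_execution_counts(model, path=None, contents_manager=None, **kwargs):
    """Hypothetical pre-save hook: reset execution counts before a notebook is saved."""
    # Hooks are called for every saved file; only act on notebooks
    if model.get('type') != 'notebook':
        return
    for cell in model['content'].get('cells', []):
        if cell.get('cell_type') != 'code':
            continue
        cell['execution_count'] = None
        for output in cell.get('outputs', []):
            if 'execution_count' in output:
                output['execution_count'] = None

# In a Jupyter config file the hook would be registered along these lines:
# c.FileContentsManager.pre_save_hook = strip_execution_counts

# Made-up model, shaped like what the contents manager passes to the hook
model = {
    'type': 'notebook',
    'content': {'cells': [
        {'cell_type': 'code', 'execution_count': 7,
         'outputs': [{'output_type': 'execute_result', 'execution_count': 7, 'data': {}}]},
        {'cell_type': 'markdown', 'source': 'text'},
    ]},
}
strip_execution_counts(model, path='example.ipynb')
print(model['content']['cells'][0]['execution_count'])
```

The hook mutates the model in place and returns nothing, so the cleaned content is what gets written to disk.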

Hooks

nbdev_install_hooks


def nbdev_install_hooks():

Install Jupyter and git hooks to automatically clean, trust, and fix merge conflicts in notebooks

See clean_jupyter and nbdev_merge for more about how each hook works.