
Complements the documentation for pattern files and exclude files #5520

Merged
ThomasWaldmann merged 5 commits into borgbackup:master from Gu1nness:#5236-exclude-from
Nov 29, 2020

Conversation

@Gu1nness
Contributor

Fixes #5236

The path/filenames used as input for the pattern matching start from the
currently active recursion root. You usually give the recursion root(s)
when invoking borg and these can be either relative or absolute paths.

So, when you give relative/ as root, the paths going into the matcher
will look like relative/.../file.ext. When you give /absolute/ as
root, they will look like /absolute/.../file.ext.

File paths in Borg archives are always stored normalized and relative.
This means that e.g. borg create /path/to/repo ../some/path will
store all files as some/path/.../file.ext and borg create
/path/to/repo /home/user will store all files as
home/user/.../file.ext.
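
The two examples above can be reproduced with a small sketch. `stored_path` is a hypothetical helper written for illustration only; it is not Borg's code and merely approximates the "normalized and relative" rule described here.

```python
import posixpath

def stored_path(path):
    """Hypothetical helper: make a path normalized and relative."""
    # normpath collapses "a/./b" and "a/x/../b"; leading ".." parts and the
    # leading "/" survive normpath and are dropped by the filter below.
    parts = posixpath.normpath(path).split("/")
    return "/".join(p for p in parts if p not in ("", ".", ".."))

print(stored_path("../some/path"))  # some/path
print(stored_path("/home/user"))    # home/user
```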

File patterns support these styles: fnmatch, shell, regular expressions,
path prefixes and path full-matches. By default, fnmatch is used for
--exclude patterns and shell-style is used for the experimental
--pattern option.

Starting with Borg 1.2, for all but regular expression pattern matching
styles, all paths are treated as relative, meaning that a leading path
separator is removed after normalizing and before matching. This allows
you to use absolute or relative patterns arbitrarily.

If followed by a colon (':') the first two characters of a pattern are
used as a style selector. Explicit style selection is necessary when a
non-default style is desired or when the desired pattern starts with
two alphanumeric characters followed by a colon (e.g. aa:something/*).
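
A minimal sketch of the selector rule, assuming a hypothetical helper `split_style` (not part of Borg): if the third character is a colon and the first two are alphanumeric, they are taken as the style selector. The last call shows why a pattern such as `aa:something/*` needs an explicit selector in front of it. What Borg does with an unknown selector is not covered here.

```python
def split_style(pattern, default="fm"):
    """Hypothetical helper: split 'xx:rest' into (style, rest)."""
    if len(pattern) > 2 and pattern[2] == ":" and pattern[:2].isalnum():
        return pattern[:2], pattern[3:]
    # No selector present: fall back to the default style.
    return default, pattern

print(split_style("re:^home/"))          # ('re', '^home/')
print(split_style("*.o"))                # ('fm', '*.o')
print(split_style("fm:aa:something/*"))  # ('fm', 'aa:something/*')
print(split_style("aa:something/*"))     # ('aa', 'something/*'), "aa" read as a selector
```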

`Fnmatch <https://docs.python.org/3/library/fnmatch.html>`_, selector fm:
    This is the default style for --exclude and --exclude-from.
    These patterns use a variant of shell pattern syntax, with '*' matching
    any number of characters, '?' matching any single character, '[...]'
    matching any single character specified, including ranges, and '[!...]'
    matching any character not specified. For the purpose of these patterns,
    the path separator (backslash for Windows and '/' on other systems) is not
    treated specially. Wrap meta-characters in brackets for a literal
    match (e.g. [?] to match the literal character ?). For a path
    to match a pattern, the full path must match, or it must match
    from the start of the full path to just before a path separator. Except
    for the root path, paths will never end in the path separator when
    matching is attempted.  Thus, if a given pattern ends in a path
    separator, a '*' is appended before matching is attempted. A leading
    path separator is always removed.
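
The wildcard behaviour can be tried with Python's standard `fnmatch` module. This is only an approximation: Borg uses a variant of this syntax, and the rule about matching up to a path separator is not modelled. The paths are written without a leading separator, as they would be after removal.

```python
from fnmatch import fnmatchcase

# '*' is not stopped by '/', so it can span several directory levels.
print(fnmatchcase("home/user/file.o", "*.o"))                   # True
print(fnmatchcase("home/user/file.odt", "*.o"))                 # False
print(fnmatchcase("home/user/subdir/junk", "home/*/junk"))      # True
print(fnmatchcase("home/user/importantjunk", "home/*/junk"))    # False

# Brackets make a meta-character literal.
print(fnmatchcase("what?.txt", "what[?].txt"))                  # True
print(fnmatchcase("whatX.txt", "what[?].txt"))                  # False

# A pattern ending in '/' gets '*' appended: contents match, the directory does not.
pattern = "home/user/cache/" + "*"
print(fnmatchcase("home/user/cache/file", pattern))             # True
print(fnmatchcase("home/user/cache", pattern))                  # False
```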

Shell-style patterns, selector sh:
    This is the default style for --pattern and --patterns-from.
    Like fnmatch patterns, these are similar to shell patterns. The difference
    is that the pattern may include **/ for matching zero or more directory
    levels, and that * matches zero or more arbitrary characters with the
    exception of any path separator. A leading path separator is always removed.
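
To make the difference from fnmatch concrete, here is a rough sketch that translates only the two constructs named above (`**/` and `*`) into a regular expression. `sh_to_regex` is a hypothetical helper and not Borg's translation; other shell syntax is treated as literal text, and the trailing "end of path or separator" check is an assumption borrowed from the fnmatch description.

```python
import re

def sh_to_regex(pattern):
    """Hypothetical translator covering only '**/' and '*'."""
    out, i = [], 0
    while i < len(pattern):
        if pattern.startswith("**/", i):
            out.append(r"(?:[^/]+/)*")   # zero or more directory levels
            i += 3
        elif pattern[i] == "*":
            out.append(r"[^/]*")         # any characters except '/'
            i += 1
        else:
            out.append(re.escape(pattern[i]))
            i += 1
    # Assumption: match the full path, or up to just before a separator.
    return re.compile("".join(out) + r"(?:/|\Z)")

single = sh_to_regex("home/*/junk")
deep = sh_to_regex("home/**/junk")
print(bool(single.match("home/user/junk")))      # True
print(bool(single.match("home/user/sub/junk")))  # False, '*' stops at '/'
print(bool(deep.match("home/junk")))             # True, zero levels
print(bool(deep.match("home/a/b/junk")))         # True, two levels
```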

Regular expressions, selector re:
    Regular expressions similar to those found in Perl are supported. Unlike
    shell patterns, regular expressions are not required to match the full
    path and any substring match is sufficient. It is strongly recommended to
    anchor patterns to the start ('^'), to the end ('$') or both. Path
    separators (backslash for Windows and '/' on other systems) in paths are
    always normalized to a forward slash ('/') before applying a pattern. The
    regular expression syntax is described in the Python documentation for
    `the re module <https://docs.python.org/3/library/re.html>`_.
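
The effect of anchoring can be seen with `re.search`, which accepts a match anywhere in the string. This is a plain illustration with Python's `re` module; the example paths are made up, and the backslash replacement only mimics the separator normalization described above.

```python
import re

anchored = re.compile(r"^home/[^/]+\.tmp/")
unanchored = re.compile(r"[^/]+\.tmp/")

# Separators are normalized to '/' before the pattern is applied.
windows_style = r"home\foo.tmp\file".replace("\\", "/")

print(bool(anchored.search("home/foo.tmp/file")))         # True
print(bool(anchored.search("home/user/foo.tmp/file")))    # False, not directly below home/
print(bool(unanchored.search("home/user/foo.tmp/file")))  # True, substring match suffices
print(bool(anchored.search(windows_style)))               # True
```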

Path prefix, selector pp:
    This pattern style is useful to match whole sub-directories. The pattern
    pp:root/somedir matches root/somedir and everything therein. A leading
    path separator is always removed.
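
A path prefix match can be sketched as a comparison on whole path components. `pp_match` is a hypothetical helper, not Borg's implementation; it assumes forward slashes as separators.

```python
def pp_match(prefix, path):
    """Hypothetical helper: True if path is prefix or lies below it."""
    prefix = prefix.lstrip("/").rstrip("/")
    path = path.lstrip("/")
    # Appending "/" prevents "root/somedir2" from matching "root/somedir".
    return path == prefix or path.startswith(prefix + "/")

print(pp_match("root/somedir", "root/somedir"))         # True
print(pp_match("root/somedir", "root/somedir/a/b"))     # True
print(pp_match("root/somedir", "root/somedir2/file"))   # False
print(pp_match("/root/somedir", "root/somedir/file"))   # True, leading '/' removed
```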

Path full-match, selector pf:
    This pattern style is (only) useful to match full paths.
    This is kind of a pseudo pattern as it cannot have any variable or
    unspecified parts - the full path must be given. pf:root/file.ext matches
    root/file.ext only. A leading path separator is always removed.

    Implementation note: this is implemented via very time-efficient O(1)
    hashtable lookups (this means you can have huge amounts of such patterns
    without impacting performance much).
    Due to that, this kind of pattern does not respect any context or order.
    If you use such a pattern to include a file, it will always be included
    (if the directory recursion encounters it).
    Other include/exclude patterns that would normally match will be ignored.
    Same logic applies for exclude.
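
The consequence of the hashtable lookup can be sketched as follows: the full-match table is consulted first, and only if the path is absent are the ordered rules tried. `decide` and its arguments are hypothetical names for illustration, and the ordered rules are plain regular expressions to keep the sketch short. This is not Borg's code.

```python
import re

def decide(path, full_matches, ordered_rules, default=True):
    """Hypothetical sketch: return True to include, False to exclude."""
    key = path.lstrip("/")
    # O(1) dictionary lookup; wins regardless of rule order.
    if key in full_matches:
        return full_matches[key]
    # Otherwise the first matching ordered rule decides.
    for regex, include in ordered_rules:
        if re.search(regex, key):
            return include
    return default

full_matches = {"root/file.ext": True}   # like "+ pf:root/file.ext"
ordered_rules = [(r"^root/", False)]     # like "- re:^root/"

print(decide("root/file.ext", full_matches, ordered_rules))   # True, pf wins
print(decide("root/other.ext", full_matches, ordered_rules))  # False
print(decide("etc/hosts", full_matches, ordered_rules))       # True, default
```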

Note:

    re:, sh: and fm: patterns are all implemented on top of the Python SRE
    engine. It is very easy to formulate patterns for each of these types that
    require an inordinate amount of time to match paths. If untrusted users
    are able to supply patterns, ensure they cannot supply re: patterns.
    Further, ensure that sh: and fm: patterns only contain a handful of
    wildcards at most.
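
One way to act on this note is to screen patterns before handing them to borg. The sketch below is an assumption about how such a check might look: `is_acceptable` and the limit of four wildcards are hypothetical and not part of Borg, and `**` counts as two wildcards.

```python
def is_acceptable(pattern, max_wildcards=4):
    """Hypothetical pre-check for patterns from untrusted users."""
    if pattern.startswith("re:"):
        return False  # never accept regular expressions
    wildcards = pattern.count("*") + pattern.count("?")
    return wildcards <= max_wildcards

print(is_acceptable("re:(a+)+$"))              # False
print(is_acceptable("sh:home/*/.thumbnails"))  # True
print(is_acceptable("fm:*a*b*c*d*e*"))         # False, six wildcards
```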

Exclusions can be passed via the command line option --exclude. When used
from within a shell the patterns should be quoted to protect them from
expansion.

The --exclude-from option permits loading exclusion patterns from a text
file with one pattern per line. Lines that are empty or start with the number
sign ('#') after removing whitespace on both ends are ignored. The optional style
selector prefix is also supported for patterns loaded from a file. Due to
whitespace removal, paths with whitespace at the beginning or end can only be
excluded using regular expressions.
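
The line handling described above can be sketched in a few lines. `load_patterns` is a hypothetical helper, not Borg's parser. Note that whitespace inside a pattern is kept; only the ends are stripped.

```python
def load_patterns(text):
    """Hypothetical helper: return the patterns found in an exclude file."""
    patterns = []
    for line in text.splitlines():
        line = line.strip()                    # remove whitespace on both ends
        if not line or line.startswith("#"):   # skip blank lines and comments
            continue
        patterns.append(line)
    return patterns

exclude_txt = """\
# Comment line
/home/*/junk

   *.tmp
some file with spaces.txt
"""
print(load_patterns(exclude_txt))
# ['/home/*/junk', '*.tmp', 'some file with spaces.txt']
```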

To test your exclusion patterns without performing an actual backup you can
run borg create --list --dry-run ....

Examples:

    # Exclude '/home/user/file.o' but not '/home/user/file.odt':
    $ borg create -e '*.o' backup /

    # Exclude '/home/user/junk' and '/home/user/subdir/junk' but
    # not '/home/user/importantjunk' or '/etc/junk':
    $ borg create -e '/home/*/junk' backup /

    # Exclude the contents of '/home/user/cache' but not the directory itself:
    $ borg create -e home/user/cache/ backup /

    # The file '/home/user/cache/important' is *not* backed up:
    $ borg create -e /home/user/cache/ backup / /home/user/cache/important

    # The contents of directories in '/home' are not backed up when their name
    # ends in '.tmp'
    $ borg create --exclude 're:^/home/[^/]+\.tmp/' backup /

    # Load exclusions from file
    $ cat >exclude.txt <<EOF
    # Comment line
    /home/*/junk
    *.tmp
    fm:aa:something/*
    re:^home/[^/]+\.tmp/
    sh:home/*/.thumbnails
    EOF
    $ borg create --exclude-from exclude.txt backup /

++ Experimental ++
    A more general and easier to use way to define filename matching patterns exists
    with the experimental --pattern and --patterns-from options. Using these, you
    may specify the backup roots (starting points) and patterns for inclusion/exclusion.
    A root path starts with the prefix R, followed by a path (a plain path, not a
    file pattern). An include rule starts with the prefix +, an exclude rule starts
    with the prefix -, an exclude-norecurse rule starts with !, all followed by a pattern.

    .. note::

        Via ``--pattern`` or ``--patterns-from`` you can define BOTH inclusion and exclusion
        of files using pattern prefixes ``+`` and ``-``. With ``--exclude`` and
        ``--exclude-from`` ONLY excludes are defined.

    Inclusion patterns are useful to include paths that are contained in an excluded
    path. The first matching pattern is used so if an include pattern matches before
    an exclude pattern, the file is backed up. If an exclude-norecurse pattern matches
    a directory, it won't recurse into it and won't discover any potential matches for
    include rules below that directory.
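
The first-match rule can be illustrated with a small sketch. `first_match` is a hypothetical helper; to keep it short it uses simple path-prefix comparison as a stand-in for real shell-style matching, and it does not model the no-recurse behaviour of `!`.

```python
def first_match(path, rules, default="+"):
    """Hypothetical sketch: return the action of the first matching rule."""
    for action, prefix in rules:
        if path == prefix or path.startswith(prefix + "/"):
            return action
    return default

rules = [("+", "pics/2018/good"), ("-", "pics/2018")]
print(first_match("pics/2018/good/a.jpg", rules))  # +
print(first_match("pics/2018/bad.jpg", rules))     # -
print(first_match("pics/2017/x.jpg", rules))       # + (no rule matched)

# With the order swapped, the exclude rule matches first and hides the include.
swapped = rules[::-1]
print(first_match("pics/2018/good/a.jpg", swapped))  # -
```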

    Note that the default pattern style for ``--pattern`` and ``--patterns-from`` is
    shell style (``sh:``), so those patterns behave similarly to rsync include/exclude
    patterns. The pattern style can be set via the ``P`` prefix.

    Patterns (``--pattern``) and excludes (``--exclude``) from the command line are
    considered first (in the order of appearance). Then patterns from ``--patterns-from``
    are added. Exclusion patterns from ``--exclude-from`` files are appended last.

    Examples::

        # backup pics, but not the ones from 2018, except the good ones:
        # note: using = is essential to avoid cmdline argument parsing issues.
        borg create --pattern=+pics/2018/good --pattern=-pics/2018 repo::arch pics

        # use a file with patterns:
        borg create --patterns-from patterns.lst repo::arch

    The patterns.lst file could look like this::

        # "sh:" pattern style is the default, so the following line is not needed:
        P sh
        R /
        # can be rebuilt
        - /home/*/.cache
        # they're downloads for a reason
        - /home/*/Downloads
        # susan is a nice person
        # include susan's home
        + /home/susan
        # don't back up the other home directories
        - /home/*
        # don't even look in /proc
        ! /proc
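
The structure of such a file can be made explicit with a short parsing sketch. `parse_patterns_file` is a hypothetical helper, not Borg's parser; skipping blank lines and `#` comments is an assumption carried over from the exclude-file rules, and the prefix is read as the first character so that both `- /proc` and `-/proc` are accepted.

```python
def parse_patterns_file(text):
    """Hypothetical sketch: return (roots, rules) from a patterns file."""
    roots, rules, style = [], [], "sh"   # "sh" is the documented default
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        prefix, rest = line[0], line[1:].strip()
        if prefix == "P":
            style = rest                         # style for following patterns
        elif prefix == "R":
            roots.append(rest)                   # plain path, not a pattern
        elif prefix in ("+", "-", "!"):
            rules.append((prefix, style, rest))  # include / exclude / no-recurse
        else:
            raise ValueError("unknown prefix in line: " + line)
    return roots, rules

patterns_lst = """\
P sh
R /
- /home/*/.cache
+ /home/susan
- /home/*
! /proc
"""
roots, rules = parse_patterns_file(patterns_lst)
print(roots)      # ['/']
print(rules[0])   # ('-', 'sh', '/home/*/.cache')
print(rules[-1])  # ('!', 'sh', '/proc')
```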
Member

@ThomasWaldmann ThomasWaldmann left a comment


would be good if you (in general) looked at the overall diff on github, so you'll notice (and fix) accidental changes.

Comment on lines 2230 to 2232
EOF
# Example with spaces, no need to escape as it is processed by borg
some file with spaces.txt
Member


you need to put that above the EOF line so it becomes part of that file.

@Gu1nness
Contributor Author

Sorry for all the accidental removals, I checked on my laptop but it did not appear to my eyes that these lines had disappeared…

@ThomasWaldmann ThomasWaldmann merged commit 2c8e523 into borgbackup:master Nov 29, 2020
@ThomasWaldmann
Member

Thanks!

Gu1nness added a commit to Gu1nness/borg that referenced this pull request Nov 30, 2020
Gu1nness added a commit to Gu1nness/borg that referenced this pull request Nov 30, 2020
@ThomasWaldmann ThomasWaldmann mentioned this pull request Feb 28, 2021
ThomasWaldmann added a commit that referenced this pull request Jun 16, 2021
Complements the documentation for pattern files and exclude files (#5520)
@ghost ghost mentioned this pull request Aug 26, 2021


Development

Successfully merging this pull request may close these issues.

Example path for --exclude-from

2 participants