Skip to content

Conversation

@dscho
Copy link
Member

@dscho dscho commented Dec 25, 2019

As of Git for Windows, v2.24.1(2). cloning a repository that contained a file with a backslash in its name some time in the past, the command will succeed but at the same time print errors like this:

	error: filename in tree entry contains backslash: '\'

A corresponding git fetch will also show those errors, but fail.

The reason is that v2.24.1 is much more strict about backslashes in tree entries than earlier versions. The intention was actually to prevent checking out such files, though: if there was a mistake in a repository long ago that has been fixed long since, there is actually no reason why we should require the history to be rewritten.

This fixes git-for-windows#2435.

The idea of this patch is to stop complaining about tree entries, and focus instead on the index: whenever a file is added to the index, we do not want any backslashes in the file name on Windows.

As before, this check is only performed on Windows, and only under core.protectNTFS. On other platforms, even if core.protectNTFS is turned on, the backslash is not a directory separator, therefore the Operating System's syscalls will (should?) refuse to create files on NTFS with backslashes in their file name.

I would appreciate reviews with a particular eye on keeping users safe: I am not 100% certain that all relevant file writes go through the index (I think that they all go through the index, but I might have well missed a corner case).

Changes since v1:

  • Clarified the commit message (what is the goal, explain that the requirement is now loosened, explain why the code is still GIT_WINDOWS_NATIVE-only, etc).

@dscho
Copy link
Member Author

dscho commented Dec 26, 2019

/submit

@gitgitgadget-git
Copy link

Submitted as [email protected]

int pos;
int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <[email protected]>
writes:

> From: Johannes Schindelin <[email protected]>
>
> During a clone of a repository that contained a file with a backslash in
> its name in the past, as of v2.24.1(2), Git for Windows prints errors
> like this:
>
> 	error: filename in tree entry contains backslash: '\'
>
> While the clone still succeeds, a similar error prevents the equivalent
> `git fetch` operation, which is inconsistent.

Yes, inconsistent is bad and it is puzzling.  I would have expected,
if this gate on the transport layer is desirable, such a check would
be implemented as a part of transfer.fsckObjects and covered equally
by fetch and clone codepaths.  What went wrong to allow "clone" to
go through while stopping "fetch"?  Can you describe the root cause
of the difference in the log message?

> Arguably, this is the wrong layer for that error, anyway: As long as the
> user never checks out the files whose names contain backslashes, there
> should not be any problem in the first place.

I do agree that rejecting these tree objects that has a slash in its
path component is probably wrong.  A GIT_WINDOWS_NATIVE box should
be able to host a bare repository on it, and users on machines that
are OK with paths that Windows may not like should be able to
interact with it, by pushing to it, fetching from it, and updating
the repository on that Windows box by going there and fetching from
elsewhere.  Rejecting these names at the object validity level means
Git on Windows would be incompatible with Git elsewhere.

And It hink the same logic apply to those names like prn, con, nul,
etc.  How are the users protected from them?  We should prevent
these names from entering the index the same way, shouldn't we?

> So let's instead prevent such files to be added to the index.

... and loosen the check that (incorrectly) gets triggered from what
codepaths in "git fetch" (but not from "git clone")?

> This addresses https://github.com/git-for-windows/git/issues/2435
>
> Signed-off-by: Johannes Schindelin <[email protected]>
> ---
>  read-cache.c               | 5 +++++
>  t/t7415-submodule-names.sh | 7 ++++---
>  tree-walk.c                | 6 ------
>  3 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index ad0b48c84d..737916ebd9 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>  	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
>  	int new_only = option & ADD_CACHE_NEW_ONLY;
>  
> +#ifdef GIT_WINDOWS_NATIVE
> +	if (protect_ntfs && strchr(ce->name, '\\'))

As I wondered above, names that must not enter the index may not be
limited to those with backslashes in them.  Perhaps you'd want a
separate helper function so that you can extend the logic more
easily, i.e.

	if (protect_ntfs && invalid_name_on_windows(ce->name))

or something like that.

> diff --git a/tree-walk.c b/tree-walk.c
> index b3d162051f..d5a8e096a6 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
>  		strbuf_addstr(err, _("empty filename in tree entry"));
>  		return -1;
>  	}
> -#ifdef GIT_WINDOWS_NATIVE
> -	if (protect_ntfs && strchr(path, '\\')) {
> -		strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> -		return -1;
> -	}
> -#endif
>  	len = strlen(path) + 1;
>  
>  	/* Initialize the descriptor entry */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Thu, 26 Dec 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <[email protected]>
> writes:
>
> > From: Johannes Schindelin <[email protected]>
> >
> > During a clone of a repository that contained a file with a backslash in
> > its name in the past, as of v2.24.1(2), Git for Windows prints errors
> > like this:
> >
> > 	error: filename in tree entry contains backslash: '\'
> >
> > While the clone still succeeds, a similar error prevents the equivalent
> > `git fetch` operation, which is inconsistent.
>
> Yes, inconsistent is bad and it is puzzling.  I would have expected,
> if this gate on the transport layer is desirable, such a check would
> be implemented as a part of transfer.fsckObjects and covered equally
> by fetch and clone codepaths.  What went wrong to allow "clone" to
> go through while stopping "fetch"?  Can you describe the root cause
> of the difference in the log message?

My bad, I should have root-caused this better.

Turns out that this inconsistency is only in Git for Windows v2.24.1(2)
but not in current `master` of Git, so I simply struck that part from the
commit message.

> > Arguably, this is the wrong layer for that error, anyway: As long as
> > the user never checks out the files whose names contain backslashes,
> > there should not be any problem in the first place.
>
> I do agree that rejecting these tree objects that has a slash in its
> path component is probably wrong.  A GIT_WINDOWS_NATIVE box should
> be able to host a bare repository on it, and users on machines that
> are OK with paths that Windows may not like should be able to
> interact with it, by pushing to it, fetching from it, and updating
> the repository on that Windows box by going there and fetching from
> elsewhere.  Rejecting these names at the object validity level means
> Git on Windows would be incompatible with Git elsewhere.
>
> And It hink the same logic apply to those names like prn, con, nul,
> etc.  How are the users protected from them?  We should prevent
> these names from entering the index the same way, shouldn't we?
>
> > So let's instead prevent such files to be added to the index.
>
> ... and loosen the check that (incorrectly) gets triggered from what
> codepaths in "git fetch" (but not from "git clone")?

I rephrased it to:

    So let's loosen the requirements: we now leave tree entries with
    backslashes in their file names alone, but we do require any entries
    that are added to the Git index to contain no backslashes on Windows.

> > This addresses https://github.com/git-for-windows/git/issues/2435
> >
> > Signed-off-by: Johannes Schindelin <[email protected]>
> > ---
> >  read-cache.c               | 5 +++++
> >  t/t7415-submodule-names.sh | 7 ++++---
> >  tree-walk.c                | 6 ------
> >  3 files changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/read-cache.c b/read-cache.c
> > index ad0b48c84d..737916ebd9 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> >  	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> >  	int new_only = option & ADD_CACHE_NEW_ONLY;
> >
> > +#ifdef GIT_WINDOWS_NATIVE
> > +	if (protect_ntfs && strchr(ce->name, '\\'))
>
> As I wondered above, names that must not enter the index may not be
> limited to those with backslashes in them.  Perhaps you'd want a
> separate helper function so that you can extend the logic more
> easily, i.e.
>
> 	if (protect_ntfs && invalid_name_on_windows(ce->name))
>
> or something like that.

I decided to perform those checks at yet another layer: when trying to
create new files. My idea was that I would want to catch even things like
`git config -f LPT1 ...` (`LPT1` is a reserved name on Windows, you cannot
create a file with that name).

Obviously, I cannot handle the backslash in the same code path, as e.g.
`git config -f C:\Users\me\.gitconfig ...` is totally valid.

Ciao,
Dscho

> > diff --git a/tree-walk.c b/tree-walk.c
> > index b3d162051f..d5a8e096a6 100644
> > --- a/tree-walk.c
> > +++ b/tree-walk.c
> > @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
> >  		strbuf_addstr(err, _("empty filename in tree entry"));
> >  		return -1;
> >  	}
> > -#ifdef GIT_WINDOWS_NATIVE
> > -	if (protect_ntfs && strchr(path, '\\')) {
> > -		strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> > -		return -1;
> > -	}
> > -#endif
> >  	len = strlen(path) + 1;
> >
> >  	/* Initialize the descriptor entry */
>
>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Johannes Schindelin <[email protected]> writes:

> Turns out that this inconsistency is only in Git for Windows v2.24.1(2)
> but not in current `master` of Git, so I simply struck that part from the
> commit message.
> ...
> I rephrased it to:
>
>     So let's loosen the requirements: we now leave tree entries with
>     backslashes in their file names alone, but we do require any entries
>     that are added to the Git index to contain no backslashes on Windows.
> ...

We are in -rc so there is no real rush, but I take these to mean
that I should just leave this loose end hanging untied, and wait
for an updated version to replace it sometime early next year.

Thanks and happy new year.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Mon, 30 Dec 2019, Junio C Hamano wrote:

> Johannes Schindelin <[email protected]> writes:
>
> > Turns out that this inconsistency is only in Git for Windows v2.24.1(2)
> > but not in current `master` of Git, so I simply struck that part from the
> > commit message.
> > ...
> > I rephrased it to:
> >
> >     So let's loosen the requirements: we now leave tree entries with
> >     backslashes in their file names alone, but we do require any entries
> >     that are added to the Git index to contain no backslashes on Windows.
> > ...
>
> We are in -rc so there is no real rush,

My intention was actually to integrate this patch into Git for Windows
v2.25.0-rc1 already, that's why I sent it out for review even this late in
the -rc phase.

> but I take these to mean that I should just leave this loose end hanging
> untied, and wait for an updated version to replace it sometime early
> next year.

I sent out a new iteration of the patch *just* before the new year arrived
over here ;-)

> Thanks and happy new year.

The same to you!
Dscho

@gitgitgadget-git
Copy link

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Johannes Schindelin via GitGitGadget" <[email protected]>
writes:

> I would appreciate reviews with a particular eye on keeping users safe: I am
> not 100% certain that all relevant file writes go through the index (I think 
> that they all go through the index, but I might have well missed a corner
> case).

There are peripheral commands that do not use the index at all, such
as "archive"; piping "git archive" output to unarchiver that writes
into the filesystem would be a way.  But I do not think that
qualifies as an attack vector you are looking for.

int pos;
int ok_to_add = option & ADD_CACHE_OK_TO_ADD;
int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE;
int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Nieder wrote (reply to this):

Hi,

Johannes Schindelin wrote:

> During a clone of a repository that contained a file with a backslash in
> its name in the past, as of v2.24.1(2), Git for Windows prints errors
> like this:
>
> 	error: filename in tree entry contains backslash: '\'
>
> While the clone still succeeds, a similar error prevents the equivalent
> `git fetch` operation, which is inconsistent.
>
> Arguably, this is the wrong layer for that error, anyway: As long as the
> user never checks out the files whose names contain backslashes, there
> should not be any problem in the first place.

Hm.  The choice of right layer depends on what repositories in the wild
contain.  If there are none containing filenames with '\', then fsck et
al would be an appropriate layer for this.  With hindsight, it was not
a good idea to support this kind of filename.

However, between the lines of this commit messages I sense that there
*are* repositories in the wild using these kinds of filenames.

Can you say more about that?  What repositories are affected?  Do they
contain such filenames at HEAD or only in their history?  If someone
wants to check out a revision with such filenames, what should happen?

> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
>  	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
>  	int new_only = option & ADD_CACHE_NEW_ONLY;
>  
> +#ifdef GIT_WINDOWS_NATIVE
> +	if (protect_ntfs && strchr(ce->name, '\\'))
> +		return error(_("filename in tree entry contains backslash: '%s'"), ce->name);
> +#endif
> +

Why is this specific to the GIT_WINDOWS_NATIVE case?  Wouldn't it affect
ntfs usage on other platforms as well?

[...]
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
>  		strbuf_addstr(err, _("empty filename in tree entry"));
>  		return -1;
>  	}
> -#ifdef GIT_WINDOWS_NATIVE
> -	if (protect_ntfs && strchr(path, '\\')) {
> -		strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> -		return -1;
> -	}
> -#endif

Ah, it's inherited from there, so orthogonal to this patch.

To summarize: I think the commit message and docs could use some work
to describe what invariants we're trying to maintain and what
real-world usage motivates them.

Thanks and hope that helps,
Jonathan

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Jonathan,

On Thu, 26 Dec 2019, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
>
> > During a clone of a repository that contained a file with a backslash in
> > its name in the past, as of v2.24.1(2), Git for Windows prints errors
> > like this:
> >
> > 	error: filename in tree entry contains backslash: '\'
> >
> > While the clone still succeeds, a similar error prevents the equivalent
> > `git fetch` operation, which is inconsistent.
> >
> > Arguably, this is the wrong layer for that error, anyway: As long as the
> > user never checks out the files whose names contain backslashes, there
> > should not be any problem in the first place.
>
> Hm.  The choice of right layer depends on what repositories in the wild
> contain.  If there are none containing filenames with '\', then fsck et
> al would be an appropriate layer for this.  With hindsight, it was not
> a good idea to support this kind of filename.
>
> However, between the lines of this commit messages I sense that there
> *are* repositories in the wild using these kinds of filenames.
>
> Can you say more about that?  What repositories are affected?  Do they
> contain such filenames at HEAD or only in their history?  If someone
> wants to check out a revision with such filenames, what should happen?

Yes: https://github.com/zephyrproject-rtos/civetweb contains history where
at some stage, by mistake, a directory was called `\`. It has been fixed a
long time ago, but users obviously still want to be able to clone it ;-)

> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> >  	int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> >  	int new_only = option & ADD_CACHE_NEW_ONLY;
> >
> > +#ifdef GIT_WINDOWS_NATIVE
> > +	if (protect_ntfs && strchr(ce->name, '\\'))
> > +		return error(_("filename in tree entry contains backslash: '%s'"), ce->name);
> > +#endif
> > +
>
> Why is this specific to the GIT_WINDOWS_NATIVE case?  Wouldn't it affect
> ntfs usage on other platforms as well?
>
> [...]
> > --- a/tree-walk.c
> > +++ b/tree-walk.c
> > @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
> >  		strbuf_addstr(err, _("empty filename in tree entry"));
> >  		return -1;
> >  	}
> > -#ifdef GIT_WINDOWS_NATIVE
> > -	if (protect_ntfs && strchr(path, '\\')) {
> > -		strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> > -		return -1;
> > -	}
> > -#endif
>
> Ah, it's inherited from there, so orthogonal to this patch.
>
> To summarize: I think the commit message and docs could use some work
> to describe what invariants we're trying to maintain and what
> real-world usage motivates them.

I added this paragraph to the commit message:

    Note: just as before, the check is guarded by `core.protectNTFS` (to
    allow overriding the check by toggling that config setting), and it
    is _only_ performed on Windows, as the backslash is not a directory
    separator elsewhere, even when writing to NTFS-formatted volumes.

Does that clarify the issue enough?
Dscho

>
> Thanks and hope that helps,
> Jonathan
>
>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Nieder wrote (reply to this):

Hi,

Johannes Schindelin wrote:
> On Thu, 26 Dec 2019, Jonathan Nieder wrote:
>> Johannes Schindelin wrote:

>>> Arguably, this is the wrong layer for that error, anyway: As long as the
>>> user never checks out the files whose names contain backslashes, there
>>> should not be any problem in the first place.
[...]
>>          between the lines of this commit messages I sense that there
>> *are* repositories in the wild using these kinds of filenames.
>>
>> Can you say more about that?  What repositories are affected?  Do they
>> contain such filenames at HEAD or only in their history?  If someone
>> wants to check out a revision with such filenames, what should happen?
>
> Yes: https://github.com/zephyrproject-rtos/civetweb contains history where
> at some stage, by mistake, a directory was called `\`. It has been fixed a
> long time ago, but users obviously still want to be able to clone it ;-)

Thanks.

What should and does happen if someone tries to check out an offending
revision?

[...]
> I added this paragraph to the commit message:
>
>     Note: just as before, the check is guarded by `core.protectNTFS` (to
>     allow overriding the check by toggling that config setting), and it
>     is _only_ performed on Windows, as the backslash is not a directory
>     separator elsewhere, even when writing to NTFS-formatted volumes.
>
> Does that clarify the issue enough?

It helps: that tells me why the check is only performed on Windows.

Since this only affects Windows, please only take this review as data
about how someone read the patch.  The patch doesn't make non Windows
specific code any *less* maintainable, and as a general presumption I
assume that the Git for Windows maintainer knows best about what is
needed for maintainability of Windows-specific code.

But the commit message and docs still don't describe what the desired
behavior is.  For example, should I be able to check out a revision
with a backslash in it (e.g. via Git skipping the offending entry), or
is the intended behavior for it to error out and prevent checking out
such versions of code?

Is there anything we can or should do to prevent people checking in
new examples of paths with backslash in them (on all platforms)?

Thanks,
Jonathan

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Jonathan Nieder <[email protected]> writes:

> Is there anything we can or should do to prevent people checking in
> new examples of paths with backslash in them (on all platforms)?

I obviously won't dictate what should happen on Windows, but I think
the overall principle for paths recorded in a tree object that can
be problematic on some of the platforms ought to be:

 * fsck and transfer.fsckobjects should be taught to notice
   offending characteristics (e.g. has a backslash in it, is one of
   the "reserved names" on some platform like LPT1).

 * if paths with the offending characteristics are *so* obviously
   useless in real life and are possible only in a crafted path that
   is only useful to attack users, the check in fsck should default
   to "reject" to help the disease spread via hosting sites.

 * otherwise, the check should be to "warn" but not "reject", so
   that projects can keep using paths that may problematic on
   platforms that do not matter to them.

I think LPT1 and friends fall into the "warning is fine" category,
and a path component that contains a backslash would fall into the
"this is an attack, just reject" category.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Junio C Hamano <[email protected]> writes:

> Jonathan Nieder <[email protected]> writes:
>
>> Is there anything we can or should do to prevent people checking in
>> new examples of paths with backslash in them (on all platforms)?
>
> I obviously won't dictate what should happen on Windows, but I think
> the overall principle for paths recorded in a tree object that can
> be problematic on some of the platforms ought to be:
>
>  * fsck and transfer.fsckobjects should be taught to notice
>    offending characteristics (e.g. has a backslash in it, is one of
>    the "reserved names" on some platform like LPT1).
>
>  * if paths with the offending characteristics are *so* obviously
>    useless in real life and are possible only in a crafted path that
>    is only useful to attack users, the check in fsck should default
>    to "reject" to help the disease spread via hosting sites.
>
>  * otherwise, the check should be to "warn" but not "reject", so
>    that projects can keep using paths that may problematic on
>    platforms that do not matter to them.
>
> I think LPT1 and friends fall into the "warning is fine" category,
> and a path component that contains a backslash would fall into the
> "this is an attack, just reject" category.

I guess I should have stepped back a bit.

In the message I am responding to, I focused solely on how tree
objects that originate elsewhere should be treated, but there are
two more data paths we need to worry about:

 * A new path gets introduced to the system, via "update-index",
   "add", etc., to the index and then "write-tree" to form tree
   objects in the local repository.

 * A path in the index, either created locally via "update-index",
   "add", etc., or read from a tree object, gets written to the
   local filesystem, resulting in an attempt to create a path that
   the local filesystem cannot represent (or worse---behaves badly,
   like "sending random garbage to the printer").

I think we should apply the same principle as the one I outlined for
the tree objects.  The fsckobjects mechanism may not be reusable to
catch violations in add_index_entry_with_check() as-is, but we need
to aim for as much reuse of logic and code as possible so that our
checks at various layers all behave consistently.

Thanks.



Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio & Jonathan,

On Thu, 26 Dec 2019, Junio C Hamano wrote:

> Junio C Hamano <[email protected]> writes:
>
> > Jonathan Nieder <[email protected]> writes:
> >
> >> Is there anything we can or should do to prevent people checking in
> >> new examples of paths with backslash in them (on all platforms)?
> >
> > I obviously won't dictate what should happen on Windows, but I think
> > the overall principle for paths recorded in a tree object that can
> > be problematic on some of the platforms ought to be:
> >
> >  * fsck and transfer.fsckobjects should be taught to notice
> >    offending characteristics (e.g. has a backslash in it, is one of
> >    the "reserved names" on some platform like LPT1).

Agree. This is on my radar, but so far not too-high priority, as the
`fsck` checks are not (yet?) standard practice in PR builds (and warnings
on the server side are prone to be ignored).

> >  * if paths with the offending characteristics are *so* obviously
> >    useless in real life and are possible only in a crafted path that
> >    is only useful to attack users, the check in fsck should default
> >    to "reject" to help the disease spread via hosting sites.

I don't think that reserved names such as `aux`, nor names containing
backslashes should be rejected _always_. While I cannot think of _any_
instance where I would want to have a backslash in a file name, I am sure
that just like `aux.c`, there _must_ be somebody out there who thought of
a file name that contains a backslash and makes at least some sort of
sense.

> >  * otherwise, the check should be to "warn" but not "reject", so
> >    that projects can keep using paths that may problematic on
> >    platforms that do not matter to them.

Yes, it should be "warn".

> > I think LPT1 and friends fall into the "warning is fine" category,
> > and a path component that contains a backslash would fall into the
> > "this is an attack, just reject" category.
>
> I guess I should have stepped back a bit.
>
> In the message I am responding to, I focused solely on how tree
> objects that originate elsewhere should be treated, but there are
> two more data paths we need to worry about:
>
>  * A new path gets introduced to the system, via "update-index",
>    "add", etc., to the index and then "write-tree" to form tree
>    objects in the local repository.

Right, that's what I had in mind when I wrote this patch. The path gets
added to the index, we detect a backslash, and on Windows (under
`core.protectNTFS`) fail with an error.

>  * A path in the index, either created locally via "update-index",
>    "add", etc., or read from a tree object, gets written to the
>    local filesystem, resulting in an attempt to create a path that
>    the local filesystem cannot represent (or worse---behaves badly,
>    like "sending random garbage to the printer").

Happily, my patch seems to catch this code path, too: when reading from a
`tree` object into the index, we use `add_index_entry()` (called via
various code paths in the `unpack_trees()` machinery). That's exactly the
patched function.

Or maybe you know of a code path in the `unpack_trees()` machinery that
does _not_ go through `add_index_entry()`? I would be very interested to
learn about such code paths.

> I think we should apply the same principle as the one I outlined for
> the tree objects.  The fsckobjects mechanism may not be reusable to
> catch violations in add_index_entry_with_check() as-is, but we need
> to aim for as much reuse of logic and code as possible so that our
> checks at various layers all behave consistently.

I am afraid that we won't be able to reuse code paths for checking the
backslash here, but for reserved names I am planning on refactoring the
code accordingly.

Thanks,
Dscho

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Jonathan,

On Thu, 26 Dec 2019, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
> > On Thu, 26 Dec 2019, Jonathan Nieder wrote:
> >> Johannes Schindelin wrote:
>
> >>> Arguably, this is the wrong layer for that error, anyway: As long as the
> >>> user never checks out the files whose names contain backslashes, there
> >>> should not be any problem in the first place.
> [...]
> >>          between the lines of this commit messages I sense that there
> >> *are* repositories in the wild using these kinds of filenames.
> >>
> >> Can you say more about that?  What repositories are affected?  Do they
> >> contain such filenames at HEAD or only in their history?  If someone
> >> wants to check out a revision with such filenames, what should happen?
> >
> > Yes: https://github.com/zephyrproject-rtos/civetweb contains history where
> > at some stage, by mistake, a directory was called `\`. It has been fixed a
> > long time ago, but users obviously still want to be able to clone it ;-)
>
> Thanks.
>
> What should and does happen if someone tries to check out an offending
> revision?

To answer this question, I added this paragraph to the commit message:

    The idea is to prevent Git from even trying to write files with
    backslashes in their file names: while these characters are valid in
    file names on other platforms, on Windows it is interpreted as directory
    separator (which would obviously lead to ambiguities, e.g. when there is
    a file `a\b` and there is also a file `a/b`).

> [...]
> > I added this paragraph to the commit message:
> >
> >     Note: just as before, the check is guarded by `core.protectNTFS` (to
> >     allow overriding the check by toggling that config setting), and it
> >     is _only_ performed on Windows, as the backslash is not a directory
> >     separator elsewhere, even when writing to NTFS-formatted volumes.
> >
> > Does that clarify the issue enough?
>
> It helps: that tells me why the check is only performed on Windows.
>
> Since this only affects Windows, please only take this review as data
> about how someone read the patch.  The patch doesn't make non Windows
> specific code any *less* maintainable, and as a general presumption I
> assume that the Git for Windows maintainer knows best about what is
> needed for maintainability of Windows-specific code.
>
> But the commit message and docs still don't describe what the desired
> behavior is.  For example, should I be able to check out a revision
> with a backslash in it (e.g. via Git skipping the offending entry), or
> is the intended behavior for it to error out and prevent checking out
> such versions of code?

As mentioned above, the idea is to prevent Git from attempting to create
files with illegal file name characters.

> Is there anything we can or should do to prevent people checking in
> new examples of paths with backslash in them (on all platforms)?

As mentioned in my reply to Junio, I don't think that it would be wise to
even try to warn about backslashes in the file names. There are _so_ many
Git users out there, I am convinced that at least some of them have valid
use cases for file names with backslashes in them.

Ciao,
Dscho

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Jonathan Nieder wrote (reply to this):

Johannes Schindelin wrote:

> As mentioned above, the idea is to prevent Git from attempting to create
> files with illegal file name characters.
[...]
> On Thu, 26 Dec 2019, Jonathan Nieder wrote:

>> Is there anything we can or should do to prevent people checking in
>> new examples of paths with backslash in them (on all platforms)?
>
> As mentioned in my reply to Junio, I don't think that it would be wise to
> even try to warn about backslashes in the file names. There are _so_ many
> Git users out there, I am convinced that at least some of them have valid
> use cases for file names with backslashes in them.

Thanks for the quick answers.  It helps.

I think allowing people to clone
https://github.com/zephyrproject-rtos/civetweb but not to check out
the problematic historic revision is a reasonable choice, especially
since it's still possible to get the data from there using

	git checkout <rev> -- . ':!bad-paths'

[...]
> Or maybe you know of a code path in the `unpack_trees()` machinery that
> does _not_ go through `add_index_entry()`? I would be very interested to
> learn about such code paths.

Every once in a while someone (e.g., in #git) has wanted "git checkout
--skip-index <rev> -- <paths>", and that would be the natural way to
implement such a thing.  But no one has done it yet. :)

We'll just have to keep a watchful eye as people make new
contributions.

Sincerely,
Jonathan

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Jonathan,

On Fri, 3 Jan 2020, Jonathan Nieder wrote:

> Johannes Schindelin wrote:
>
> > As mentioned above, the idea is to prevent Git from attempting to create
> > files with illegal file name characters.
> [...]
> > On Thu, 26 Dec 2019, Jonathan Nieder wrote:
>
> >> Is there anything we can or should do to prevent people checking in
> >> new examples of paths with backslash in them (on all platforms)?
> >
> > As mentioned in my reply to Junio, I don't think that it would be wise to
> > even try to warn about backslashes in the file names. There are _so_ many
> > Git users out there, I am convinced that at least some of them have valid
> > use cases for file names with backslashes in them.
>
> Thanks for the quick answers.  It helps.
>
> I think allowing people to clone
> https://github.com/zephyrproject-rtos/civetweb but not to check out
> the problematic historic revision is a reasonable choice, especially
> since it's still possible to get the data from there using
>
> 	git checkout <rev> -- . ':!bad-paths'
>
> [...]
> > Or maybe you know of a code path in the `unpack_trees()` machinery that
> > does _not_ go through `add_index_entry()`? I would be very interested to
> > learn about such code paths.
>
> Every once in a while someone (e.g., in #git) has wanted "git checkout
> --skip-index <rev> -- <paths>", and that would be the natural way to
> implement such a thing.  But no one has done it yet. :)
>
> We'll just have to keep a watchful eye as people make new
> contributions.

Right.

As to my patch, in the meantime I wondered whether I should simply extend
`verify_path()` to take a `flags` parameter where I can specify whether
this is intended for checkout, and handle the backslashes _there_. That
would cover e.g. uses of `make_transient_cache_entry()`, too.

The only reason why I would want to introduce `flags` is that `git apply`
has this `--unsafe-paths` option where `verify_paths()` is called on the
file names in the diff, and I was unsure whether there are legitimate use
cases with diffs containing DOS-style paths (I suspect that there are
legitimate scenarios where this is desirable).

What do you think of that?

Ciao,
Dscho

@dscho dscho force-pushed the mingw-only-error-on-backslash-in-index branch 2 times, most recently from bdee6f5 to d7117ff Compare December 26, 2019 21:14
@gitgitgadget-git
Copy link

On the Git mailing list, Johannes Schindelin wrote (reply to this):

Hi Junio,

On Thu, 26 Dec 2019, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" <[email protected]>
> writes:
>
> > I would appreciate reviews with a particular eye on keeping users safe: I am
> > not 100% certain that all relevant file writes go through the index (I think
> > that they all go through the index, but I might have well missed a corner
> > case).
>
> There are peripheral commands that do not use the index at all, such
> as "archive"; piping "git archive" output to unarchiver that writes
> into the filesystem would be a way.  But I do not think that
> qualifies as an attack vector you are looking for.

Yes, I thought about `git archive`, too. The thing is, I could imagine
legitimate use cases where a user wants to generate e.g. a `.zip` on
Windows, with the intention to unpack it on Linux. In such a case, we
would not want to prevent said `.zip` from being generated.

I also briefly considered the scripts that write some trees into a
temporary location, but those scripts typically use `git read-tree -u -m`
with a temporary index, i.e. they _do_ go through the index, still.

Thank you for your thoughts about this issue,
Dscho

@gitgitgadget-git
Copy link

This patch series was integrated into pu via ef6ae42.

@gitgitgadget-git gitgitgadget-git bot added the pu label Dec 26, 2019
@gitgitgadget-git
Copy link

This branch is now known as js/mingw-loosen-overstrict-tree-entry-checks.

During a clone of a repository that contained a file with a backslash in
its name in the past, as of v2.24.1(2), Git for Windows prints errors
like this:

	error: filename in tree entry contains backslash: '\'

The idea is to prevent Git from even trying to write files with
backslashes in their file names: while these characters are valid in
file names on other platforms, on Windows it is interpreted as directory
separator (which would obviously lead to ambiguities, e.g. when there is
a file `a\b` and there is also a file `a/b`).

Arguably, this is the wrong layer for that error: As long as the user
never checks out the files whose names contain backslashes, there should
not be any problem in the first place.

So let's loosen the requirements: we now leave tree entries with
backslashes in their file names alone, but we do require any entries
that are added to the Git index to contain no backslashes on Windows.

Note: just as before, the check is guarded by `core.protectNTFS` (to
allow overriding the check by toggling that config setting), and it
is _only_ performed on Windows, as the backslash is not a directory
separator elsewhere, even when writing to NTFS-formatted volumes.

An alternative approach would be to try to prevent creating files with
backslashes in their file names. However, that comes with its own set of
problems. For example, `git config -f C:\ProgramData\Git\config ...` is
a very valid way to specify a custom config location, and we obviously
do _not_ want to prevent that. Therefore, the approach chosen in this
patch would appear to be better.

This addresses git-for-windows#2435

Signed-off-by: Johannes Schindelin <[email protected]>
@dscho dscho force-pushed the mingw-only-error-on-backslash-in-index branch from d7117ff to d6da831 Compare December 27, 2019 09:30
@gitgitgadget-git
Copy link

This patch series was integrated into pu via 7f12fa8.

@gitgitgadget-git
Copy link

This patch series was integrated into pu via 4b15b22.

@dscho
Copy link
Member Author

dscho commented Dec 31, 2019

/submit

@gitgitgadget-git
Copy link

Submitted as [email protected]

@gitgitgadget-git
Copy link

This patch series was integrated into pu via 3d9b5af.

@gitgitgadget-git
Copy link

This patch series was integrated into pu via 91db59b.

@gitgitgadget-git
Copy link

This patch series was integrated into next via 3088a0c.

@gitgitgadget-git
Copy link

This patch series was integrated into pu via 7a8deed.

@gitgitgadget-git
Copy link

This patch series was integrated into pu via a578ef9.

@gitgitgadget-git
Copy link

This patch series was integrated into next via a578ef9.

@gitgitgadget-git
Copy link

This patch series was integrated into master via a578ef9.

@gitgitgadget-git
Copy link

Closed via a578ef9.

@gitgitgadget-git gitgitgadget-git bot closed this Jan 6, 2020
@dscho dscho deleted the mingw-only-error-on-backslash-in-index branch January 7, 2020 11:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Git 2.24 breaks existing repositories: filename in tree entry contains backslash

1 participant