-
Notifications
You must be signed in to change notification settings - Fork 27.3k
Disallow writing, but not fetching commits with file names containing backslashes #682
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
/submit |
|
Submitted as [email protected] |
| int pos; | ||
| int ok_to_add = option & ADD_CACHE_OK_TO_ADD; | ||
| int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE; | ||
| int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
"Johannes Schindelin via GitGitGadget" <[email protected]>
writes:
> From: Johannes Schindelin <[email protected]>
>
> During a clone of a repository that contained a file with a backslash in
> its name in the past, as of v2.24.1(2), Git for Windows prints errors
> like this:
>
> error: filename in tree entry contains backslash: '\'
>
> While the clone still succeeds, a similar error prevents the equivalent
> `git fetch` operation, which is inconsistent.
Yes, inconsistent is bad and it is puzzling. I would have expected,
if this gate on the transport layer is desirable, such a check would
be implemented as a part of transfer.fsckObjects and covered equally
by fetch and clone codepaths. What went wrong to allow "clone" to
go through while stopping "fetch"? Can you describe the root cause
of the difference in the log message?
> Arguably, this is the wrong layer for that error, anyway: As long as the
> user never checks out the files whose names contain backslashes, there
> should not be any problem in the first place.
I do agree that rejecting these tree objects that has a slash in its
path component is probably wrong. A GIT_WINDOWS_NATIVE box should
be able to host a bare repository on it, and users on machines that
are OK with paths that Windows may not like should be able to
interact with it, by pushing to it, fetching from it, and updating
the repository on that Windows box by going there and fetching from
elsewhere. Rejecting these names at the object validity level means
Git on Windows would be incompatible with Git elsewhere.
And It hink the same logic apply to those names like prn, con, nul,
etc. How are the users protected from them? We should prevent
these names from entering the index the same way, shouldn't we?
> So let's instead prevent such files to be added to the index.
... and loosen the check that (incorrectly) gets triggered from what
codepaths in "git fetch" (but not from "git clone")?
> This addresses https://github.com/git-for-windows/git/issues/2435
>
> Signed-off-by: Johannes Schindelin <[email protected]>
> ---
> read-cache.c | 5 +++++
> t/t7415-submodule-names.sh | 7 ++++---
> tree-walk.c | 6 ------
> 3 files changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index ad0b48c84d..737916ebd9 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> int new_only = option & ADD_CACHE_NEW_ONLY;
>
> +#ifdef GIT_WINDOWS_NATIVE
> + if (protect_ntfs && strchr(ce->name, '\\'))
As I wondered above, names that must not enter the index may not be
limited to those with backslashes in them. Perhaps you'd want a
separate helper function so that you can extend the logic more
easily, i.e.
if (protect_ntfs && invalid_name_on_windows(ce->name))
or something like that.
> diff --git a/tree-walk.c b/tree-walk.c
> index b3d162051f..d5a8e096a6 100644
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
> strbuf_addstr(err, _("empty filename in tree entry"));
> return -1;
> }
> -#ifdef GIT_WINDOWS_NATIVE
> - if (protect_ntfs && strchr(path, '\\')) {
> - strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> - return -1;
> - }
> -#endif
> len = strlen(path) + 1;
>
> /* Initialize the descriptor entry */
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Thu, 26 Dec 2019, Junio C Hamano wrote:
> "Johannes Schindelin via GitGitGadget" <[email protected]>
> writes:
>
> > From: Johannes Schindelin <[email protected]>
> >
> > During a clone of a repository that contained a file with a backslash in
> > its name in the past, as of v2.24.1(2), Git for Windows prints errors
> > like this:
> >
> > error: filename in tree entry contains backslash: '\'
> >
> > While the clone still succeeds, a similar error prevents the equivalent
> > `git fetch` operation, which is inconsistent.
>
> Yes, inconsistent is bad and it is puzzling. I would have expected,
> if this gate on the transport layer is desirable, such a check would
> be implemented as a part of transfer.fsckObjects and covered equally
> by fetch and clone codepaths. What went wrong to allow "clone" to
> go through while stopping "fetch"? Can you describe the root cause
> of the difference in the log message?
My bad, I should have root-caused this better.
Turns out that this inconsistency is only in Git for Windows v2.24.1(2)
but not in current `master` of Git, so I simply struck that part from the
commit message.
> > Arguably, this is the wrong layer for that error, anyway: As long as
> > the user never checks out the files whose names contain backslashes,
> > there should not be any problem in the first place.
>
> I do agree that rejecting these tree objects that has a slash in its
> path component is probably wrong. A GIT_WINDOWS_NATIVE box should
> be able to host a bare repository on it, and users on machines that
> are OK with paths that Windows may not like should be able to
> interact with it, by pushing to it, fetching from it, and updating
> the repository on that Windows box by going there and fetching from
> elsewhere. Rejecting these names at the object validity level means
> Git on Windows would be incompatible with Git elsewhere.
>
> And It hink the same logic apply to those names like prn, con, nul,
> etc. How are the users protected from them? We should prevent
> these names from entering the index the same way, shouldn't we?
>
> > So let's instead prevent such files to be added to the index.
>
> ... and loosen the check that (incorrectly) gets triggered from what
> codepaths in "git fetch" (but not from "git clone")?
I rephrased it to:
So let's loosen the requirements: we now leave tree entries with
backslashes in their file names alone, but we do require any entries
that are added to the Git index to contain no backslashes on Windows.
> > This addresses https://github.com/git-for-windows/git/issues/2435
> >
> > Signed-off-by: Johannes Schindelin <[email protected]>
> > ---
> > read-cache.c | 5 +++++
> > t/t7415-submodule-names.sh | 7 ++++---
> > tree-walk.c | 6 ------
> > 3 files changed, 9 insertions(+), 9 deletions(-)
> >
> > diff --git a/read-cache.c b/read-cache.c
> > index ad0b48c84d..737916ebd9 100644
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> > int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> > int new_only = option & ADD_CACHE_NEW_ONLY;
> >
> > +#ifdef GIT_WINDOWS_NATIVE
> > + if (protect_ntfs && strchr(ce->name, '\\'))
>
> As I wondered above, names that must not enter the index may not be
> limited to those with backslashes in them. Perhaps you'd want a
> separate helper function so that you can extend the logic more
> easily, i.e.
>
> if (protect_ntfs && invalid_name_on_windows(ce->name))
>
> or something like that.
I decided to perform those checks at yet another layer: when trying to
create new files. My idea was that I would want to catch even things like
`git config -f LPT1 ...` (`LPT1` is a reserved name on Windows, you cannot
create a file with that name).
Obviously, I cannot handle the backslash in the same code path, as e.g.
`git config -f C:\Users\me\.gitconfig ...` is totally valid.
Ciao,
Dscho
> > diff --git a/tree-walk.c b/tree-walk.c
> > index b3d162051f..d5a8e096a6 100644
> > --- a/tree-walk.c
> > +++ b/tree-walk.c
> > @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
> > strbuf_addstr(err, _("empty filename in tree entry"));
> > return -1;
> > }
> > -#ifdef GIT_WINDOWS_NATIVE
> > - if (protect_ntfs && strchr(path, '\\')) {
> > - strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> > - return -1;
> > - }
> > -#endif
> > len = strlen(path) + 1;
> >
> > /* Initialize the descriptor entry */
>
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Johannes Schindelin <[email protected]> writes:
> Turns out that this inconsistency is only in Git for Windows v2.24.1(2)
> but not in current `master` of Git, so I simply struck that part from the
> commit message.
> ...
> I rephrased it to:
>
> So let's loosen the requirements: we now leave tree entries with
> backslashes in their file names alone, but we do require any entries
> that are added to the Git index to contain no backslashes on Windows.
> ...
We are in -rc so there is no real rush, but I take these to mean
that I should just leave this loose end hanging untied, and wait
for an updated version to replace it sometime early next year.
Thanks and happy new year.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio,
On Mon, 30 Dec 2019, Junio C Hamano wrote:
> Johannes Schindelin <[email protected]> writes:
>
> > Turns out that this inconsistency is only in Git for Windows v2.24.1(2)
> > but not in current `master` of Git, so I simply struck that part from the
> > commit message.
> > ...
> > I rephrased it to:
> >
> > So let's loosen the requirements: we now leave tree entries with
> > backslashes in their file names alone, but we do require any entries
> > that are added to the Git index to contain no backslashes on Windows.
> > ...
>
> We are in -rc so there is no real rush,
My intention was actually to integrate this patch into Git for Windows
v2.25.0-rc1 already, that's why I sent it out for review even this late in
the -rc phase.
> but I take these to mean that I should just leave this loose end hanging
> untied, and wait for an updated version to replace it sometime early
> next year.
I sent out a new iteration of the patch *just* before the new year arrived
over here ;-)
> Thanks and happy new year.
The same to you!
Dscho
|
On the Git mailing list, Junio C Hamano wrote (reply to this): |
| int pos; | ||
| int ok_to_add = option & ADD_CACHE_OK_TO_ADD; | ||
| int ok_to_replace = option & ADD_CACHE_OK_TO_REPLACE; | ||
| int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jonathan Nieder wrote (reply to this):
Hi,
Johannes Schindelin wrote:
> During a clone of a repository that contained a file with a backslash in
> its name in the past, as of v2.24.1(2), Git for Windows prints errors
> like this:
>
> error: filename in tree entry contains backslash: '\'
>
> While the clone still succeeds, a similar error prevents the equivalent
> `git fetch` operation, which is inconsistent.
>
> Arguably, this is the wrong layer for that error, anyway: As long as the
> user never checks out the files whose names contain backslashes, there
> should not be any problem in the first place.
Hm. The choice of right layer depends on what repositories in the wild
contain. If there are none containing filenames with '\', then fsck et
al would be an appropriate layer for this. With hindsight, it was not
a good idea to support this kind of filename.
However, between the lines of this commit messages I sense that there
*are* repositories in the wild using these kinds of filenames.
Can you say more about that? What repositories are affected? Do they
contain such filenames at HEAD or only in their history? If someone
wants to check out a revision with such filenames, what should happen?
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> int new_only = option & ADD_CACHE_NEW_ONLY;
>
> +#ifdef GIT_WINDOWS_NATIVE
> + if (protect_ntfs && strchr(ce->name, '\\'))
> + return error(_("filename in tree entry contains backslash: '%s'"), ce->name);
> +#endif
> +
Why is this specific to the GIT_WINDOWS_NATIVE case? Wouldn't it affect
ntfs usage on other platforms as well?
[...]
> --- a/tree-walk.c
> +++ b/tree-walk.c
> @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
> strbuf_addstr(err, _("empty filename in tree entry"));
> return -1;
> }
> -#ifdef GIT_WINDOWS_NATIVE
> - if (protect_ntfs && strchr(path, '\\')) {
> - strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> - return -1;
> - }
> -#endif
Ah, it's inherited from there, so orthogonal to this patch.
To summarize: I think the commit message and docs could use some work
to describe what invariants we're trying to maintain and what
real-world usage motivates them.
Thanks and hope that helps,
Jonathan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Jonathan,
On Thu, 26 Dec 2019, Jonathan Nieder wrote:
> Johannes Schindelin wrote:
>
> > During a clone of a repository that contained a file with a backslash in
> > its name in the past, as of v2.24.1(2), Git for Windows prints errors
> > like this:
> >
> > error: filename in tree entry contains backslash: '\'
> >
> > While the clone still succeeds, a similar error prevents the equivalent
> > `git fetch` operation, which is inconsistent.
> >
> > Arguably, this is the wrong layer for that error, anyway: As long as the
> > user never checks out the files whose names contain backslashes, there
> > should not be any problem in the first place.
>
> Hm. The choice of right layer depends on what repositories in the wild
> contain. If there are none containing filenames with '\', then fsck et
> al would be an appropriate layer for this. With hindsight, it was not
> a good idea to support this kind of filename.
>
> However, between the lines of this commit messages I sense that there
> *are* repositories in the wild using these kinds of filenames.
>
> Can you say more about that? What repositories are affected? Do they
> contain such filenames at HEAD or only in their history? If someone
> wants to check out a revision with such filenames, what should happen?
Yes: https://github.com/zephyrproject-rtos/civetweb contains history where
at some stage, by mistake, a directory was called `\`. It has been fixed a
long time ago, but users obviously still want to be able to clone it ;-)
> > --- a/read-cache.c
> > +++ b/read-cache.c
> > @@ -1278,6 +1278,11 @@ static int add_index_entry_with_check(struct index_state *istate, struct cache_e
> > int skip_df_check = option & ADD_CACHE_SKIP_DFCHECK;
> > int new_only = option & ADD_CACHE_NEW_ONLY;
> >
> > +#ifdef GIT_WINDOWS_NATIVE
> > + if (protect_ntfs && strchr(ce->name, '\\'))
> > + return error(_("filename in tree entry contains backslash: '%s'"), ce->name);
> > +#endif
> > +
>
> Why is this specific to the GIT_WINDOWS_NATIVE case? Wouldn't it affect
> ntfs usage on other platforms as well?
>
> [...]
> > --- a/tree-walk.c
> > +++ b/tree-walk.c
> > @@ -43,12 +43,6 @@ static int decode_tree_entry(struct tree_desc *desc, const char *buf, unsigned l
> > strbuf_addstr(err, _("empty filename in tree entry"));
> > return -1;
> > }
> > -#ifdef GIT_WINDOWS_NATIVE
> > - if (protect_ntfs && strchr(path, '\\')) {
> > - strbuf_addf(err, _("filename in tree entry contains backslash: '%s'"), path);
> > - return -1;
> > - }
> > -#endif
>
> Ah, it's inherited from there, so orthogonal to this patch.
>
> To summarize: I think the commit message and docs could use some work
> to describe what invariants we're trying to maintain and what
> real-world usage motivates them.
I added this paragraph to the commit message:
Note: just as before, the check is guarded by `core.protectNTFS` (to
allow overriding the check by toggling that config setting), and it
is _only_ performed on Windows, as the backslash is not a directory
separator elsewhere, even when writing to NTFS-formatted volumes.
Does that clarify the issue enough?
Dscho
>
> Thanks and hope that helps,
> Jonathan
>
>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jonathan Nieder wrote (reply to this):
Hi,
Johannes Schindelin wrote:
> On Thu, 26 Dec 2019, Jonathan Nieder wrote:
>> Johannes Schindelin wrote:
>>> Arguably, this is the wrong layer for that error, anyway: As long as the
>>> user never checks out the files whose names contain backslashes, there
>>> should not be any problem in the first place.
[...]
>> between the lines of this commit messages I sense that there
>> *are* repositories in the wild using these kinds of filenames.
>>
>> Can you say more about that? What repositories are affected? Do they
>> contain such filenames at HEAD or only in their history? If someone
>> wants to check out a revision with such filenames, what should happen?
>
> Yes: https://github.com/zephyrproject-rtos/civetweb contains history where
> at some stage, by mistake, a directory was called `\`. It has been fixed a
> long time ago, but users obviously still want to be able to clone it ;-)
Thanks.
What should and does happen if someone tries to check out an offending
revision?
[...]
> I added this paragraph to the commit message:
>
> Note: just as before, the check is guarded by `core.protectNTFS` (to
> allow overriding the check by toggling that config setting), and it
> is _only_ performed on Windows, as the backslash is not a directory
> separator elsewhere, even when writing to NTFS-formatted volumes.
>
> Does that clarify the issue enough?
It helps: that tells me why the check is only performed on Windows.
Since this only affects Windows, please only take this review as data
about how someone read the patch. The patch doesn't make non Windows
specific code any *less* maintainable, and as a general presumption I
assume that the Git for Windows maintainer knows best about what is
needed for maintainability of Windows-specific code.
But the commit message and docs still don't describe what the desired
behavior is. For example, should I be able to check out a revision
with a backslash in it (e.g. via Git skipping the offending entry), or
is the intended behavior for it to error out and prevent checking out
such versions of code?
Is there anything we can or should do to prevent people checking in
new examples of paths with backslash in them (on all platforms)?
Thanks,
Jonathan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Jonathan Nieder <[email protected]> writes:
> Is there anything we can or should do to prevent people checking in
> new examples of paths with backslash in them (on all platforms)?
I obviously won't dictate what should happen on Windows, but I think
the overall principle for paths recorded in a tree object that can
be problematic on some of the platforms ought to be:
* fsck and transfer.fsckobjects should be taught to notice
offending characteristics (e.g. has a backslash in it, is one of
the "reserved names" on some platform like LPT1).
* if paths with the offending characteristics are *so* obviously
useless in real life and are possible only in a crafted path that
is only useful to attack users, the check in fsck should default
to "reject" to help the disease spread via hosting sites.
* otherwise, the check should be to "warn" but not "reject", so
that projects can keep using paths that may problematic on
platforms that do not matter to them.
I think LPT1 and friends fall into the "warning is fine" category,
and a path component that contains a backslash would fall into the
"this is an attack, just reject" category.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Junio C Hamano wrote (reply to this):
Junio C Hamano <[email protected]> writes:
> Jonathan Nieder <[email protected]> writes:
>
>> Is there anything we can or should do to prevent people checking in
>> new examples of paths with backslash in them (on all platforms)?
>
> I obviously won't dictate what should happen on Windows, but I think
> the overall principle for paths recorded in a tree object that can
> be problematic on some of the platforms ought to be:
>
> * fsck and transfer.fsckobjects should be taught to notice
> offending characteristics (e.g. has a backslash in it, is one of
> the "reserved names" on some platform like LPT1).
>
> * if paths with the offending characteristics are *so* obviously
> useless in real life and are possible only in a crafted path that
> is only useful to attack users, the check in fsck should default
> to "reject" to help the disease spread via hosting sites.
>
> * otherwise, the check should be to "warn" but not "reject", so
> that projects can keep using paths that may problematic on
> platforms that do not matter to them.
>
> I think LPT1 and friends fall into the "warning is fine" category,
> and a path component that contains a backslash would fall into the
> "this is an attack, just reject" category.
I guess I should have stepped back a bit.
In the message I am responding to, I focused solely on how tree
objects that originate elsewhere should be treated, but there are
two more data paths we need to worry about:
* A new path gets introduced to the system, via "update-index",
"add", etc., to the index and then "write-tree" to form tree
objects in the local repository.
* A path in the index, either created locally via "update-index",
"add", etc., or read from a tree object, gets written to the
local filesystem, resulting in an attempt to create a path that
the local filesystem cannot represent (or worse---behaves badly,
like "sending random garbage to the printer").
I think we should apply the same principle as the one I outlined for
the tree objects. The fsckobjects mechanism may not be reusable to
catch violations in add_index_entry_with_check() as-is, but we need
to aim for as much reuse of logic and code as possible so that our
checks at various layers all behave consistently.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Junio & Jonathan,
On Thu, 26 Dec 2019, Junio C Hamano wrote:
> Junio C Hamano <[email protected]> writes:
>
> > Jonathan Nieder <[email protected]> writes:
> >
> >> Is there anything we can or should do to prevent people checking in
> >> new examples of paths with backslash in them (on all platforms)?
> >
> > I obviously won't dictate what should happen on Windows, but I think
> > the overall principle for paths recorded in a tree object that can
> > be problematic on some of the platforms ought to be:
> >
> > * fsck and transfer.fsckobjects should be taught to notice
> > offending characteristics (e.g. has a backslash in it, is one of
> > the "reserved names" on some platform like LPT1).
Agree. This is on my radar, but so far not too-high priority, as the
`fsck` checks are not (yet?) standard practice in PR builds (and warnings
on the server side are prone to be ignored).
> > * if paths with the offending characteristics are *so* obviously
> > useless in real life and are possible only in a crafted path that
> > is only useful to attack users, the check in fsck should default
> > to "reject" to help the disease spread via hosting sites.
I don't think that reserved names such as `aux`, nor names containing
backslashes should be rejected _always_. While I cannot think of _any_
instance where I would want to have a backslash in a file name, I am sure
that just like `aux.c`, there _must_ be somebody out there who thought of
a file name that contains a backslash and makes at least some sort of
sense.
> > * otherwise, the check should be to "warn" but not "reject", so
> > that projects can keep using paths that may problematic on
> > platforms that do not matter to them.
Yes, it should be "warn".
> > I think LPT1 and friends fall into the "warning is fine" category,
> > and a path component that contains a backslash would fall into the
> > "this is an attack, just reject" category.
>
> I guess I should have stepped back a bit.
>
> In the message I am responding to, I focused solely on how tree
> objects that originate elsewhere should be treated, but there are
> two more data paths we need to worry about:
>
> * A new path gets introduced to the system, via "update-index",
> "add", etc., to the index and then "write-tree" to form tree
> objects in the local repository.
Right, that's what I had in mind when I wrote this patch. The path gets
added to the index, we detect a backslash, and on Windows (under
`core.protectNTFS`) fail with an error.
> * A path in the index, either created locally via "update-index",
> "add", etc., or read from a tree object, gets written to the
> local filesystem, resulting in an attempt to create a path that
> the local filesystem cannot represent (or worse---behaves badly,
> like "sending random garbage to the printer").
Happily, my patch seems to catch this code path, too: when reading from a
`tree` object into the index, we use `add_index_entry()` (called via
various code paths in the `unpack_trees()` machinery). That's exactly the
patched function.
Or maybe you know of a code path in the `unpack_trees()` machinery that
does _not_ go through `add_index_entry()`? I would be very interested to
learn about such code paths.
> I think we should apply the same principle as the one I outlined for
> the tree objects. The fsckobjects mechanism may not be reusable to
> catch violations in add_index_entry_with_check() as-is, but we need
> to aim for as much reuse of logic and code as possible so that our
> checks at various layers all behave consistently.
I am afraid that we won't be able to reuse code paths for checking the
backslash here, but for reserved names I am planning on refactoring the
code accordingly.
Thanks,
Dscho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Jonathan,
On Thu, 26 Dec 2019, Jonathan Nieder wrote:
> Johannes Schindelin wrote:
> > On Thu, 26 Dec 2019, Jonathan Nieder wrote:
> >> Johannes Schindelin wrote:
>
> >>> Arguably, this is the wrong layer for that error, anyway: As long as the
> >>> user never checks out the files whose names contain backslashes, there
> >>> should not be any problem in the first place.
> [...]
> >> between the lines of this commit messages I sense that there
> >> *are* repositories in the wild using these kinds of filenames.
> >>
> >> Can you say more about that? What repositories are affected? Do they
> >> contain such filenames at HEAD or only in their history? If someone
> >> wants to check out a revision with such filenames, what should happen?
> >
> > Yes: https://github.com/zephyrproject-rtos/civetweb contains history where
> > at some stage, by mistake, a directory was called `\`. It has been fixed a
> > long time ago, but users obviously still want to be able to clone it ;-)
>
> Thanks.
>
> What should and does happen if someone tries to check out an offending
> revision?
To answer this question, I added this paragraph to the commit message:
The idea is to prevent Git from even trying to write files with
backslashes in their file names: while these characters are valid in
file names on other platforms, on Windows it is interpreted as directory
separator (which would obviously lead to ambiguities, e.g. when there is
a file `a\b` and there is also a file `a/b`).
> [...]
> > I added this paragraph to the commit message:
> >
> > Note: just as before, the check is guarded by `core.protectNTFS` (to
> > allow overriding the check by toggling that config setting), and it
> > is _only_ performed on Windows, as the backslash is not a directory
> > separator elsewhere, even when writing to NTFS-formatted volumes.
> >
> > Does that clarify the issue enough?
>
> It helps: that tells me why the check is only performed on Windows.
>
> Since this only affects Windows, please only take this review as data
> about how someone read the patch. The patch doesn't make non Windows
> specific code any *less* maintainable, and as a general presumption I
> assume that the Git for Windows maintainer knows best about what is
> needed for maintainability of Windows-specific code.
>
> But the commit message and docs still don't describe what the desired
> behavior is. For example, should I be able to check out a revision
> with a backslash in it (e.g. via Git skipping the offending entry), or
> is the intended behavior for it to error out and prevent checking out
> such versions of code?
As mentioned above, the idea is to prevent Git from attempting to create
files with illegal file name characters.
> Is there anything we can or should do to prevent people checking in
> new examples of paths with backslash in them (on all platforms)?
As mentioned in my reply to Junio, I don't think that it would be wise to
even try to warn about backslashes in the file names. There are _so_ many
Git users out there, I am convinced that at least some of them have valid
use cases for file names with backslashes in them.
Ciao,
Dscho
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Jonathan Nieder wrote (reply to this):
Johannes Schindelin wrote:
> As mentioned above, the idea is to prevent Git from attempting to create
> files with illegal file name characters.
[...]
> On Thu, 26 Dec 2019, Jonathan Nieder wrote:
>> Is there anything we can or should do to prevent people checking in
>> new examples of paths with backslash in them (on all platforms)?
>
> As mentioned in my reply to Junio, I don't think that it would be wise to
> even try to warn about backslashes in the file names. There are _so_ many
> Git users out there, I am convinced that at least some of them have valid
> use cases for file names with backslashes in them.
Thanks for the quick answers. It helps.
I think allowing people to clone
https://github.com/zephyrproject-rtos/civetweb but not to check out
the problematic historic revision is a reasonable choice, especially
since it's still possible to get the data from there using
git checkout <rev> -- . ':!bad-paths'
[...]
> Or maybe you know of a code path in the `unpack_trees()` machinery that
> does _not_ go through `add_index_entry()`? I would be very interested to
> learn about such code paths.
Every once in a while someone (e.g., in #git) has wanted "git checkout
--skip-index <rev> -- <paths>", and that would be the natural way to
implement such a thing. But no one has done it yet. :)
We'll just have to keep a watchful eye as people make new
contributions.
Sincerely,
Jonathan
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the Git mailing list, Johannes Schindelin wrote (reply to this):
Hi Jonathan,
On Fri, 3 Jan 2020, Jonathan Nieder wrote:
> Johannes Schindelin wrote:
>
> > As mentioned above, the idea is to prevent Git from attempting to create
> > files with illegal file name characters.
> [...]
> > On Thu, 26 Dec 2019, Jonathan Nieder wrote:
>
> >> Is there anything we can or should do to prevent people checking in
> >> new examples of paths with backslash in them (on all platforms)?
> >
> > As mentioned in my reply to Junio, I don't think that it would be wise to
> > even try to warn about backslashes in the file names. There are _so_ many
> > Git users out there, I am convinced that at least some of them have valid
> > use cases for file names with backslashes in them.
>
> Thanks for the quick answers. It helps.
>
> I think allowing people to clone
> https://github.com/zephyrproject-rtos/civetweb but not to check out
> the problematic historic revision is a reasonable choice, especially
> since it's still possible to get the data from there using
>
> git checkout <rev> -- . ':!bad-paths'
>
> [...]
> > Or maybe you know of a code path in the `unpack_trees()` machinery that
> > does _not_ go through `add_index_entry()`? I would be very interested to
> > learn about such code paths.
>
> Every once in a while someone (e.g., in #git) has wanted "git checkout
> --skip-index <rev> -- <paths>", and that would be the natural way to
> implement such a thing. But no one has done it yet. :)
>
> We'll just have to keep a watchful eye as people make new
> contributions.
Right.
As to my patch, in the meantime I wondered whether I should simply extend
`verify_path()` to take a `flags` parameter where I can specify whether
this is intended for checkout, and handle the backslashes _there_. That
would cover e.g. uses of `make_transient_cache_entry()`, too.
The only reason why I would want to introduce `flags` is that `git apply`
has this `--unsafe-paths` option where `verify_paths()` is called on the
file names in the diff, and I was unsure whether there are legitimate use
cases with diffs containing DOS-style paths (I suspect that there are
legitimate scenarios where this is desirable).
What do you think of that?
Ciao,
Dscho
bdee6f5 to
d7117ff
Compare
|
On the Git mailing list, Johannes Schindelin wrote (reply to this): |
|
This patch series was integrated into pu via ef6ae42. |
|
This branch is now known as |
During a clone of a repository that contained a file with a backslash in its name in the past, as of v2.24.1(2), Git for Windows prints errors like this: error: filename in tree entry contains backslash: '\' The idea is to prevent Git from even trying to write files with backslashes in their file names: while these characters are valid in file names on other platforms, on Windows it is interpreted as directory separator (which would obviously lead to ambiguities, e.g. when there is a file `a\b` and there is also a file `a/b`). Arguably, this is the wrong layer for that error: As long as the user never checks out the files whose names contain backslashes, there should not be any problem in the first place. So let's loosen the requirements: we now leave tree entries with backslashes in their file names alone, but we do require any entries that are added to the Git index to contain no backslashes on Windows. Note: just as before, the check is guarded by `core.protectNTFS` (to allow overriding the check by toggling that config setting), and it is _only_ performed on Windows, as the backslash is not a directory separator elsewhere, even when writing to NTFS-formatted volumes. An alternative approach would be to try to prevent creating files with backslashes in their file names. However, that comes with its own set of problems. For example, `git config -f C:\ProgramData\Git\config ...` is a very valid way to specify a custom config location, and we obviously do _not_ want to prevent that. Therefore, the approach chosen in this patch would appear to be better. This addresses git-for-windows#2435 Signed-off-by: Johannes Schindelin <[email protected]>
d7117ff to
d6da831
Compare
|
This patch series was integrated into pu via 7f12fa8. |
|
This patch series was integrated into pu via 4b15b22. |
|
/submit |
|
Submitted as [email protected] |
|
This patch series was integrated into pu via 3d9b5af. |
|
This patch series was integrated into pu via 91db59b. |
|
This patch series was integrated into next via 3088a0c. |
|
This patch series was integrated into pu via 7a8deed. |
|
This patch series was integrated into pu via a578ef9. |
|
This patch series was integrated into next via a578ef9. |
|
This patch series was integrated into master via a578ef9. |
|
Closed via a578ef9. |
As of Git for Windows, v2.24.1(2). cloning a repository that contained a file with a backslash in its name some time in the past, the command will succeed but at the same time print errors like this:
A corresponding
git fetchwill also show those errors, but fail.The reason is that v2.24.1 is much more strict about backslashes in tree entries than earlier versions. The intention was actually to prevent checking out such files, though: if there was a mistake in a repository long ago that has been fixed long since, there is actually no reason why we should require the history to be rewritten.
This fixes git-for-windows#2435.
The idea of this patch is to stop complaining about tree entries, and focus instead on the index: whenever a file is added to the index, we do not want any backslashes in the file name on Windows.
As before, this check is only performed on Windows, and only under
core.protectNTFS. On other platforms, even ifcore.protectNTFSis turned on, the backslash is not a directory separator, therefore the Operating System's syscalls will (should?) refuse to create files on NTFS with backslashes in their file name.I would appreciate reviews with a particular eye on keeping users safe: I am not 100% certain that all relevant file writes go through the index (I think that they all go through the index, but I might have well missed a corner case).
Changes since v1:
GIT_WINDOWS_NATIVE-only, etc).