-
Notifications
You must be signed in to change notification settings - Fork 5.3k
Fix: FileSystemEntry.Attributes property is not correct on Unix #52235
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
Tagging subscribers to this area: @carlossanlop Issue DetailsFixes #37301 New attempt - this time without a perf regression. Changes in this PR:
cc @iSazonov @mklement0 @jozkee @adamsitnik @jeffhandley I collected perf results for the Command: Result 1
Result 2
Result 3
Result 4
|
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Outdated
Show resolved
Hide resolved
…tatus.GetAttributes from FileSYstemEntry.Attributes since the disk hit is necessary anyway.
|
Updated perf results for commit 8c23640 . Ran them 4 times again since IO results tend to be somewhat volatile: Result 1
Result 2
Result 3
Result 4
|
| public DateTimeOffset LastWriteTimeUtc => _status.GetLastWriteTime(FullPath, continueOnError: true); | ||
| public bool IsDirectory => _status.InitiallyDirectory; | ||
| public bool IsHidden => _directoryEntry.Name[0] == '.'; | ||
| public bool IsHidden => _directoryEntry.Name[0] == '.' || _status.IsHidden(FullPath); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid it's completely incorrect to map extended attributes to classic attributes.
I think we need new ExtendedAttributes (or something like) on both Unix and Windows.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| attributes |= FileAttributes.Directory; | ||
|
|
||
| // If the filename starts with a period or has UF_HIDDEN flag set, it's hidden. | ||
| if (fileName.Length > 0 && (fileName[0] == '.' || (_fileStatus.UserFlags & (uint)Interop.Sys.UserFlags.UF_HIDDEN) == (uint)Interop.Sys.UserFlags.UF_HIDDEN)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@iSazonov, you wrote in the other comment:
I'm afraid it's completely incorrect to map extended attributes to classic attributes.
I think we need new ExtendedAttributes (or something like) on both Unix and Windows.
I completely agree with you that we should have an ExtendedAttributes property. It would solve the problem of having them intermixed with normal attributes.
Extended attributes apply to BSD, Linux, OSX and even for Windows, in the form of Reparse Points.
But as you can see, UF_HIDDEN was already being read in the old code before my PR, and I just made sure it was consistently retrieved in FileSystemInfo, FileSystemEntry and FileStatus.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that it's not an either-or scenario: A platform-specific ExtendedAttributes property definitely makes sense, but it also makes sense to map the UF_HIDDEN flag to a platform-abstracted Hidden attribute (for instance, to the macOS Finder (file manager) both UF_HIDDEN files and files that observe the general Unix name-based convention (name starts with .) are hidden.
Has this ExtendedAttributes property been discussed at all?
Note that macOS has open-ended extended file-system attributes, similar to NTFS alternate streams (use xattr -s -l /tmp to see an example on macOS), but - if I understand correctly - NTFS alternate streams aren't supported (yet) in .NET, so that's probably a discussion for another day.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Has this ExtendedAttributes property been discussed at all?
Yes, we have a relatively new issue tracking this request: #49604
Let's take the conversation there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But as you can see, UF_HIDDEN was already being read in the old code before my PR, and I just made sure it was consistently retrieved in FileSystemInfo, FileSystemEntry and FileStatus.
It was added at 5.0 milestone without in-depth analysis.
I'd suggest to pass the design through in-depth analysis and API review.
My main concern is that it (chflags) is absolutely another API with specific behaviors and specific error codes.
I have not found a standard for these attributes. The most comprehensive list is https://www.freebsd.org/cgi/man.cgi?query=chflags&sektion=2&manpath=freebsd-release-ports
As we can see there are references to DOS, Windows and CIFS FILE_ATTRIBUTE_* attributes. I don't understand what that means and what the design intentions is.
If there are no standard behavior definitions, I would make different APIs to avoid intractable problems in the future.
There's an attribute mapping, but we can't map "." in UF_HIDDEN.
Perhaps the trend is to abandon classic attributes altogether and only use extended attributes. In that case, we definitely shouldn't mix them.
Since the docs speak about "mapping" then surely this API must perform "." mapping in UF_HIDDEN.
The final thought is that we need to be consistent with the other communities (Java, Python, ...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After taking a look at #49604 it occurred to me that what we are talking about here is not a case of extended attributes, as the latter are user-definable data with no special meaning to the file-system or the system.
By contrast, the so-called file flags on macOS / FreeBSD - of which UF_HIDDEN is one example - do have system-defined meaning, analogous to the file attributes on Windows.
These file flags do not map cleanly onto the born-on-Windows System.IO.FileAttributes enumeration, though mapping UF_HIDDEN onto Hidden is currently partially implemented:
Examining an enumerated FileSystemInfo instance does surface UF_HIDDEN as the Hidden attribute in .Attributes, but the enumeration itself does not consider such files hidden.
Curiously, the same inconsistency applies to the ls utility on macOS (my guess is that the following also applies to FreeBSD): ls lists UF_HIDDEN files as if they weren't hidden, but ls -lO does list all file flags (including UF_HIDDEN as hidden, the name also used by the chflags utility).
When POSIX-compatible shells on macOS perform globbing, they also do not consider UF_HIDDEN files hidden.
By contrast, Finder, the macOS file manager, equally considers .-initial names and UF_HIDDEN files hidden.
De facto the .Attributes property is already being used to map non-Windows concepts onto the equivalent Windows concepts:
- Symlinks on Unix-like platforms get the pseudo attribute
ReparsePoint - The user-specific lack of write permissions is mapped on to pseudo attribute
ReadOnly
To me, these mappings make sense, but I agree that they should be thoroughly reviewed, with a view toward extending this mapping; e.g., UF_IMMUTABLE and SF_IMMUTABLE are candidates for mapping onto ReadOnly too.
With respect to UF_HIDDEN, we need to decided whether (a) we want to resolve the current inconsistency and, if so, (b) in what direction.
Consistently mapping both .-initial names and UF_HIDDEN files onto Hidden is defensible, even though in the world of shells it would make PowerShell's behavior then differ from that of ls / POSIX-compatible shells.
then surely this API must perform "." mapping in UF_HIDDEN
I don't think so: it seems fine to me to map into one direction: to have the platform-abstracted Hidden attribute reflect whether the file is effectively hidden, irrespective of the mechanism used, and on macOS / FreeBSD there happen to be two mechanisms available, both of which are - unfortunately only partly - honored by the system.
If we were to introduce a dedicated API for macOS / FreeBSD file flags (again, this is not the same as extended attributes), then it should obviously reflect the true flag values only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After taking a look at #49604 it occurred to me that what we are talking about here is not a case of extended attributes, as the latter are user-definable data with no special meaning to the file-system or the system.
Yes, I suggest to use another terminology here - file system flags/ fsflags/ FileSystemInfo.Flags.
Since mapping is very questionable and may cause breaking conflicts in the future, I suggest avoiding it now and only presenting it in some way.
The application can be fixed faster than .Net. For example, PowerShell will be able to handle UF_HIDDEN in a special way and this can be changed in any intermediate version, unlike .Net which will remain stable from version to version.
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Outdated
Show resolved
Hide resolved
|
Benchmarks for commit 6ff6dcc : Result 1
Result 2
Result 3
Result 4
|
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Outdated
Show resolved
Hide resolved
…cessibility of methods/properties. Addressing suggestions.
|
Benchmark results for commit 2423af0 : Result 1
Result 2
Result 3
Result 4
|
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEnumerator.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEnumerator.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
… struct initialization. Simplify the order in which Initialize checks for unknown and symlink and add a comment.
|
Benchmark results for commit 7de7c4a : Result 1
Result 2
Result 3
Result 4
|
|
Ping @jozkee @adamsitnik - I have addressed the latest feedback, including some I received from David in our last call. |
iSazonov
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (with fundamental design questions as discussed).
adamsitnik
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good to me. I have left one comment regarding the calls to GetEUid and GetEGid that could give us some potential performance benefits, but this is not related to your changes and can be addressed in the future (possibly with some refactoring that could do a better job of encapsulation of the logic responsible for each flag).
thank you @carlossanlop !
| readBit = Interop.Sys.Permissions.S_IRUSR; | ||
| writeBit = Interop.Sys.Permissions.S_IWUSR; | ||
| } | ||
| else if (_fileCache.Gid == Interop.Sys.GetEGid()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not related to your changes, but important from perf perspective: is there any scenario in which this property (HasReadOnlyFlag) is going to be called multiple times? I am asking because we have at least one sys-call here (GetEUid), possibly even two (GetEGid). When this is used for files enumeration, we might end up asking the OS for the current user information (that can't change during enumeration?) many times. This could be done just once and reused for given enumerator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good question. Let's walk through 3 enumeration cases I can think of:
-
Enumeration without any special filtering - When the files/folders are initialized (via
FileSystemEntry.Initialize) we only perform a soft check toHasReadOnlyFlag: we only access it if thefileCachewas already pre-initialized. The cache will rarely be initialized at this point (only when we detect symlinks or unknowns). -
Enumeration using
AttributesToSkipflags - This is where we callShouldSkipfor each one of the 4 attributes, and at this point, we make sure thefileCacheis initialized, meaning we will hit the disk, and also make these calls. Since each file/folder is analyzed once here,GetEUidandGetEGidare only called once. -
The user enumerates entries in a folder, specifying a condition via the
ShouldInclude,ShouldRecurseor theTransformEntrypredicates that will access eitherAttributesToSkipor theAttributesproperty - We will do disk hits for each file/folder that will enterHasReadOnlyFlag.
So as you can see, there isn't a huge impact on enumeration with default values (1), only when we apply some special filterings (2, 3). If the user specifies more than one predicate (3) and also checks for AttributesToSkip separately, then maybe the read-only flag will be accessed multiple times, if there's a filter requiring to check its value.
I don't feel this change needs to be included in this PR, but I opened an issue to do it later: #52456
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we cache these calls, when should they be invalidated?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Refresh should set them to null, and if the user calls HasReadOnlyFlag again, then they get populated once more for the first time (in other words: on demand).
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEntry.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/Enumeration/FileSystemEnumerator.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
…conditions: symlink and unknown, so use else if.
…ed and could cause unnecessary disk hit.
…ialize if in AttributesToSkip we are going to retrieve it anyway with a disk hit. Only in the case of DT_UNKNOWN, we will pre-retrieve the file cache, but we only need it when ShouldSkip is checked, and in that case, we won't have a disk hit.
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
src/libraries/System.IO.FileSystem/src/System/IO/FileStatus.Unix.cs
Outdated
Show resolved
Hide resolved
|
@jozkee Benchmark results for commit 7f6cee4 : Details
|
| // Avoid disk hit first | ||
| if (IsNameHidden(fileName)) | ||
| { | ||
| // Others permissions | ||
| readBit = Interop.Sys.Permissions.S_IROTH; | ||
| writeBit = Interop.Sys.Permissions.S_IWOTH; | ||
| return true; | ||
| } | ||
| #endif | ||
| EnsureCachesInitialized(path, continueOnError); | ||
| return HasHiddenFlag; | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIRC you mentioned that UF_HIDDEN is only effective in OSX, if so we should consider splitting this implementation and optimize Unix by avoiding the native call.
jozkee
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks.
|
@carlossanlop Great work! Thanks! |
Fixes #37301
New attempt - this time without a perf regression. Changes in this PR:
statandlstatresults insideFileStatus.FileSystemEntryandFileStatus.HiddenandReadOnlyflags are now being collected when expected, and made sure they are collected lazily, to avoid the perf regression on enumeration when not required.cc @iSazonov @mklement0 @jozkee @adamsitnik @jeffhandley
I collected perf results for the
System.IO.Tests.Perf_Directorybenchmarks four times, before and after my changes, because IO benchmarks tend to yield not-so-stable results. I wanted to make sure the results stayed under a close margin of error:Command:
Result 1
Result 2
Result 3
Result 4