Skip to content

Add ?follow option to Unix.link#1061

Merged
damiendoligez merged 1 commit intoocaml:trunkfrom
madroach:link_symlink
Jun 25, 2018
Merged

Add ?follow option to Unix.link#1061
damiendoligez merged 1 commit intoocaml:trunkfrom
madroach:link_symlink

Conversation

@madroach
Copy link
Contributor

@madroach madroach commented Feb 24, 2017

This allows hardlinking symlinks.

Also fix a bug in win32 symlink stub code, which interpreted an
optional parameter (bool option) as simple bool. [EDIT: there was no bug, in fact]

DWORD flags =
Is_block(to_dir) && Bool_val(Field(to_dir, 0))
? SYMBOLIC_LINK_FLAG_DIRECTORY
: 0;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll can't do a full review today, but I can say immediately that this is wrong - see otherlibs/win32unix/unix.ml

@xavierleroy
Copy link
Contributor

xavierleroy commented Feb 24, 2017

linkall is POSIX 2008 and I'm not sure it's available everywhere. You need to guard its use with

#if _XOPEN_SOURCE >= 700 || _POSIX_C_SOURCE >= 200809L

and provide an #else implementation that uses link and fails in the cases that link cannot handle.

@xavierleroy
Copy link
Contributor

Also, some motivation on why we really need to hardlink symlinks would be welcome.

@dsheets
Copy link
Member

dsheets commented Feb 24, 2017

If I'm reading the binding correctly, ?follow defaults to false which is the opposite of the previous default:

link() will resolve and follow symbolic links contained within both path1 and path2.

Additionally, at least on OS X,

On OS X, not assigning AT_SYMLINK_FOLLOW to flag may result in some filesystems returning an error.

@dra27
Copy link
Member

dra27 commented Feb 24, 2017

Why not instead add Unix.linkat (with relevant configure-based detection for the underlying C function)?

@shindere
Copy link
Contributor

shindere commented Feb 24, 2017 via email

@xavierleroy
Copy link
Contributor

Concerning the "bug fix" I confirm there is no bug and the fix would break the win32 code.

@madroach : if you want to keep this PR up for discussion, please remove the 'bug fix", correct the default behavior of Unix.link, and add a fallback implementation if linkat() is not available. Otherwise we'll close this PR.

@madroach madroach force-pushed the link_symlink branch 2 times, most recently from e458097 to 3e99f64 Compare February 25, 2017 11:54
@madroach
Copy link
Contributor Author

I updated the pull request in the following ways:

  • removed my supposed bug fix in the win32 symlink code. I missed out on the option already being handled in unix.ml. There is no bug.
  • I also added checks for __POSIX_VERSION and _XOPEN_VERSION version to use simple link as fallback.
  • The default behavior if no ~follow parameter is supplied is now following symlinks.

On the behavior of link the linux manpage on link(2) says this:

POSIX.1-2001 says that link() should dereference oldpath if it is a symbolic link. However, since kernel 2.0, Linux does not do so: if oldpath is a symbolic link, then newpath is created as a (hard) link to the same symbolic link file (i.e., newpath becomes a symbolic link to the same file that oldpath refers to). Some other implementations behave in the same manner as Linux. POSIX.1-2008 changes the specification of link(), making it implementation-dependent whether or not oldpath is dereferenced if it is a symbolic link. For precise control over the treatment of symbolic links when creating a link, see linkat(2).

Also http://pubs.opengroup.org/onlinepubs/9699919799/functions/link.html may be of interest.

I have no strong preference, but the not dereferencing approach seems saner to me because one could still dereference the symlink with readlink(2) and then create a hardlink to the target file.

As to why this change is needed:

Creating hardlinks of symlinks is error prone:

  • when following the symlink and the symlink points into another filesystem, the hardlink will fail with EXDEV
  • when not following the symlink and the symlink is relative and the new hardlink resides on a different directory than the symlink, the new (hardlinked) symlink will probably be broken and trying to access it will fail with ENOENT

Therefore the programmer often will need some control on what behaviour he prefers.
I need this feature for implementing a hardlinking backup solution like storeBackup or Apple Time Machine.
I could work around it by recreating new symlinks, so I don't depend on it. Still the cleaner solution would be hardlinking.

@xavierleroy
Copy link
Contributor

xavierleroy commented Feb 25, 2017

Thanks for the revised code, much appreciated. I have two more suggestions based on the discussion above and on the code:

  • I would prefer that Unix.link behaves exactly like before if the ~follow: optional argument is not given. Since the follow/nofollow behavior of link() seems to depend on the target platform, the only way to achieve this compatibility is by calling link() and not linkat() when ~follow is not given.
  • If ~follow: is given and linkat() is not available, the current code silently falls back to using link(), while I think it should fail.

Taken together, this would give a different code logic such as:

   int ret;
   caml_unix_check_path(path1, "link");
   caml_unix_check_path(path2, "link");
   p1 = caml_strdup(String_val(path1));
   p2 = caml_strdup(String_val(path2));
   caml_enter_blocking_section();
   if (follow == Val_int(0)) {
     ret = link(p1, p2);
   } else {
#if _XOPEN_VERSION >= 700 || _POSIX_VERSION >= 200809L
     int flags = Bool_val(Field(follow, 0)) ? AT_SYMLINK_FOLLOW : 0;
     ret = linkat(AT_FDCWD, p1, AT_FDCWD, p2, flags);
#else
     errno = ENOSYS /*or something more appropriate*/; ret = -1;
#endif
   }

@madroach
Copy link
Contributor Author

Thanks for your input. This is indeed the most sensible behaviour. I adapted the code as you suggest, but kept the parsing of the option out of the blocking section because I'm not sure whether access to the option is protected from interference by the garbage collector.

@dsheets
Copy link
Member

dsheets commented Feb 28, 2017

It would be great to have the documentation updated to describe the ?follow behavior including None ➡️ link() and Some (true | false) ➡️ linkat().

@madroach
Copy link
Contributor Author

madroach commented Mar 1, 2017

documentation added.

Copy link
Member

@dra27 dra27 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to bang on about it, but I'm still unconvinced why this is being done as an optional parameter, and not simply a separate binding as Unix.linkat, given that the new behaviour can only be triggered by specifying the new optional argument.

It also seems potentially unwise, however unlikely it may be for this particular syscall to gain flags, to bind something that is int flags in C effectively with bool in OCaml.

caml_unix_check_path(path2, "link");
p1 = caml_strdup(String_val(path1));
p2 = caml_strdup(String_val(path2));
ret =
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Memory is cheap - can we risk having another int flags; rather than confusingly co-opting one called ret? That said, see below...

: 0;
caml_enter_blocking_section();
ret = link(p1, p2);
if (follow == Val_int(0) /* None */)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, if (!Is_block(follow)) /* None */?

ret = link(p1, p2);
else /* Some bool */
# if _XOPEN_VERSION >= 700 || _POSIX_VERSION >= 200809L
ret = linkat(AT_FDCWD, p1, AT_FDCWD, p2, ret);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not, instead of ret, just do (Bool_val(Field(follow, 0)) ? AT_SYMLINK_FOLLOW : 0) here, and not need a variable (ret or otherwise)? You already know that follow is a block by this branch.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I safely dereference follow here in the nonblocking section?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a very good point (I don't think the access in the if is correct, therefore, either) - it would be better to have two lots of enter_blocking_section around both link and linkat (which again, begs my previous question of why this isn't just a separate binding).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The if (follow == Val_int(0)) is correct. It does not dereference follow.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed. Assuming this version is kept, it's therefore up to you whether to wrap the two syscalls with their own leave/enter_blocking_section or declare a flags variable earlier (my comment about co-opting ret still stands!!)

# if _XOPEN_VERSION >= 700 || _POSIX_VERSION >= 200809L
ret = linkat(AT_FDCWD, p1, AT_FDCWD, p2, ret);
# else
{ ret = -1; errno = ENOSYS }
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me, this looks so horribly like some kind of struct initialiser, I'd just move the braces outside the #if...

if (follow == Val_int(0) /* None */)
ret = link(p1, p2);
else /* Some bool */
# if _XOPEN_VERSION >= 700 || _POSIX_VERSION >= 200809L
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RHEL 5 is about to go out of support, but is it definitely OK to be ignoring the glibc < 2.10 version of this check?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How would that look like?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking here, I think it just means adding || defined(_ATFILE_SOURCE) /* For 2.4 <= glibc < 2.10 */

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok. I included the change in my tree. But I have to admit I don't understand feature_test_macros(7). Aren't those macros used to request, not check for features?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed - I'm not certain how this is supposed to work with glibc, as you do not appear to be permitted to check the version of the glibc headers you are compiling. My (quite possibly incorrect) understanding is that if someone is using glibc < 2.10 then they should ensure that -D_ATFILE_SOURCE gets into CFLAGS at configure-time (it's possible that this is something configure itself should determine is required, that's not clear at all).

However, it does seem to me to be OK to test it here.

@madroach
Copy link
Contributor Author

madroach commented Mar 2, 2017

What about this interface?

type link_flag =
|L_FOLLOW

val fd_cwd:dir_handle

val linkat ~flags:link_flag array ->
  ?src_dir:dir_handle -> ~src:string ->
  ?dst_dir:dir_handle -> ~dst:string -> unit

But on the other hand the ?follow:bool -> string -> string -> unit interface won't break anything and we can still add linkadd in case more flags are needed.

@dra27
Copy link
Member

dra27 commented Mar 4, 2017

I agree that the ~follow parameter feels an easy way of adding this, but it also (to me) encodes far too much important (user) consideration into the two constructors of bool option (even though you've documented it). Upon further reflection, I also dislike the Windows treatment - at present, hardlinking to a symbolic link on NTFS is a contradiction in terms, so this parameter should not be ignored on Windows, it really ought to be an exception if follow is false.

I would push for the separate interface (if a user - such as yourself - needs the version with optional argument, it is easy to define) but with the following changes:

  • No need to expose fd_cwd - it's well-encoded as the default for ?src_dir & ?dst_dir and can be compatibly added in future if some extra special values appear (as you would still want the default to be AT_FDCWD)
  • There are of course other *at functions (e.g. fstatat(2)) which might want binding in future. Given the way these flags work, this could well be a case to use a polymorphic variant. Alternatively, the type should be better named (e.g. at_flags with an AT_) prefix. The polymorphic variant would be superior as it would allow each at function to specify which flags are permitted and which aren't, obviously.
  • I assume that the labels would be for UnixLabels only (I think that's the "policy" even for new functions)

@xavierleroy
Copy link
Contributor

I think this PR is getting out of hand and turning into a design discussion for a much bigger extension than initially offered. In particular, if linkat is exposed in its full generality, the other *at functions should also, and that's not a decision to be taken lightly because the *at functions have no Win32 counterpart as far as I know.

@madroach
Copy link
Contributor Author

madroach commented Mar 4, 2017

I updated the pull request to include most suggestions by dra27.

According to CreateHardLink the only behavior on win32 is following hardlinks.
Therefore the win32 port will now fail with NOSYS if ~follow:false is requested. The unix port will fail in the same way if linkat is unavailable and any specific ~follow behavior is requested.

@ghost
Copy link

ghost commented Mar 6, 2017

Since we are talking about the *at functions, last year I wrote a bindings or them. This will be released in the main opam repository this week

@madroach madroach force-pushed the link_symlink branch 2 times, most recently from 4cdc229 to e66d41f Compare May 1, 2017 11:40
@damiendoligez damiendoligez added this to the 4.07-or-later milestone Sep 29, 2017
@damiendoligez damiendoligez removed this from the consider-for-4.07 milestone Jun 4, 2018
@damiendoligez
Copy link
Member

@dra27 @xavierleroy can we get consensus on whether to include this?

Is_block(follow) && Bool_val(Field(follow, 0)) /* Some true */
? AT_SYMLINK_FOLLOW
: 0;
caml_unix_check_path(path1, "link");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This int flags computation should be moved to the else part below, and protected by the #if linkat_is_available, since AT_SYMLINK_FOLLOW may not be defined.

@xavierleroy
Copy link
Contributor

What can I say?

  1. The API is fine.
  2. The implementation is almost good, see my suggestion above.
  3. I'm still not sure how much we care about this issue. Hard links are very rarely used nowadays.

@damiendoligez
Copy link
Member

@madroach if you fix the int flags issue and rebase, we'll merge this PR.

@madroach madroach force-pushed the link_symlink branch 2 times, most recently from 41701de to 67e4f2f Compare June 5, 2018 18:52
@madroach
Copy link
Contributor Author

done and rebased.

Copy link
Member

@damiendoligez damiendoligez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fix Changes and we're good.

Changes Outdated
now uses nanosleep(1) for enabling better preemption.
(b) Thread.delay is now an alias for Unix.sleepf.

- Add ?follow parameter to Unix.link. This allows hardlinking symlinks.
Copy link
Member

@damiendoligez damiendoligez Jun 18, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to add the GPR number, move the entry to the right place (not under 4.07) add the real names of everyone involved, and cut your lines to 80 characters, as documented in https://github.com/ocaml/ocaml/blob/trunk/CONTRIBUTING.md#changelog

Changes Outdated
(Leo White, Xavier Clerc, report by Alex, review by Gabriel Scherer
and Pierre Chambart)

### Tools:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like you added this by mistake.

This allows hardlinking symlinks.
@madroach
Copy link
Contributor Author

fixed Changes

@damiendoligez damiendoligez merged commit cba4ca5 into ocaml:trunk Jun 25, 2018
@gasche
Copy link
Member

gasche commented Jun 25, 2018

The CI reports failing macos-10.3 and freebsd builds with the same error:

gcc -c -O2 -fno-strict-aliasing -fwrapv -Wall -Werror   -D_FILE_OFFSET_BITS=64 -D_REENTRANT -DCAML_NAME_SPACE  -I../../byterun -o link.o link.c
link.c:45:15: error: use of undeclared identifier 'errno'
    ret = -1; errno = ENOSYS;
              ^
link.c:45:23: error: use of undeclared identifier 'ENOSYS'
    ret = -1; errno = ENOSYS;
                      ^
2 errors generated.

@madroach
Copy link
Contributor Author

Seems like I used the feature test macros in the wrong way.

@nojb
Copy link
Contributor

nojb commented Jun 26, 2018

It may be a missing #include <errno.h> in otherlibs/unix/link.c.

@madroach
Copy link
Contributor Author

madroach commented Jun 26, 2018

A fix is proposed in #1862

stedolan pushed a commit to stedolan/ocaml that referenced this pull request Mar 21, 2023
EmileTrotignon pushed a commit to EmileTrotignon/ocaml that referenced this pull request Jan 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants