Skip to content

[pkg] Resolve directory symlinks in fetched targets#13792

Open
ElectreAAS wants to merge 2 commits intoocaml:mainfrom
ElectreAAS:push-xmyqtonwkonu
Open

[pkg] Resolve directory symlinks in fetched targets#13792
ElectreAAS wants to merge 2 commits intoocaml:mainfrom
ElectreAAS:push-xmyqtonwkonu

Conversation

@ElectreAAS
Copy link
Copy Markdown
Collaborator

@ElectreAAS ElectreAAS commented Mar 13, 2026

Housekeeping

This PR fixes the tests in #13393
#9873 will not be fixed, but still a significant step towards fixing #13678.

What this PR does

After fetching package sources, add a pass resolving directory symlinks. As they're not problematic at this stage, file symlinks are left as is. Broken symlinks are removed silently to preserve existing behaviour.

Note: both portable_hardlink and portable_symlink work backwards from what I initially understood, see #13791

Done with the help of @Alizter, so thanks :)

Comment thread test/blackbox-tests/test-cases/pkg/source-with-directory-symlink.t
@ElectreAAS ElectreAAS requested review from art-w and rgrinberg March 16, 2026 17:25
@shonfeder shonfeder requested a review from Alizter March 19, 2026 13:08
@Alizter Alizter mentioned this pull request Mar 20, 2026
14 tasks
Copy link
Copy Markdown
Collaborator

@Alizter Alizter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here is an example that I think causes a loop in the current algorithm:

  /source/
    dir_a/
      link_to_b -> ../dir_b
    dir_b/
      link_to_a -> ../dir_a

Following this along we get an infinitely descending directory.

I think you will need to keep a set of resolved targets and check against it if you want to remember where you have been as to not repeat the resolution.

Another issue that I see is the multiple passes that are currently done, but we can improve this later once we have something that works correctly.

We also might need to add some validation that we don't escape the source directory, I don't think its a good idea to allow symlinks to / for example.

@ElectreAAS ElectreAAS marked this pull request as draft March 26, 2026 22:43
@ElectreAAS
Copy link
Copy Markdown
Collaborator Author

I just pushed work in progress to adress the latest comment about cycles.
A basic cycle detection mechanism is in place, and works in simple scenarios (basic-cycle.t), but fails in more complex ones (link-to-parent.t) and crashes the sandboxing mechanisms of cram tests. Fun stuff

@Alizter Alizter self-requested a review March 27, 2026 07:56
Comment thread src/dune_pkg/fetch.ml Outdated
Comment thread src/dune_pkg/fetch.ml Outdated
This fails correctly
$ build_pkg bar 2>&1 | sanitize_pkg_digest bar.0.0.1 | tail -3
Error: Unable to resolve symlink
_build/_private/default/.pkg/bar.0.0.1-DIGEST_HASH/source/dir_b/link_to_a/link_to_b,
Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI isn't happy because it finds dir_a/link_to_b/link_to_a before the one I wrote in the test

@ElectreAAS ElectreAAS marked this pull request as ready for review April 9, 2026 16:20
@ElectreAAS
Copy link
Copy Markdown
Collaborator Author

Aside from the CI failures due to non-deterministic errors, the main code is ready for review.
Special attention should be given to the different Path.something.t, I had to hit them with a hammer until they looked nice, but I may very well have misunderstood the different invariants at play

@rgrinberg
Copy link
Copy Markdown
Member

I haven't looked closely, but I think your changes might have made that test non-deterministic.

Copy link
Copy Markdown
Collaborator

@Alizter Alizter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some problems I've observed:

  1. There is a difference between local and tarball sources when it comes to directory symlinks. The tarball sources correctly resolve the contents wheras local sources silently discard them. I think rather than silently discarding them we should either raise a user error if this is something we wish not to support or support it. I would probably consider not supporting it.

  2. Broken symlinks appear to be silently ignored. For example something stupid like a symlink to itself. We should probably error to the user in this case saying that we don't accept such symlinks.

explicitly deleted immediately after the archive is
extracted. This logic is implemented in the "source-fetch"
action spec in [Fetch_rules]. *)
(* CR-Ambre update above comment *)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

todo?

Comment thread src/dune_pkg/fetch.ml
(* [raw_resolved] is a relative build path but it might contain indirections,
something like _build/foo/../bar
or _build/../outside *)
let canon_resolved = Path.of_string raw_resolved in
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This only canonicalises relative paths and not absolute ones. I think Path.External.canonicalize_abs was the other way you did it.

Comment thread otherlibs/stdune/src/path.ml Outdated
| In_build_dir t, In_build_dir of_ ->
Option.map (Build.descendant t ~of_) ~f:(fun d ->
in_source_tree (Source0.of_local (Build.local d)))
| External t, External of_ -> Option.map (External.descendant t ~of_) ~f:external_
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think one of the reasons this was avoided was because it causes complications with the other path types. For example shouldn't everything be considered of the root? I don't know if it makes sense to add such a function yet. It would probably be better to have a descendent function specifically for External instead rather than allowing them to mix.

Comment thread src/dune_rules/pkg_rules.ml Outdated
| None -> Memo.return Path.Local.Set.empty
| Some root -> loop root Path.Local.Set.empty Path.Local.root
| Some root ->
let rec collect_sources ~acc ~dir ~already_seen =
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are two separate concepts we are confusing here. We have the directory we would like to read from and the directory we would like to output to.

Normally these two are the same, but with directory symlinks the situation can differ.

In this case I would introduce a new parameter called ~read_dir and let full_path use that instead.

Then when you collect files, rather than appending relative you append a tuple relateive, read_relative where the latter is

let read_relative = Path.Local.relative read_dir name in

Then when you have a directory symlink, you can do:

-                           else files, local_resolved :: dirs, seen
+                           else files, (relative, local_resolved) :: dirs, seen

instead.

In the end, your parallel map will look like:

Memo.parallel_map dirs ~f:(fun (dir, read_dir) ->
 collect_sources
  ~acc:Path.Local.Set.empty
  ~dir
  ~read_dir
  ~already_seen:seen)

This should fix some issues we currently have.

@Alizter
Copy link
Copy Markdown
Collaborator

Alizter commented Apr 14, 2026

Here are some of the issues we currently have:

Test symlink chains: A -> B -> C where intermediate links are also symlinks.

  $ mkdir -p _src/real_dir
  $ echo "content" > _src/real_dir/file.txt
  $ ln -s real_dir _src/link_a
  $ ln -s link_a _src/link_b
  $ ln -s link_b _src/link_c

  $ make_lockdir

Case 1: local source with symlink chain

For local sources, directory symlinks are not copied at all (they are skipped
during the source copy). Only the real directory is present.

  $ make_lockpkg foo <<EOF
  > (version 0.0.1)
  > (source
  >  (fetch
  >   (url file://$PWD/_src)))
  > (build (run cat real_dir/file.txt))
  > EOF

  $ build_pkg foo
  content

Only the real directory is copied - symlink chains are lost entirely:
  $ ls _build/_private/default/.pkg/foo.*/source | sort
  link_a
  link_b
  link_c
  real_dir

Case 2: tarball source with symlink chain

For tarballs, symlinks are preserved during extraction and then resolved.

  $ tar czf _src.tar.gz _src

  $ make_lockpkg bar <<EOF
  > (version 0.0.1)
  > (source
  >  (fetch
  >   (url file://$PWD/_src.tar.gz)))
  > (build (run cat real_dir/file.txt))
  > EOF

  $ build_pkg bar
  content

  $ ls _build/_private/default/.pkg/bar.*/source | sort
  link_a
  link_b
  link_c
  real_dir

BUG: Different behavior between local source and tarball. With local sources,
the symlink chain is completely lost (link_a, link_b, link_c are missing).
With tarballs, all links are resolved to real directories. This inconsistency
means a package behaves differently depending on how it is fetched.

@ElectreAAS ElectreAAS force-pushed the push-xmyqtonwkonu branch 2 times, most recently from bc64415 to 71bc486 Compare April 20, 2026 09:50
@ElectreAAS
Copy link
Copy Markdown
Collaborator Author

ElectreAAS commented Apr 20, 2026

Update: we've looked further into this and it seems the desired behaviour in Pkg_rules.source_files is rather complicated, almost equating actually supporting symlinks in the engine.
Since the point of this PR was to allow dune pkg to build fetched packages that may contain symlinks, and that we'd added the resolving pass in source_files only for symmetry, we came to the conclusion that it was unnecessary, and have removed it.

I've pushed a new version containing that deletion, along with a few WIP comments, I'm looking into them

Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
… source)

Sort entries in fetch to guarantee determinism
Remove the symlink resolution happening in pkg_rules as it's too complicated.
The main change happening in fetch is still there.

Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
@ElectreAAS
Copy link
Copy Markdown
Collaborator Author

About the Path.{is_,}descendant functions:
I no longer use descendant so I reverted the addition, but I still do use is_descendant on external paths, and the addition is useful, without it we get the following:

test/expect-tests/dune_pkg/fetch_tests.ml - line 197
[...]
-|  [%expect
-|    {|
-|    Done downloading
-|    Finished successfully, no checksum verification
-|    ------
-|    files in target dir:
-|    file2.md
-|    plaintext.md |}]
+|  [%expect.unreachable]
+|[@@expect.uncaught_exn {|
+|  (Dune_util__Report_error.Already_reported)
+|  Trailing output
+|  ---------------
+|  Error: Unable to resolve symlink
+|  [...]/dune/_build/.sandbox/HASH/default/test/expect-tests/dune_pkg/tarball/file2.md:
+|  its target
+|  [...]/dune/_build/.sandbox/HASH/default/test/expect-tests/dune_pkg/tarball/../../../../../../../default/test/expect-tests/dune_pkg/tar-inputs/file2.md
+|  is outside the source directory
+|  |}]

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants