Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Add RelocPath type for userdefined relocatable paths #56053

Open
wants to merge 6 commits into
base: master
Choose a base branch
from

Conversation

fatteneder
Copy link
Member

RelocPath(path) works by splitting path as depot/subpath based on DEPOT_PATH during construction.

When using RelocPath one explicitly opts into relocation, so it throws when path cannot be split.

A relocated path can be queried with String(r::RelocPath), which also throws on failure.


Fixes #54430
Alternative to #55146

@fatteneder fatteneder added the compiler:precompilation Precompilation of modules label Oct 8, 2024
base/loading.jl Outdated Show resolved Hide resolved
@topolarity
Copy link
Member

What's the idiomatic usage pattern for this look like?

@fatteneder
Copy link
Member Author

The example where @__RELOCDIR__ fails #55146 (comment) would become now:

module MyPkg

const data = RelocPath(joinpath(@__DIR__, "myfile.data"))

function read_data()
    path = String(data)
    # load from path ...
end

end

@gbaraldi
Copy link
Member

gbaraldi commented Oct 9, 2024

Is this how we get a Path type finally. I'd love that

@DilumAluthge
Copy link
Member

DilumAluthge commented Oct 17, 2024

It would be nice to get this into Julia 1.12.

EDIT: Given the timeline, I don't think this can make it into 1.12. Maybe we can merge it into master early in 1.13, so that it gets some time to be tested out.

base/loading.jl Outdated Show resolved Hide resolved
base/loading.jl Outdated Show resolved Hide resolved
@DilumAluthge
Copy link
Member

Should we have a way of constructing a RelocPath by just providing the subpath?

@fatteneder
Copy link
Member Author

Is this how we get a Path type finally. I'd love that

FWIW there is already https://juliahub.com/ui/Packages/General/FilePathsBase and https://juliahub.com/ui/Packages/General/FilePaths which provide such types.

Should we have a way of constructing a RelocPath by just providing the subpath?

I am not sure.

@DilumAluthge
Copy link
Member

@vchuravy @gbaraldi and @topolarity Would you be able to review this PR?

As far as I can tell, this PR is an improvement compared to #55146, but I'm not particularly familiar with the relevant part of the codebase.

@DilumAluthge
Copy link
Member

@staticfloat IIRC, you also worked on the depot-relocatibility code; it would be great if you could review this PR.

@DilumAluthge
Copy link
Member

Is this how we get a Path type finally. I'd love that

The RelocPath constructor will error if the path isn't inside a depot, right? So IIUC this doesn't really cover the general "path type" request.

Comment on lines +3375 to +3420
struct RelocPath
subpath::String
Copy link
Member

@vtjnash vtjnash Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts about using OncePerProcess to make it so we only compute the result once:

Suggested change
struct RelocPath
subpath::String
struct RelocPath
getpath::OncePerProcess{String}
end
RelocPath(path::AbstractString) = let
depot, _ = replace_depot_path_impl(path)
if isnothing(depot)
error("Failed to locate $(path) in any of DEPOT_PATH.")
end
subpath = String(replace(path, depot => ""; count=1))
return RelocPath(OncePerProcess{String}() do
for d in DEPOT_PATH
if isdirpath(d)
d = dirname(d)
end
local path = string(d, subpath)
if ispath(path)
return path
end
end
error("Failed to relocate @depot$(subpath) in any of DEPOT_PATH.")
end)
end
String(r::RelocPath) = r.getpath() # dynamically dispatches since getpath isn 't concrete :/

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this. What if someone has modified Base.DEPOT_PATH during their Julia session?

Comment on lines +3400 to +3446
function show(io::IO, r::RelocPath)
print(io, string("RelocPath(\"@depot", r.subpath, "\")"))
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty inaccurate, since it doesn't support @depot

Suggested change
function show(io::IO, r::RelocPath)
print(io, string("RelocPath(\"@depot", r.subpath, "\")"))
end

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about we allow RelocPath to accept paths that are literally @depot/foo/bar?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, as far as the original show method goes, I think that the meaning of @depot/ is reasonably easy to understand here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pretty inaccurate, since it doesn't support @depot

But it does behave like a @depot. (it behaves more like include_dependency than include to be precise)

What is inaccurate is that show assumes that nobody tinkered with DEPOT_PATH since construction.

@topolarity
Copy link
Member

Should we just call this DepotPath? I think that might be a clearer name for users

@fatteneder
Copy link
Member Author

Should we just call this DepotPath? I think that might be a clearer name for users

No strong feelings from my side.

@DilumAluthge
Copy link
Member

I also don't feel strongly between RelocPath and DepotPath.

We could also do RelocatablePath if we wanted something a little more unambiguous.

@tecosaur
Copy link
Contributor

tecosaur commented Jan 3, 2025

xref: https://discourse.julialang.org/t/designing-a-paths-julep/124335

Specifically the p"@/... syntax that is intended to create a relocatable project-relative path (currently only in the design doc, not yet implemented).

@DilumAluthge
Copy link
Member

DilumAluthge commented Jan 3, 2025

Specifically the p"@/... syntax that is intended to create a relocatable project-relative path (currently only in the design doc, not yet implemented).

Hmmm, that sounds slightly different that the RelocPath in this PR, right? The RelocPath in this PR is depot-relative (not project-relative).

@DilumAluthge
Copy link
Member

@fatteneder Can you rebase and resolve the merge conflicts?

@tecosaur
Copy link
Contributor

tecosaur commented Jan 3, 2025

Hmmm, that sounds slightly different that the RelocPath in this PR, right? The RelocPath in this PR is depot-relative (not project-relative).

Functionally, there is a difference in the approaches taken here/proposed, but as far as I can tell there's very much overlap in the intended usage. Both fix the linked issue, and in this respect are alternatives.

@DilumAluthge
Copy link
Member

DilumAluthge commented Jan 4, 2025

Hmmm, that sounds slightly different that the RelocPath in this PR, right? The RelocPath in this PR is depot-relative (not project-relative).

Functionally, there is a difference in the approaches taken here/proposed, but as far as I can tell there's very much overlap in the intended usage. Both fix the linked issue, and in this respect are alternatives.

I'm not sure I follow. I think I am probably misunderstanding what you mean by "project-relative path".

I'll try to highlight my confusion. Here's an example use case. Suppose you have two packages, PkgA and PkgB.

In the Git repo for PkgA, there is a file named PkgA/data/stuff.data. This file is some data that I ship in the PkgA repo.

In the Git repo for PkgA, this is PkgA/src/PkgA.jl:

module PkgA

const data = RelocPath(joinpath(dirname(@__DIR__), "data", "stuff.data")) # PkgA/data/myfile.data

function f()
    path = String(data)
    # load from path ...
end

end # module

This is based on the example posted in #55146 (comment) and #56053 (comment).

In the Git repo for PkgB, this is PkgB/Project.toml:

name = "PkgB"

[deps]
PkgA = "..."

In the Git repo for PkgB, this is PkgB/src/PkgB.jl:

module PkgB

import PkgA

function g()
    # ...
    x = PkgA.f()
    * ...
end

end # module

Suppose that I am running on machine_1.

On machine_1, my Julia depot is /home1/user1/foo/bar/mydepot.

On machine_1, I create a new Julia project at /home1/user1/hello/world/myproj/Project.toml:

# myproj/Project.toml

[deps]
PkgB = "..."

Now I build myproj into a custom sysimage. I want to give this custom sysimage to a different user, who will run it on their machine_2. So I put the sysimage in a tarball; I also put my Julia depot into that same tarball, because the user will need it.

So the user extracts the tarball on their machine_2, which obviously won't have the same paths as machine_1. Suppose they end up with the depot extracted to /home2/user2/blah/blah/blah/mydepot, and the project extracted to /home2/user2/various/things/here/myproj/Project.toml.

The user makes sure that /home2/user2/blah/blah/blah/mydepot is in their JULIA_DEPOT_PATH, and then they start Julia with the custom sysimage. And now they want to run the PkgB.g() function, which means that PkgA.f() will end up getting called. So PkgA needs to be able to find the PkgA/data/stuff.data file.

If we hadn't used RelocPath(), the old absolute path of this file on machine_1 (/home1/user1/foo/bar/mydepot/packages/PkgA/slug/data/stuff.data) would have been baked into the custom sysimage.

But we used RelocPath(). So, when PkgA.f() is run, the path = String(data::RelocParth) line will look through the depot path, and it will find that the desired file exists on machine_2 at location /home2/user2/blah/blah/blah/mydepot/packages/PkgA/slug/data/stuff.data.

So, if we use RelocPath() as implemented in this PR (depot-relative), then we find our file, and everything works.

Would the proposed p"@/... syntax have the same behavior, i.e. would it also find the file at the new location (/home2/user2/blah/blah/blah/mydepot/packages/PkgA/slug/data/stuff.data)?

My interpretation of the phrase "project-relative path" was that p"@... would be looking relative to the active project (/home1/user1/hello/world/myproj/Project.toml on the old machine, /home2/user2/various/things/here/myproj/Project.toml on the new machine). In this example, the directory containing the active project has no relationship to the stuff.data file on either the old machine or the new machine, so looking relative to the active project wouldn't yield any fruit. So that's the source of my confusion.

@DilumAluthge
Copy link
Member

DilumAluthge commented Jan 4, 2025

@fatteneder In the current implementation in this PR, in the String(::RelocPath) method, as soon as we find a path where the subpath matches, we accept that path and return.

It occurs to me that, for correctness purposes, we should have more stringent requirements. Instead of only requiring that the subpath matches, I propose that we first check if the subpath matches, and then if the subpath matches, we also require that the checksum matches. If the checksum doesn't match, we reject the path.

This means that at RelocPath() construction time, we have to compute the checksum, and thus the RelocPath struct needs a field for storing the checksum:

struct RelocPath
    subpath::String
    checksum::Union{String, Nothing}
end

When constructing a RelocPath, if the input path is a file, then we just checksum the file. If the input path is a directory, we have to compute the hash of the directory - probably something like this:

artifact_hash = SHA1(GitTools.tree_hash(the_directory))

Which is taken from this section of Pkg.Artifacts.

If the input path is neither a file nor a directory (e.g. it's a symlink), then tbh idk - throw an error maybe? I don't know if that's a real use case that we need to support, at least not in the initial implementation.

I think we should allow opting out of this behavior at two different locations:

  1. At RelocPath construction time, the user can opt-out by doing RelocPath("..."; record_checksum = false). In this case, the newly constructed RelocPath object will have .checksum = nothing stored. For such a RelocPath, whenever String(::RelocPath) is called, we don't compute or verify any checksums, we just accept (and return) the path as soon as we find a path where the subpath matches - in other words, the current behavior of this PR.
    • The default value of the kwarg would be record_checksum = true.
  2. At the String(::RelocPath) callsite. In this case, the user can opt-out by doing String(reloc_path::RelocPath; verify_checksum = false). And if verify_checksum = false, then we don't compute or verify any checksums.
    • The default value of the kwarg would be verify_checksum = true.

What do you think? I feel like this would increase our confidence in the correctness of the file or directory that we return.

@fatteneder
Copy link
Member Author

I think this makes senses.

For include(), include_dependency() we already use the same logic for checksum computation, but with Base only functions: hash = isdir(path) ? _crc32c(join(readdir(path))) : open(_crc32c, path, "r"). I think this also works with symlinks, as they are first opened, although we do not have a test for that.

I would go for RelocPath(; track_content), like include_dependency, and then maybe String(; verify_content).

@fatteneder
Copy link
Member Author

fatteneder commented Jan 4, 2025

We might also want to promote the object that RelocPath(; track_content=true) is referring to to an include_dependency, so that when PkgA/data/stuff.data changes we recompile PkgA to update the cached hash.

@tecosaur
Copy link
Contributor

tecosaur commented Jan 4, 2025

Would the proposed p"@/... syntax have the same behavior, i.e. would it also find the file at the new location (/home2/user2/blah/blah/blah/mydepot/packages/PkgA/slug/data/stuff.data)?

The current idea is that it is relative to the parent project of the code file it occurs in. E.g. in PkgA/src/file.jl instead of doing

RelocPath(joinpath(dirname(@__DIR__), "data", "stuff.data"))

you do

p"@/data/stuf.data"

And the current intent (the Paths proposal is still being written), is that these would behave the same way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
compiler:precompilation Precompilation of modules
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Some uses of @__DIR__ should be relocatable.
6 participants