-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(package): check dirtiness of symlinks source files #14981
Conversation
src/cargo/ops/cargo_package.rs
Outdated
// TODO: Should we warn users if there are like thousands of symlinks, | ||
// which may hurt the performance? | ||
if repo.status_file(&rel_path)? != git2::Status::CURRENT { | ||
dirty_symlinks.insert(workdir.join(rel_path)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could potentially cache git status for each repo then. That leaves to future improvement if this one has got merged.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #14985.
### What does this PR try to resolve? Do nothing but move code around * `cargo_package.rs` -> `cargo_package/mod.rs` * Extract vcs info helpers to `cargo_package/vcs.rs` * Extract verification helpers to `cargo_package/verify.rs` Doing so because I realized how big `cargo_package.rs` has grown. Like, the vcs helpers will continue growing with #14981 and potentially other optimizations. ### How should we test and review this PR? All tests pass.
767408a
to
451c06d
Compare
It may not be worthy for the majority. Single file `git status` is generally fast. On some slow file systems or lower-end machines, skipping file system access might be helpful. However, slow file system access usually happen on Window. And we'll only have large amount of `git status` when rust-lang#14981 merges and the repo contains lots of symlinks. Given symlink handling story is crazy in real world Windows, I doubt anybody will hit the performance issue without this.
This comment was marked as resolved.
This comment was marked as resolved.
451c06d
to
19acb93
Compare
This has been fixed. |
19acb93
to
6baf1f5
Compare
73a57f4
to
95bb4ed
Compare
### What does this PR try to resolve? `cargo package` will warn users when git `core.symlinks` is `false` and some symlinks were checked out as plain files during packaging. Git config [`core.symlinks`] defaults to true when unset. In git-for-windows (and git as well), the config should be set to false explicitly when the repo was created, if symlink support wasn't detected [^1]. We assume the config was always set at creation time and never changed. So, if it is true, we don't bother users with any warning. [^1]: https://github.com/git-for-windows/git/blob/f1241afcc7956918d5da33ef74abd9cbba369247/setup.c#L2394-L2403 [`core.symlinks`]: https://git-scm.com/docs/git-config#Documentation/git-config.txt-coresymlinks ### How should we test and review this PR? CI passes. This shares two commits 42dc4ef and c8c8223 with #14981. I didn't commit to fix all symlink issues all at once. This PR demonstrates how we could leverage metadata in `PathEntry`. Maybe later we can really follow plain-text symlinks and resolve the issue. ### Additional information cc #5664
So `path` and `base` are able to accept different types
`abs_path` and `workdir.join(rel_path)` are the same at that point.
This comment has been minimized.
This comment has been minimized.
This is helpful for VCS status check. Paths emitted by PathSource are always under package root, We lose the track of file type info of paths under symlink dirs, so we need this extra bit of information.
This show that a regular file under a symlink directory is also not tarcked by the current vcs dirtiness check.
This adds a special case for checking source files are symlinks and have been modified when under a VCS control This is required because those paths may link to a file outside the current package root, but still under the git workdir, affecting the final packaged `.crate` file. This may have potential performance issue. If a package contains thousands of symlinks, Cargo will fire `git status` for each of them.
metdata path fields may point to a dirty symlilnk and cause duplicate report. This commit combines `dirty_metadata_paths` and `dirty_symlinks` into one so is able to de-duplicate them.
95bb4ed
to
24dd205
Compare
It may not be worthy for the majority. Single file `git status` is generally fast. On some slow file systems or lower-end machines, skipping file system access might be helpful. However, slow file system access usually happen on Window. And we'll only have large amount of `git status` when rust-lang#14981 merges and the repo contains lots of symlinks. Given symlink handling story is crazy in real world Windows, I doubt anybody will hit the performance issue without this.
Update cargo 9 commits in d73d2caf9e41a39daf2a8d6ce60ec80bf354d2a7..fd784878cfa843e3e29a6654ecf564c62fae6735 2024-12-31 20:51:21 +0000 to 2025-01-03 20:06:26 +0000 - chore: bump gix-lock to remove thiserror@1 from `cargo` (rust-lang/cargo#15012) - refactor(manifest): Clean up field -> env var handling (rust-lang/cargo#15008) - chore(deps): update rust crate thiserror to v2 (rust-lang/cargo#14998) - test(git): Clean up shallow fetch tests (rust-lang/cargo#15002) - fix(schema): Correct and update the JSON Schema (rust-lang/cargo#15000) - chore(deps): update rust crate itertools to 0.14.0 (rust-lang/cargo#14996) - fix: env table config can't trigger rebuild with `rerun-if-env-changed`. (rust-lang/cargo#14756) - chore(deps): update alpine docker tag to v3.21 (rust-lang/cargo#14995) - fix(package): check dirtiness of symlinks source files (rust-lang/cargo#14981)
What does this PR try to resolve?
This adds a special case for checking source files are symlinks
and have been modified when under a VCS control
This is required because those paths may link to a file outside the
current package root, but still under the git workdir, affecting the
final packaged
.crate
file.How should we test and review this PR?
Pretty similar to #14966, as a part of #14967.
This may have potential performance issue. If a package contains thousands of symlinks, Cargo will fire
git status
for each of them. Not sure if we want to do anything proactively now.The introduction of the
PathEntry
struct gives us more room for storing file metadata to satisfiy use cases in the future. For instances,cargo package --list
, for example JSON output modecargo package --list
#11666include
andexclude
information to metadata #13331cargo package --list
is not good for workspaces #13953