File path support for Foundation #5094

Open
cmcgee1024 opened this issue Oct 2, 2024 · 5 comments

@cmcgee1024
Member

It's common for languages to include, as part of their standard library, a way to construct file path data structures that can be queried (e.g. isAbsolute(), fileExt(), baseName(), dirName()) and that support path arithmetic, such as appending directories or file names to the end, or resolving relative paths against absolute ones. With some care, an app writer can craft something that runs without much modification on a variety of platforms, including Linux, macOS, and Windows.

For example, Python has pathlib, Go has filepath, and Java has Path. These take different stances on certain issues, such as whether paths from a platform other than the current host can be used, but for the most part they are the cornerstone of many libraries and apps written in those languages. They also help a typical app be more platform independent, again with some care from developers.

In the Swift world there are at least these independent file path APIs available to developers:

  • Foundation URL
    • It's a bit too general, which causes headaches
    • Code often needs to check for file:/// URLs and to construct only that kind
    • Certain functions aren't available for certain kinds of URLs
    • Getting correct platform-specific absolute paths on platforms like Windows doesn't work well, which encourages extensions and unsafe calls to get them right
  • System FilePath
    • Tied to the current platform, which may or may not be POSIX
    • Difficult to code in a platform-neutral way (see the Errnos)
    • Issues around encoding
  • Foundation String
    • Some of the Foundation APIs in FileManager will accept arbitrary string paths
    • No APIs for doing path arithmetic operations on the strings
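
As a rough illustration of the first group of bullets, here is a minimal sketch of the kind of guard that code taking a Foundation URL tends to need before treating it as a file path. The helper name `localFileExtension(of:)` is hypothetical; `isFileURL` and `pathExtension` are existing URL properties.

```swift
import Foundation

/// Hypothetical helper: returns the file extension only when the URL
/// uses the file scheme, and nil for any other kind of URL.
func localFileExtension(of url: URL) -> String? {
    guard url.isFileURL else { return nil }
    return url.pathExtension
}

let local = URL(fileURLWithPath: "/tmp/report.txt")
let remote = URL(string: "https://example.com/report.txt")!

print(localFileExtension(of: local) ?? "not a file URL")
print(localFileExtension(of: remote) ?? "not a file URL")
```

Every API that accepts a URL but really wants a file path ends up repeating some version of this check, which is part of the motivation for a dedicated path type.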

On top of this, various Swift libraries and apps are writing their own Path structs, wrappers, and extensions on top of these APIs, and the resulting mismatches between libraries compound the problem: one uses Foundation URL, another uses FilePath, String, or some custom Path. A number of projects are hitting problems when porting to Windows, and they are re-learning mistakes that a common API could shield them from.

This enhancement would help unify much of Swift under a single file path API for the benefit of everyone. Hopefully, both porting and integration between libraries would become easier.

@cmcgee1024
Member Author

FilePath from System is arguably very close to what's needed, except that it lives in the wrong module: System rather than Foundation, where the platform-neutral functions can live. As mentioned in the description, there are some dependencies on throwable POSIX Errnos, as well as some Windows-specific issues with encodings and Codable.

Perhaps this can serve as the starting point for a common API?

@parkera
Contributor

parkera commented Oct 15, 2024

Thanks for the detailed bug report @cmcgee1024

@milseman
Member

> It's common for languages to have as part of their standard library a way to construct file path data structures that can then be used to perform certain queries (e.g. isAbsolute(), fileExt(), baseName(), dirName()) on them, and perform certain path arithmetic operations, such as appending paths and/or files to the end, or resolving relative paths against absolute ones.

This is what System's FilePath is for. Its API for syntactic manipulation was designed to be a cross-platform, Swifty superset of what is found in Python, C#, Rust, C++, etc.

FilePath has a ComponentView which is a RangeReplaceableCollection of path components that provides algebraic semantics for manipulating the components of a path. FilePath formally separates the root from this algebraic collection, which is necessary for sensible inserts, etc.

See https://gist.github.com/milseman/294bd494d6911c65b80fccff5873b295, which includes rationale and can be an easier way to view the API in whole than browsing documentation.
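
For readers who have not used it, the following is a small sketch of the syntactic API described above, written against swift-system's FilePath. The conditional import assumes the swift-system package is available on non-Apple platforms, and the comments describe behavior on a Unix-like target.

```swift
#if canImport(System)
import System          // Apple platforms
#else
import SystemPackage   // assumption: the swift-system package is a dependency
#endif

let lib: FilePath = "/usr/local/lib/libfoo.dylib"

// Syntactic queries: none of these touch the file system.
print(lib.isAbsolute)
print(lib.lastComponent?.string ?? "<none>")
print(lib.stem ?? "<none>")
print(lib.extension ?? "<none>")

// The root is kept apart from the collection of components.
print(lib.root?.string ?? "<none>")
print(lib.components.map(\.string))

// Path arithmetic, including an insert through the ComponentView.
var dir = lib.removingLastComponent()
dir.append("pkgconfig")
dir.components.insert("opt", at: dir.components.startIndex)
print(dir)
```

Because the root is not part of `components`, inserting at `startIndex` places the new component after the root rather than in front of it.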

> Tied to the current platform, which may or may not be POSIX

FilePath represents a native path for the target, so it is a Windows path when targeting Windows and a Unix path when targeting Unix. There is a place for such a type, e.g. an argument to a syscall would take a native path.

We have the implementation machinery to support cross-platform paths, so it's just an API design question. I think it would be good to add explicit UnixFilePath and WindowsFilePath types with failable conversion to/from the target platform's FilePath.

> Difficult to code in a platform-neutral way (see the Errnos)

FilePath's API is syntactic, meaning there are no syscalls and no Errnos. The operations you mentioned in the intro paragraph are syntactic operations and these behave in a platform-neutral way.
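
As a sketch of what "syntactic" means here, the calls below normalize and resolve paths purely by inspecting their components. Nothing is looked up on disk and nothing throws. The paths are made-up examples, and the conditional import assumes the swift-system package on non-Apple platforms.

```swift
#if canImport(System)
import System          // Apple platforms
#else
import SystemPackage   // assumption: the swift-system package is a dependency
#endif

// Collapse "." and ".." without consulting the file system.
let messy: FilePath = "/usr/local/./bin/../lib"
print(messy.lexicallyNormalized())

// Resolve a subpath, rejecting anything that would escape the base.
let base: FilePath = "/var/www/static"
print(base.lexicallyResolving("assets/main.css")?.string ?? "rejected")
print(base.lexicallyResolving("../../../etc/passwd")?.string ?? "rejected")
```

Failure is reported through an optional result rather than a thrown Errno, since no system call is involved.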

A separate question is what kinds of API we should have for interacting with the file system, and what that would look like for a low-level platform-specific layer and a higher level platform-agnostic layer.

> Issues around encoding

What are your issues?

Interpreting the content of a file path can be file system specific. Windows paths are UCS-2 and allow unpaired surrogates, Linux paths are bags of bytes, and Darwin paths are typically UTF-8 (often canonicalized in NFD).

FilePath on Linux/Darwin is a nul-terminated bag of UInt8, and on Windows it is a nul-terminated bag of UInt16. When converting to a String (e.g. for printing), it replaces invalidly encoded content with U+FFFD, as defined in the documentation: https://developer.apple.com/documentation/system/filepath/description
Similarly, String(decoding:) applied to a FilePath performs error correction, just as String(decoding: bytes, as: UTF8.self) does.

That is, FilePath will not enforce modern Unicode on the path, it only does that when converting to a Unicode String, which is explicitly failable (via the validating: initializer) or error-correcting (via the decoding: initializer). This is how String's initializers over arbitrary data work.
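
The difference between the two initializers can be sketched as follows. This builds a path containing a byte that is not valid UTF-8, which is legal on Linux. The example is limited to non-Windows targets because it assumes a one-byte platform character, and the byte sequence is invented for illustration.

```swift
#if canImport(System)
import System          // Apple platforms
#else
import SystemPackage   // assumption: the swift-system package is a dependency
#endif

#if !os(Windows)
// "/tmp/" followed by 0xFF (never valid in UTF-8) and the nul terminator.
let bytes: [UInt8] = [0x2F, 0x74, 0x6D, 0x70, 0x2F, 0xFF, 0x00]
let path: FilePath = bytes.withUnsafeBufferPointer { buffer in
    buffer.baseAddress!.withMemoryRebound(to: CInterop.PlatformChar.self,
                                          capacity: buffer.count) {
        FilePath(platformString: $0)
    }
}

// Error-correcting: the invalid byte becomes U+FFFD.
let corrected = String(decoding: path)
// Failable: the conversion is refused instead of repaired.
let validated = String(validating: path)

print(corrected == "/tmp/\u{FFFD}")
print(validated == nil)
#endif
```

The FilePath itself still holds the original bytes, so handing it back to the operating system is unaffected by either conversion.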

> Also, they are re-learning from some of the mistakes that a common API could shield them... FilePath from System is arguably very close to what's needed except it's in the wrong module, System, not Foundation where the platform neutral functions can live.

FilePath is that common API that was very carefully designed to avoid these mistakes and take the best of all the languages surveyed.

We can talk about having Foundation re-export FilePath for its syntactic API (whether or not it re-exports syscalls in general, though note that it already re-exports Darwin/Glibc).

@milseman
Member

IMO the ideal place for FilePath's (and Windows/Unix variants) syntactic operations would be the stdlib. It's within the stdlib's mandate and would make it the common API. (Also, no need for Foundation to re-export it and no need to pull in all of System just for it).

@al45tair
Contributor

> Issues around encoding

I think this might be a reference to the Codable conformance, which is honestly a bit of a disaster area. (The System.FilePath type cannot safely be Codable, because the encoding differs on a per-platform basis.)

Putative WindowsFilePath and POSIXFilePath types, of course, could be.
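
To make that concrete, here is a purely hypothetical sketch of why a fixed-encoding type is easier to make Codable: its storage is always a byte sequence, so the serialized form does not depend on the platform doing the encoding. This POSIXFilePath is not an existing API, its String initializer is a convenience invented for the example, and JSON is used only for demonstration.

```swift
import Foundation

/// Hypothetical type: a path whose storage is always raw bytes,
/// whichever platform the code is compiled for.
struct POSIXFilePath: Codable, Equatable {
    var bytes: [UInt8]

    init(_ string: String) {
        bytes = Array(string.utf8)
    }
}

/// Encodes and decodes a path to check that it survives serialization.
func roundTrip(_ path: POSIXFilePath) throws -> POSIXFilePath {
    let data = try JSONEncoder().encode(path)
    return try JSONDecoder().decode(POSIXFilePath.self, from: data)
}

let original = POSIXFilePath("/var/log/app.log")
print((try? roundTrip(original)) == original)
```

A native FilePath has no such guarantee, since the same value would be 8-bit code units on one platform and 16-bit code units on another.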


5 participants