4

path.push() should work as expected on windows verbatim paths by seanyoung · Pul...

 2 years ago
source link: https://github.com/rust-lang/rust/pull/89270
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Copy link

Contributor

seanyoung commented 27 days ago

On Windows, std::fs::canonicalize() returns an so-called UNC path. UNC paths differ with regular paths because:

  • This type of path can much longer than a non-UNC path (32k vs 260 characters).
  • The prefix for a UNC path is Component::Prefix(Prefix::DiskVerbatim(..)))
  • No / is allowed
  • No . is allowed
  • No .. is allowed

Rust has poor handling of such paths. If you join a UNC path with a path with any of the above, then this will not work.

I've implemented a new method fn join_fold() which joins paths and also removes any . and .. from it, and replaces / with \ on Windows. Using this function it is possible to use UNC paths without issue. In addition, this function is useful on Linux too; paths can be appended without having to call canonicalize() to remove the . and ...

This PR needs test cases, which can I add. I hope this will a start of a discussion.

This comment has been hidden.

Copy link

Contributor

the8472 commented 27 days ago

This mixes two orthogonal functions join and an often-proposed but currently not implemented normalize. What's the benefit of putting them into a single method?

  • No . is allowed
  • No .. is allowed

The microsoft docs say that those actually are allowed in paths with the verbatim prefix and will access files named . and .. instead of traversing directories. So interpreting those differently can have the opposite of the intended effect of using the verbatim prefix.

Copy link

Contributor

ChrisDenton commented 27 days ago

I think it would be great to have something that addresses this but there I there should perhaps be a holistic look at Windows path handling in Rust before adding adhoc methods as there are currently numerous open issues with paths. Discussions like this are probably best had on the internals forum or Zulip.

Personally speaking, my thinking at the moment is that push should "just work" for verbatim paths, with the path being pushed normalized according to platform conventions as this is what's wanted 99.999999% of the time (approx). There might have to be a push_verbatim for pushing paths that you want to include ., .. or / components. Though it would be very rarely used because Windows filesystems do not allow these as file names (and indeed they would screw up non-verbatim paths if they were allowed).

I guess path.push(normalize(subpath)) would also work but I'd prefer the common case to be simpler to do than the exceedingly rare case. Otherwise it's too easy to do the wrong thing unless you have knowledge of the arcane corners of Windows.

This comment has been hidden.

This comment has been hidden.

Copy link

Contributor

Author

seanyoung commented 27 days ago

I think it would be great to have something that addresses this but there I there should perhaps be a holistic look at Windows path handling in Rust before adding adhoc methods as there are currently numerous open issues with paths. Discussions like this are probably best had on the internals forum or Zulip.

That is probably a good idea. In order to facilitate that discussion, I'm updating this PR.

Personally speaking, my thinking at the moment is that push should "just work" for verbatim paths, with the path being pushed normalized according to platform conventions as this is what's wanted 99.999999% of the time (approx). There might have to be a push_verbatim for pushing paths that you want to include ., .. or / components. Though it would be very rarely used because Windows filesystems do not allow these as file names (and indeed they would screw up non-verbatim paths if they were allowed).

This is a good point and I totally agree this. I've updated this PR to do exactly that.

I guess path.push(normalize(subpath)) would also work but I'd prefer the common case to be simpler to do than the exceedingly rare case. Otherwise it's too easy to do the wrong thing unless you have knowledge of the arcane corners of Windows.

path.push(normalize(subpath)) would not work. If subpath is ../foo then path.push() still needs to have special handling for verbatim paths.

Copy link

Contributor

Author

seanyoung commented 27 days ago

This mixes two orthogonal functions join and an often-proposed but currently not implemented normalize. What's the benefit of putting them into a single method?

normalize() would not solve the issue. I have changed the PR so now join() works with verbatim paths.

  • No . is allowed
  • No .. is allowed

The microsoft docs say that those actually are allowed in paths with the verbatim prefix and will access files named . and .. instead of traversing directories. So interpreting those differently can have the opposite of the intended effect of using the verbatim prefix.

I don't think this is true. I've tried creating . and .. files using unc paths and it does not work. Besides, even the folks at microsoft are not insane. This would be a terrible idea.

Copy link

Contributor

the8472 commented 27 days ago

normalize() would not solve the issue.

Can you explain how it wouldn't solve it?

I don't think this is true. I've tried creating . and .. files using unc paths and it does not work.

That might depend on the filesystem. I think it's similar on linux. On most filesystems . point will do the expected thing but with FUSE you can create funky directories where it behaves differently or doesn't exist at all.

Copy link

Contributor

Author

seanyoung commented 27 days ago

normalize() would not solve the issue.

Can you explain how it wouldn't solve it?

Considerpath.join(normalize("../foo")):

  • normalize() would return ../foo or ..\foo on Windows
  • path.join() would still have to process .. and remove the last directory from self before appending foo.

I don't think this is true. I've tried creating . and .. files using unc paths and it does not work.

That might depend on the filesystem. I think it's similar on linux. On most filesystems . point will do the expected thing but with FUSE you can create funky directories where it behaves differently or doesn't exist at all.

Your original point was about Windows, so I tried this on Windows:

use std::fs::File;
use std::io::Write;

fn main() {
    let mut file =
        File::create(r"\\?\..").expect("create dot file");
    file.write_all(b"Hello, world!").expect("write dot file");
}

(I tried ., .. with relative and absolute paths, does not work).

On Linux . and .. are special names which are handled in kernel space. You cannot create . or .. files, it does not depend on the filesystem, fuse or otherwise. See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/namei.c#n2215

(BTW I'm a kernel maintainer)

Copy link

Contributor

the8472 commented 27 days ago

normalize() would not solve the issue.

Can you explain how it wouldn't solve it?

Considerpath.join(normalize("../foo")):

  • normalize() would return ../foo or ..\foo on Windows
  • path.join() would still have to process .. and remove the last directory from self before appending foo.

Then why not normalize(path.join("../foo"))?

On Linux . and .. are special names which are handled in kernel space. You cannot create . or .. files, it does not depend on the filesystem, fuse or otherwise.

Maybe it's limited to dir entry enumeration, but I do recall getting surprising results when using FUSE

use std::fs::File;
use std::io::Write;

fn main() {
    let mut file =
        File::create(r"\\?\..").expect("create dot file");
    file.write_all(b"Hello, world!").expect("write dot file");
}

(I tried ., .. with relative and absolute paths, does not work).

But did you try with different filesystems? E.g. FAT? I don't have a windows machine here, otherwise I'd poke around myself.

I assume there is a reason why the documentation says:

Because it turns off automatic expansion of the path string, the "\?" prefix also allows the use of ".." and "." in the path names, which can be useful if you are attempting to perform operations on a file with these otherwise reserved relative path specifiers as part of the fully qualified path.

Copy link

Contributor

Author

seanyoung commented 27 days ago

Considerpath.join(normalize("../foo")):

  • normalize() would return ../foo or ..\foo on Windows
  • path.join() would still have to process .. and remove the last directory from self before appending foo.

Then why not normalize(path.join("../foo"))?

Because I assumed that path.join("../foo")) returns garbage, and you're using normalize to clean up the mess. I'm not so sure now, see below.

On Linux . and .. are special names which are handled in kernel space. You cannot create . or .. files, it does not depend on the filesystem, fuse or otherwise.

Maybe it's limited to dir entry enumeration, but I do recall getting surprising results when using FUSE

path walks are done for file creation too. It would be interesting to hear more about "surprising results".

use std::fs::File;
use std::io::Write;

fn main() {
    let mut file =
        File::create(r"\\?\..").expect("create dot file");
    file.write_all(b"Hello, world!").expect("write dot file");
}

(I tried ., .. with relative and absolute paths, does not work).

But did you try with different filesystems? E.g. FAT? I don't have a windows machine here, otherwise I'd poke around myself.

I assume there is a reason why the documentation says:

Because it turns off automatic expansion of the path string, the "?" prefix also allows the use of ".." and "." in the path names, which can be useful if you are attempting to perform operations on a file with these otherwise reserved relative path specifiers as part of the fully qualified path.

Now this is interesting - and I didn't expect this. With FAT and exFAT you can create files called . and ... They're only accessible using UNC paths, but they show up with dir. Normally dir does not show . or ...

If you then mount the file system on Linux, it cannot deal with the . or .. files. This is must be a bug in the linux exfat driver. It actually enumerates them as directories.

So now the question is, if verbatim paths permit .. and . on Windows and (ex)fat file systems, then what do we do? What does PathBuf::from("\\?\bar").join("../foo") mean?

  • \\?\bar\../foo
  • \\?\foo

I can see the appeal of the fn normalize() as you proposed.

Copy link

Contributor

ChrisDenton commented 27 days ago

Btw, I wrote a thing about NT and Win32 paths, in case it helps.

The short version is NT paths can include anything except \ (including NUL). What I didn't mention there is that the NT kernel does only the minimum to resolve the relevant device, the rest of the path it simply passes to that device to resolve. The device can do anything with that sub-path.

So these NT paths are essentially impossible to handle sensibly with Rust's Path/PathBuf without special knowledge of the device being used. Maybe the device uses $ as the path separator? In that case push is already broken. Maybe . is a filename? In that case components is broken. Etc, etc.

I think Path methods should assume a "normal" filesystem path as the default and work as expected without the programmer needing special knowledge of the arcane workings of the NT kernel. Special methods could be added to address the special cases but understanding them should not be necessary for writing cross-platform software.

Copy link

Contributor

Author

seanyoung commented 26 days ago

Btw, I wrote a thing about NT and Win32 paths, in case it helps.

The short version is NT paths can include anything except \ (including NUL). What I didn't mention there is that the NT kernel does only the minimum to resolve the relevant device, the rest of the path it simply passes to that device to resolve. The device can do anything with that sub-path.

Very interesting.

So these NT paths are essentially impossible to handle sensibly with Rust's Path/PathBuf without special knowledge of the device being used. Maybe the device uses $ as the path separator? In that case push is already broken. Maybe . is a filename? In that case components is broken. Etc, etc.

I think Path methods should assume a "normal" filesystem path as the default and work as expected without the programmer needing special knowledge of the arcane workings of the NT kernel. Special methods could be added to address the special cases but understanding them should not be necessary for writing cross-platform software.

I concur completely. I would much rather see path join handling of . and .. do what you expect, and do what the rest of the path handling does. The fact that some windows filesystem drivers can do files called . and .. is an oddity which rust Path/PathBuf can do without.

If anyone wants to create a path like \\?\D:\foo\.., then that is still possible. I don't think it would surprise anyone that a Path/PathBuf join thinks this is a walk to the parent of foo.

seanyoung

changed the title Add variant of std::path::Path join() method which removes .. and .

path.join() should work as expected on windows verbatim paths

24 days ago

seanyoung

changed the title path.join() should work as expected on windows verbatim paths

path.push() should work as expected on windows verbatim paths

24 days ago

This comment has been hidden.

Copy link

Contributor

the8472 commented 24 days ago

r? rust-lang/libs

Copy link

Contributor

Author

seanyoung commented 22 days ago

@m-ou-se I've made some last changes after a lot more testing. Ready for review

Copy link

Member

m-ou-se commented 20 days ago

Thanks!

@bors r+

Copy link

Contributor

bors commented 20 days ago

pushpin Commit fa4072f has been approved by m-ou-se

bors

merged commit 7aa9ce5 into

rust-lang:master 19 days ago

10 checks passed

rustbot

added this to the 1.57.0 milestone

19 days ago

seanyoung

deleted the join_fold branch

19 days ago

Copy link

Contributor

Eh2406 commented 8 days ago

I am aware of these issues but not knowledgeable. Thank you for the good discussion, both here and on zulip! I think I am convinced that this is a positive change.

But, I am very surprised about the process here! Given this is a subtle change to windows behavior I would have expected knowledgeable parties to be cc'ed, like @ehuss or @retep998 maybe @kennykerr or @rylev.

Given that this is a (invisible to crater) breaking change to a stable API I would expect some team to need to do a FCP, maybe @rust-lang/libs-api. And for there to be a discussion of how we are going to let people know about the breaking change.


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK