path.push() should work as expected on windows verbatim paths by seanyoung · Pul...
source link: https://github.com/rust-lang/rust/pull/89270
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
On Windows, std::fs::canonicalize() returns an so-called UNC path. UNC paths differ with regular paths because:
- This type of path can much longer than a non-UNC path (32k vs 260 characters).
- The prefix for a UNC path is
Component::Prefix(Prefix::DiskVerbatim(..)))
- No
/
is allowed - No
.
is allowed - No
..
is allowed
Rust has poor handling of such paths. If you join a UNC path with a path with any of the above, then this will not work.
I've implemented a new method fn join_fold()
which joins paths and also removes any .
and ..
from it, and replaces /
with \
on Windows. Using this function it is possible to use UNC paths without issue. In addition, this function is useful on Linux too; paths can be appended without having to call canonicalize()
to remove the .
and ..
.
This PR needs test cases, which can I add. I hope this will a start of a discussion.
This comment has been hidden.
This mixes two orthogonal functions join
and an often-proposed but currently not implemented normalize
. What's the benefit of putting them into a single method?
- No
.
is allowed- No
..
is allowed
The microsoft docs say that those actually are allowed in paths with the verbatim prefix and will access files named .
and ..
instead of traversing directories. So interpreting those differently can have the opposite of the intended effect of using the verbatim prefix.
I think it would be great to have something that addresses this but there I there should perhaps be a holistic look at Windows path handling in Rust before adding adhoc methods as there are currently numerous open issues with paths. Discussions like this are probably best had on the internals forum or Zulip.
Personally speaking, my thinking at the moment is that push
should "just work" for verbatim paths, with the path being pushed normalized according to platform conventions as this is what's wanted 99.999999% of the time (approx). There might have to be a push_verbatim
for pushing paths that you want to include .
, ..
or /
components. Though it would be very rarely used because Windows filesystems do not allow these as file names (and indeed they would screw up non-verbatim paths if they were allowed).
I guess path.push(normalize(subpath))
would also work but I'd prefer the common case to be simpler to do than the exceedingly rare case. Otherwise it's too easy to do the wrong thing unless you have knowledge of the arcane corners of Windows.
This comment has been hidden.
This comment has been hidden.
I think it would be great to have something that addresses this but there I there should perhaps be a holistic look at Windows path handling in Rust before adding adhoc methods as there are currently numerous open issues with paths. Discussions like this are probably best had on the internals forum or Zulip.
That is probably a good idea. In order to facilitate that discussion, I'm updating this PR.
Personally speaking, my thinking at the moment is that
push
should "just work" for verbatim paths, with the path being pushed normalized according to platform conventions as this is what's wanted 99.999999% of the time (approx). There might have to be apush_verbatim
for pushing paths that you want to include.
,..
or/
components. Though it would be very rarely used because Windows filesystems do not allow these as file names (and indeed they would screw up non-verbatim paths if they were allowed).
This is a good point and I totally agree this. I've updated this PR to do exactly that.
I guess
path.push(normalize(subpath))
would also work but I'd prefer the common case to be simpler to do than the exceedingly rare case. Otherwise it's too easy to do the wrong thing unless you have knowledge of the arcane corners of Windows.
path.push(normalize(subpath))
would not work. If subpath is ../foo
then path.push()
still needs to have special handling for verbatim paths.
This mixes two orthogonal functions
join
and an often-proposed but currently not implementednormalize
. What's the benefit of putting them into a single method?
normalize()
would not solve the issue. I have changed the PR so now join()
works with verbatim paths.
- No
.
is allowed- No
..
is allowedThe microsoft docs say that those actually are allowed in paths with the verbatim prefix and will access files named
.
and..
instead of traversing directories. So interpreting those differently can have the opposite of the intended effect of using the verbatim prefix.
I don't think this is true. I've tried creating .
and ..
files using unc paths and it does not work. Besides, even the folks at microsoft are not insane. This would be a terrible idea.
normalize() would not solve the issue.
Can you explain how it wouldn't solve it?
I don't think this is true. I've tried creating . and .. files using unc paths and it does not work.
That might depend on the filesystem. I think it's similar on linux. On most filesystems .
point will do the expected thing but with FUSE you can create funky directories where it behaves differently or doesn't exist at all.
normalize() would not solve the issue.
Can you explain how it wouldn't solve it?
Considerpath.join(normalize("../foo"))
:
normalize()
would return../foo
or..\foo
on Windowspath.join()
would still have to process..
and remove the last directory from self before appendingfoo
.
I don't think this is true. I've tried creating . and .. files using unc paths and it does not work.
That might depend on the filesystem. I think it's similar on linux. On most filesystems
.
point will do the expected thing but with FUSE you can create funky directories where it behaves differently or doesn't exist at all.
Your original point was about Windows, so I tried this on Windows:
use std::fs::File;
use std::io::Write;
fn main() {
let mut file =
File::create(r"\\?\..").expect("create dot file");
file.write_all(b"Hello, world!").expect("write dot file");
}
(I tried .
, ..
with relative and absolute paths, does not work).
On Linux .
and ..
are special names which are handled in kernel space. You cannot create .
or ..
files, it does not depend on the filesystem, fuse or otherwise. See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/namei.c#n2215
(BTW I'm a kernel maintainer)
normalize() would not solve the issue.
Can you explain how it wouldn't solve it?
Consider
path.join(normalize("../foo"))
:
normalize()
would return../foo
or..\foo
on Windowspath.join()
would still have to process..
and remove the last directory from self before appendingfoo
.
Then why not normalize(path.join("../foo"))
?
On Linux . and .. are special names which are handled in kernel space. You cannot create . or .. files, it does not depend on the filesystem, fuse or otherwise.
Maybe it's limited to dir entry enumeration, but I do recall getting surprising results when using FUSE
use std::fs::File; use std::io::Write; fn main() { let mut file = File::create(r"\\?\..").expect("create dot file"); file.write_all(b"Hello, world!").expect("write dot file"); }
(I tried
.
,..
with relative and absolute paths, does not work).
But did you try with different filesystems? E.g. FAT? I don't have a windows machine here, otherwise I'd poke around myself.
I assume there is a reason why the documentation says:
Because it turns off automatic expansion of the path string, the "\?" prefix also allows the use of ".." and "." in the path names, which can be useful if you are attempting to perform operations on a file with these otherwise reserved relative path specifiers as part of the fully qualified path.
Consider
path.join(normalize("../foo"))
:
normalize()
would return../foo
or..\foo
on Windowspath.join()
would still have to process..
and remove the last directory from self before appendingfoo
.Then why not
normalize(path.join("../foo"))
?
Because I assumed that path.join("../foo"))
returns garbage, and you're using normalize to clean up the mess. I'm not so sure now, see below.
On Linux . and .. are special names which are handled in kernel space. You cannot create . or .. files, it does not depend on the filesystem, fuse or otherwise.
Maybe it's limited to dir entry enumeration, but I do recall getting surprising results when using FUSE
path walks are done for file creation too. It would be interesting to hear more about "surprising results".
use std::fs::File; use std::io::Write; fn main() { let mut file = File::create(r"\\?\..").expect("create dot file"); file.write_all(b"Hello, world!").expect("write dot file"); }
(I tried
.
,..
with relative and absolute paths, does not work).But did you try with different filesystems? E.g. FAT? I don't have a windows machine here, otherwise I'd poke around myself.
I assume there is a reason why the documentation says:
Because it turns off automatic expansion of the path string, the "?" prefix also allows the use of ".." and "." in the path names, which can be useful if you are attempting to perform operations on a file with these otherwise reserved relative path specifiers as part of the fully qualified path.
Now this is interesting - and I didn't expect this. With FAT and exFAT you can create files called .
and ..
. They're only accessible using UNC paths, but they show up with dir. Normally dir does not show .
or ..
.
If you then mount the file system on Linux, it cannot deal with the .
or ..
files. This is must be a bug in the linux exfat driver. It actually enumerates them as directories.
So now the question is, if verbatim paths permit ..
and .
on Windows and (ex)fat file systems, then what do we do? What does PathBuf::from("\\?\bar").join("../foo")
mean?
\\?\bar\../foo
\\?\foo
I can see the appeal of the fn normalize()
as you proposed.
Btw, I wrote a thing about NT and Win32 paths, in case it helps.
The short version is NT paths can include anything except \
(including NUL
). What I didn't mention there is that the NT kernel does only the minimum to resolve the relevant device, the rest of the path it simply passes to that device to resolve. The device can do anything with that sub-path.
So these NT paths are essentially impossible to handle sensibly with Rust's Path
/PathBuf
without special knowledge of the device being used. Maybe the device uses $
as the path separator? In that case push
is already broken. Maybe .
is a filename? In that case components
is broken. Etc, etc.
I think Path
methods should assume a "normal" filesystem path as the default and work as expected without the programmer needing special knowledge of the arcane workings of the NT kernel. Special methods could be added to address the special cases but understanding them should not be necessary for writing cross-platform software.
Btw, I wrote a thing about NT and Win32 paths, in case it helps.
The short version is NT paths can include anything except
\
(includingNUL
). What I didn't mention there is that the NT kernel does only the minimum to resolve the relevant device, the rest of the path it simply passes to that device to resolve. The device can do anything with that sub-path.
Very interesting.
So these NT paths are essentially impossible to handle sensibly with Rust's
Path
/PathBuf
without special knowledge of the device being used. Maybe the device uses$
as the path separator? In that casepush
is already broken. Maybe.
is a filename? In that casecomponents
is broken. Etc, etc.I think
Path
methods should assume a "normal" filesystem path as the default and work as expected without the programmer needing special knowledge of the arcane workings of the NT kernel. Special methods could be added to address the special cases but understanding them should not be necessary for writing cross-platform software.
I concur completely. I would much rather see path join handling of .
and ..
do what you expect, and do what the rest of the path handling does. The fact that some windows filesystem drivers can do files called .
and ..
is an oddity which rust Path
/PathBuf
can do without.
If anyone wants to create a path like \\?\D:\foo\..
, then that is still possible. I don't think it would surprise anyone that a Path
/PathBuf
join thinks this is a walk to the parent of foo
.
changed the title Add variant of std::path::Path join() method which removes .. and .
path.join() should work as expected on windows verbatim paths
changed the title path.join() should work as expected on windows verbatim paths
path.push() should work as expected on windows verbatim paths
This comment has been hidden.
r? rust-lang/libs
@m-ou-se I've made some last changes after a lot more testing. Ready for review
Thanks!
@bors r+
Commit fa4072f has been approved by m-ou-se
I am aware of these issues but not knowledgeable. Thank you for the good discussion, both here and on zulip! I think I am convinced that this is a positive change.
But, I am very surprised about the process here! Given this is a subtle change to windows behavior I would have expected knowledgeable parties to be cc'ed, like @ehuss or @retep998 maybe @kennykerr or @rylev.
Given that this is a (invisible to crater) breaking change to a stable API I would expect some team to need to do a FCP, maybe @rust-lang/libs-api. And for there to be a discussion of how we are going to let people know about the breaking change.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK