When using `process::Command` on Windows, environment variable names must be cas...
source link: https://github.com/rust-lang/rust/pull/85270
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
When using Command
to set the environment variables, the key should be compared as uppercase Unicode but when set it should preserve the original case.
Fixes #85242
(rust-highfive has picked a reviewer for you, use r? to override)
I've moved the test to the right place. Sorry about that.
changed the title
Windows enviroment variables on process::Comamnd
should be case-preserving but comparing must be case-sensitive
Windows enviroment variables on process::Comamnd
should be case-preserving but comparing must be case-insensitive
This comment has been hidden.
changed the title
Windows enviroment variables on process::Comamnd
should be case-preserving but comparing must be case-insensitive
When using process::Command
on Windows, environment variable names must be case-preserving but case-insensitive
fn iter_uppercase<T: Iterator<Item = u16>>(iter: T) -> impl Iterator<Item = u32> {
char::decode_utf16(iter).flat_map(|cp| match cp {
Err(e) => CodeUnits::Unit(Some(e.unpaired_surrogate())),
Ok(cp) => CodeUnits::Iter(cp.to_uppercase()),
CDirkx on May 14 •
Contributor
I checked and to_uppercase
is not always correct: 'ß'.to_uppercase() == "SS"
, but the following fails
set_var("ß", "something"); var("SS").unwrap() // panic: Err<NotPresent>
I can't find anything about in official documentation, but from tests I did I suspect Windows uses simple case folding (for context see Unicode Technical Report #21 Case Mappings Section 1.3, simple case folding is the unicode case folding algorithm, but restricted to code points that don't expand under upper/lower casing like ß
). I also checked if any normalization or locale-dependent behavior was happening, but that doesn't appear to be the case: é
and é
(e + acute accent combining character) are different keys and nothing changed around i
/I
on the Turkish locale (https://en.wikipedia.org/wiki/Dotted_and_dotless_I).
Not sure what the best way to use case folding in std would be (if that is even what Windows is indeed using). There are crates like unicode-rs/rust-caseless implementing case folding, but I couldn't find any that also implements the simple variant.
Edit: Windows does not use Unicode case folding, but instead has a different uppercase implementation: see the comment below.
Edit2: It does appear to be based on reverse unicode folding, but with some unique particularities, see https://github.com/ChrisDenton/Windows-case
ChrisDenton on May 14
Author
Contributor
Hm yeah, that is a problem. I think the only way to resolve this properly is to use the OS's own function for this. So in the next commit I've replaced my implementation with a Windows API call and expanded the testing slightly.
CDirkx on May 14
Contributor
I looked through the docs some more:
File paths and environment variables should be compared using StringComparison.OrdinalIgnoreCase
[1]. Comparisons made using OrdinalIgnoreCase
are behaviorally the composition of two calls: calling ToUpperInvariant
on both string arguments, and doing a comparison using Ordinal
[2]. A comparison using Ordinal
is just a comparison based on the numerical value of each utf16 code unit [3].
So the original approach was conceptually correct, but Windows apparently handles converting to uppercase differently than rust (I haven't yet found anywhere how ToUpperInvariant
is actually implemented ). I agree with the approach to instead just use the Windows API.
(As an aside, with the relevant Windows API added a future possibility for std
would be to expose case insensitive functionality with a std::os
extension trait, for example for correctly comparing paths case insensitively.)
[1] https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#choosing-a-stringcomparison-member-for-your-method-call
[2] https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#stringtoupper-and-stringtolower
[3] https://docs.microsoft.com/en-us/dotnet/api/system.stringcomparison?view=net-5.0#System_StringComparison_Ordinal
m-ou-se 4 days ago
Member
Can you add this conclusion, why CompareStringOrdinal is used, in a comment in the code?
ChrisDenton 3 days ago
Author
Contributor
@m-ou-se I've added a comment (based on CDirkx's) to briefly explain how comparing environment keys works and thus why CompareStringOrdinal
is used.
@rustbot ping windows
It sounds like there's some differences between Rust's case-conversion and the one used by Windows - see this comment #85270 (comment) -- I'm wondering if some windows folks can weigh in on the right thing for us to do here for comparison.
I think it sounds like @ChrisDenton this PR is actually solving two different problems right now, right? We're currently exposing the case-converted (i.e., to_ascii_uppercase) environment variable names through std's APIs, which is loosely the decoupling intended by the first commit here. Then, the second question is whether to_ascii_uppercase is actually the right function to call (sounds like no), and what that function should be.
Hey Windows Group! This bug has been identified as a good "Windows candidate".
In case it's useful, here are some instructions for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3
cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @nico-abram @retep998 @rylev @sivadeilra @wesleywiser
To sum up where we are at now, there were two problems with the original code, before this PR:
- It ASCII uppercased environment variable keys. It should have left them as-is.
- The uppercasing was (possibly) done so that looking up environment variables by key would be case-insensitive. The problem with this is that Windows uses a variant of Unicode casing rules to do key comparisons, not just ASCII uppercasing.
My first attempt to fix this was to remove the original uppercasing and use Rust's Unicode functions to compare (and only to compare) the strings. As CDirkx has shown, this was wrong because Windows uses what it calls "ordinal" comparisons in this context. This ordinal comparison is documented as testing for "binary equality, not linguistic equality" though I'm a bit fuzzy on what that means for casing.
So now I'm using the Windows API function CompareStringOrdinal
to do the comparison, which appears to do what the OS expects.
Ok so I investigated this further and I'm now even more confident that using the system APIs is the right one. Elsewhere I have figured out the case-folding algorithm for my system, but it's odd and can vary a bit depending on the OS version. Therefore asking the OS to compare strings is, I think, the best approach.
@CDirkx should I squish my commits to make my changes clearer?
I was going to comment that this behavior should be documented somewhere on the process::Command
docs, but it already is:
Note that environment variable names are case-insensitive (but case-preserving) on Windows, and case-sensitive on all other platforms.
(https://doc.rust-lang.org/std/process/struct.Command.html#method.env)
PR looks good to me, @ChrisDenton could you squash the commits?
@Mark-Simulacrum this looks ready for review (although I'm not from the Windows group). This PR addresses two problems:
- Env keys on Windows are case-insensitive, but case-preserving. The original implementation stored the
make_ascii_uppercase
d key, making the derivedOrd
implementation effectively case-insensitive (although not completely correct, see the second point), but not preserving case. This PR stores the key as given, instead adding a manual case-insensitiveOrd
implementation, thus preserving case. - The way keys are compared is changed because just uppercasing and comparing is not sufficient, Windows has complex case-folding rules, based on an older version of Unicode case folding, but with additional custom mappings. Instead of re-implementing this functionality in
std
and possibly getting it wrong, the Windows APICompareStringOrdinal
withbIgnoreCase = TRUE
is used. This does have some costs however,OsString
is aVec<u8>
and Windows API calls expect*const u16
. To avoid having to convert (which requires an allocation) every time theBTreeMap
might compare keys, both the originalOsString
and the converted UTF16-encodedVec<u16>
are stored in the key, resulting in a bit more memory use. I don't expect this to be an issue however.
So in conclusion, this PR makes sure that the code actually matches what is already in the documentation, implementing case-preserving and correct case-insensitivity:
Note that environment variable names are case-insensitive (but case-preserving) on Windows, and case-sensitive on all other platforms.
(https://doc.rust-lang.org/std/process/struct.Command.html#method.env)
r=me on the implementation. This is technically going to break any user expecting the to_uppercase behavior on Windows, right? e.g., previously env("Cargo_incremental", "1") would set CARGO_INCREMENTAL, but now won't. Seems like an unlikely situation but we'll likely want to FCP and relnotes if we land this.
I don't think there would really be such an expectation. The docs already stated that on Windows env vars are case-preserving and in general that is what Windows users would expect. I would consider the current implementation a bug, as the to_uppercase doesn't preserve case and thus does not match the documentation.
There is however indeed the small change of breaking changes; the OS will handle "CARGO_INCREMENTAL" and "Cargo_incremental" the same but applications might not.
I agree that the current implementation is a bug. That doesn't necessarily mean we can 'just' break things for users regardless, though. @rust-lang/libs, would someone care to weigh in here? I don't know if an FCP is in order, but as this is technically a breaking change it seems potentially appropriate.
I agree that the current implementation is a bug. That doesn't necessarily mean we can 'just' break things for users regardless, though. @rust-lang/libs, would someone care to weigh in here?
I'm having trouble imagining realistic scenarios where someone's code would incorrectly depend on this behavior right now. If I understand @CDirkx's last comment correctly then the risk is that if someone spawns a process and uses env
, the subprocess will receive a capitalized version of the env and may then attempt to read that env using a different case and fail. But since std::env::var
and equivalent functions in other langs are case insensitive on Windows this should succeed regardless of the case of the lookup. The only problem then is if the subprocess gets the string itself by looking up the keys in the environment then for some reason compares that to the expected string using case sensitive comparison. This seems pretty unlikely to me.
Since we already document the behavior this was supposed to have I definitely lean towards wanting to fix this despite the breakage. Either way though we should be exceedingly careful and investigate the breakage as much as we can before merging this.
I don't know if an FCP is in order, but as this is technically a breaking change it seems potentially appropriate.
I think we should FCP this, and we should probably also formalize our policy on this sort of runtime behavior breakage since this came up recently in the error handling project group. IMO we should treat this as a more serious form of breakage than ones detectable at compile time due to the silent nature of the breakage, so FCP is super warranted.
Yes I believe the only way this could lead to problems is if the user is getting the environment variable in the child process via a way that doesn't take the case-insensitivity into account (which can happen in other languages, like with getenv, or in Rust by directly comparing against values returned by std::env::vars
) AND the user either didn't take this into account or was explicitly relying on the (undocumented) uppercasing, so setting "var" while the child process expects "VAR".
How likely is that? I don't really know. I can kind of see it happening that someone sets "Path" (reasonably common spelling on Windows) and calls a C program that only checks exactly "PATH". That works with the current code, but would break with this PR. Not sure how we would test this.
How likely is that? I don't really know. I can kind of see it happening that someone sets "Path" (reasonably common spelling on Windows) and calls a C program that only checks exactly "PATH". That works with the current code, but would break with this PR. Not sure how we would test this.
The C program would have to be manually parsing the environment string instead of using GetEnvironmentVariable
(as well as doing a case sensitive comparison). This would already be broken for keys that canonically have lower case ASCII characters because these are currently impossible to set using std
. For example, windir
, SystemDrive
, ProgramFiles
, etc. But I'd agree there is potential (however unlikely) for currently working code to break after this PR. EDIT: In summary, the C program would have to have been designed to only be called by a process that ASCII uppercases environment variables. This why I think it's unlikely, albeit not impossible.
I would also reiterate that the current documentation is written as if this PR were merged. Perhaps at some point Rust did work correctly?
Yea, I'm not really sure how we'd investigate this either. If they have test cases that test that behavior a crater run might catch it, but that seems unlikely. The only other option I can think of is to lookup all crates that use .env
on a std::process::Command
and audit them to look for likely suspects, but this seems unfairly labor intensive to ask for.
To sum up, the current behaviour is flawed because:
- It does not match the OS behaviour.
- It does not match the documented behaviour.
- It's inconsistent. For ASCII it unconditionally upper cases characters a-z. For non-ASCII it preserves the case but does a bit-for-bit compare. Both behaviours are wrong but in different ways.
Realistically, for this PR to break current code this must happen:
- The environment variable is canonically ASCII upper case.
- The Rust program uses
Command
and sets the environment variable using one or more ASCII lower case characters. - A child process both manually parses the environment block and does case-sensitive key matching.
Note that the child process would already be broken for any parent process that does not ensure the relevant environment variables are ASCII upper cased. For example, if you run a program using cargo run
then all environment variables will be ASCII upper cased but if you run the program directly then the environment variables will preserve their natural case. Either way std::env::var
will work the same as it uses the OS function GetEnvironmentVariableW
.
In short, I'm not entirely sure how you'd look for likely suspects. I guess if they set a Command
var with lower case ASCII? But that's only a problem if the child process absolutely requires ASCII upper case.
Team member @yaahc has proposed to merge this. The next step is review by the rest of the tagged team members:
No concerns currently listed.
Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!
See this document for info about what commands tagged team members can give me.
Copy link
rfcbot commented 4 days ago
This is now entering its final comment period, as per the review above.
Recommend
About Joyk
Aggregate valuable and interesting links.
Joyk means Joy of geeK