5

When using `process::Command` on Windows, environment variable names must be cas...

 2 years ago
source link: https://github.com/rust-lang/rust/pull/85270
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Copy link

Contributor

ChrisDenton commented on May 13

When using Command to set the environment variables, the key should be compared as uppercase Unicode but when set it should preserve the original case.

Fixes #85242

Copy link

Collaborator

rust-highfive commented on May 13

r? @Mark-Simulacrum

(rust-highfive has picked a reviewer for you, use r? to override)

Copy link

Contributor

Author

ChrisDenton commented on May 13

I've moved the test to the right place. Sorry about that.

ChrisDenton

changed the title Windows enviroment variables on process::Comamnd should be case-preserving but comparing must be case-sensitive

Windows enviroment variables on process::Comamnd should be case-preserving but comparing must be case-insensitive

on May 13

This comment has been hidden.

ChrisDenton

changed the title Windows enviroment variables on process::Comamnd should be case-preserving but comparing must be case-insensitive

When using process::Command on Windows, environment variable names must be case-preserving but case-insensitive

on May 14

fn iter_uppercase<T: Iterator<Item = u16>>(iter: T) -> impl Iterator<Item = u32> {

char::decode_utf16(iter).flat_map(|cp| match cp {

Err(e) => CodeUnits::Unit(Some(e.unpaired_surrogate())),

Ok(cp) => CodeUnits::Iter(cp.to_uppercase()),

CDirkx on May 14

edited

Contributor

I checked and to_uppercase is not always correct: 'ß'.to_uppercase() == "SS", but the following fails

set_var("ß", "something");
var("SS").unwrap() // panic: Err<NotPresent>

I can't find anything about in official documentation, but from tests I did I suspect Windows uses simple case folding (for context see Unicode Technical Report #21 Case Mappings Section 1.3, simple case folding is the unicode case folding algorithm, but restricted to code points that don't expand under upper/lower casing like ß). I also checked if any normalization or locale-dependent behavior was happening, but that doesn't appear to be the case: é and (e + acute accent combining character) are different keys and nothing changed around i/I on the Turkish locale (https://en.wikipedia.org/wiki/Dotted_and_dotless_I).

Not sure what the best way to use case folding in std would be (if that is even what Windows is indeed using). There are crates like unicode-rs/rust-caseless implementing case folding, but I couldn't find any that also implements the simple variant.

Edit: Windows does not use Unicode case folding, but instead has a different uppercase implementation: see the comment below.
Edit2: It does appear to be based on reverse unicode folding, but with some unique particularities, see https://github.com/ChrisDenton/Windows-case

ChrisDenton on May 14

Author

Contributor

Hm yeah, that is a problem. I think the only way to resolve this properly is to use the OS's own function for this. So in the next commit I've replaced my implementation with a Windows API call and expanded the testing slightly.

CDirkx on May 14

Contributor

I looked through the docs some more:

File paths and environment variables should be compared using StringComparison.OrdinalIgnoreCase [1]. Comparisons made using OrdinalIgnoreCase are behaviorally the composition of two calls: calling ToUpperInvariant on both string arguments, and doing a comparison using Ordinal [2]. A comparison using Ordinal is just a comparison based on the numerical value of each utf16 code unit [3].

So the original approach was conceptually correct, but Windows apparently handles converting to uppercase differently than rust (I haven't yet found anywhere how ToUpperInvariant is actually implemented slightly_frowning_face). I agree with the approach to instead just use the Windows API.

(As an aside, with the relevant Windows API added a future possibility for std would be to expose case insensitive functionality with a std::os extension trait, for example for correctly comparing paths case insensitively.)

[1] https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#choosing-a-stringcomparison-member-for-your-method-call
[2] https://docs.microsoft.com/en-us/dotnet/standard/base-types/best-practices-strings#stringtoupper-and-stringtolower
[3] https://docs.microsoft.com/en-us/dotnet/api/system.stringcomparison?view=net-5.0#System_StringComparison_Ordinal

m-ou-se 4 days ago

Member

Can you add this conclusion, why CompareStringOrdinal is used, in a comment in the code?

ChrisDenton 3 days ago

Author

Contributor

@m-ou-se I've added a comment (based on CDirkx's) to briefly explain how comparing environment keys works and thus why CompareStringOrdinal is used.

Copy link

Member

Mark-Simulacrum commented on May 14

@rustbot ping windows

It sounds like there's some differences between Rust's case-conversion and the one used by Windows - see this comment #85270 (comment) -- I'm wondering if some windows folks can weigh in on the right thing for us to do here for comparison.

I think it sounds like @ChrisDenton this PR is actually solving two different problems right now, right? We're currently exposing the case-converted (i.e., to_ascii_uppercase) environment variable names through std's APIs, which is loosely the decoupling intended by the first commit here. Then, the second question is whether to_ascii_uppercase is actually the right function to call (sounds like no), and what that function should be.

Copy link

Collaborator

rustbot commented on May 14

Hey Windows Group! This bug has been identified as a good "Windows candidate".
In case it's useful, here are some instructions for tackling these sorts of
bugs. Maybe take a look?
Thanks! <3

cc @arlosi @danielframpton @gdr-at-ms @kennykerr @luqmana @lzybkr @nico-abram @retep998 @rylev @sivadeilra @wesleywiser

Copy link

Contributor

Author

ChrisDenton commented on May 15

edited

To sum up where we are at now, there were two problems with the original code, before this PR:

  1. It ASCII uppercased environment variable keys. It should have left them as-is.
  2. The uppercasing was (possibly) done so that looking up environment variables by key would be case-insensitive. The problem with this is that Windows uses a variant of Unicode casing rules to do key comparisons, not just ASCII uppercasing.

My first attempt to fix this was to remove the original uppercasing and use Rust's Unicode functions to compare (and only to compare) the strings. As CDirkx has shown, this was wrong because Windows uses what it calls "ordinal" comparisons in this context. This ordinal comparison is documented as testing for "binary equality, not linguistic equality" though I'm a bit fuzzy on what that means for casing.

So now I'm using the Windows API function CompareStringOrdinal to do the comparison, which appears to do what the OS expects.

Copy link

Contributor

Author

ChrisDenton commented on May 16

Ok so I investigated this further and I'm now even more confident that using the system APIs is the right one. Elsewhere I have figured out the case-folding algorithm for my system, but it's odd and can vary a bit depending on the OS version. Therefore asking the OS to compare strings is, I think, the best approach.

@CDirkx should I squish my commits to make my changes clearer?

Copy link

Contributor

CDirkx commented on May 19

I was going to comment that this behavior should be documented somewhere on the process::Command docs, but it already is:

Note that environment variable names are case-insensitive (but case-preserving) on Windows, and case-sensitive on all other platforms.

(https://doc.rust-lang.org/std/process/struct.Command.html#method.env)

Copy link

Contributor

CDirkx commented on May 19

PR looks good to me, @ChrisDenton could you squash the commits?

Copy link

Contributor

CDirkx commented on May 19

edited

@Mark-Simulacrum this looks ready for review (although I'm not from the Windows group). This PR addresses two problems:

  • Env keys on Windows are case-insensitive, but case-preserving. The original implementation stored the make_ascii_uppercased key, making the derived Ord implementation effectively case-insensitive (although not completely correct, see the second point), but not preserving case. This PR stores the key as given, instead adding a manual case-insensitive Ord implementation, thus preserving case.
  • The way keys are compared is changed because just uppercasing and comparing is not sufficient, Windows has complex case-folding rules, based on an older version of Unicode case folding, but with additional custom mappings. Instead of re-implementing this functionality in std and possibly getting it wrong, the Windows API CompareStringOrdinal with bIgnoreCase = TRUE is used. This does have some costs however, OsString is a Vec<u8> and Windows API calls expect *const u16. To avoid having to convert (which requires an allocation) every time the BTreeMap might compare keys, both the original OsString and the converted UTF16-encoded Vec<u16> are stored in the key, resulting in a bit more memory use. I don't expect this to be an issue however.

So in conclusion, this PR makes sure that the code actually matches what is already in the documentation, implementing case-preserving and correct case-insensitivity:

Note that environment variable names are case-insensitive (but case-preserving) on Windows, and case-sensitive on all other platforms.

(https://doc.rust-lang.org/std/process/struct.Command.html#method.env)

r=me on the implementation. This is technically going to break any user expecting the to_uppercase behavior on Windows, right? e.g., previously env("Cargo_incremental", "1") would set CARGO_INCREMENTAL, but now won't. Seems like an unlikely situation but we'll likely want to FCP and relnotes if we land this.

Copy link

Contributor

CDirkx commented 25 days ago

edited

I don't think there would really be such an expectation. The docs already stated that on Windows env vars are case-preserving and in general that is what Windows users would expect. I would consider the current implementation a bug, as the to_uppercase doesn't preserve case and thus does not match the documentation.

There is however indeed the small change of breaking changes; the OS will handle "CARGO_INCREMENTAL" and "Cargo_incremental" the same but applications might not.

I agree that the current implementation is a bug. That doesn't necessarily mean we can 'just' break things for users regardless, though. @rust-lang/libs, would someone care to weigh in here? I don't know if an FCP is in order, but as this is technically a breaking change it seems potentially appropriate.

Copy link

Member

yaahc commented 16 days ago

edited

I agree that the current implementation is a bug. That doesn't necessarily mean we can 'just' break things for users regardless, though. @rust-lang/libs, would someone care to weigh in here?

I'm having trouble imagining realistic scenarios where someone's code would incorrectly depend on this behavior right now. If I understand @CDirkx's last comment correctly then the risk is that if someone spawns a process and uses env, the subprocess will receive a capitalized version of the env and may then attempt to read that env using a different case and fail. But since std::env::var and equivalent functions in other langs are case insensitive on Windows this should succeed regardless of the case of the lookup. The only problem then is if the subprocess gets the string itself by looking up the keys in the environment then for some reason compares that to the expected string using case sensitive comparison. This seems pretty unlikely to me.

Since we already document the behavior this was supposed to have I definitely lean towards wanting to fix this despite the breakage. Either way though we should be exceedingly careful and investigate the breakage as much as we can before merging this.

I don't know if an FCP is in order, but as this is technically a breaking change it seems potentially appropriate.

I think we should FCP this, and we should probably also formalize our policy on this sort of runtime behavior breakage since this came up recently in the error handling project group. IMO we should treat this as a more serious form of breakage than ones detectable at compile time due to the silent nature of the breakage, so FCP is super warranted.

Copy link

Contributor

CDirkx commented 16 days ago

Yes I believe the only way this could lead to problems is if the user is getting the environment variable in the child process via a way that doesn't take the case-insensitivity into account (which can happen in other languages, like with getenv, or in Rust by directly comparing against values returned by std::env::vars) AND the user either didn't take this into account or was explicitly relying on the (undocumented) uppercasing, so setting "var" while the child process expects "VAR".

How likely is that? I don't really know. I can kind of see it happening that someone sets "Path" (reasonably common spelling on Windows) and calls a C program that only checks exactly "PATH". That works with the current code, but would break with this PR. Not sure how we would test this.

Copy link

Contributor

Author

ChrisDenton commented 16 days ago

edited

How likely is that? I don't really know. I can kind of see it happening that someone sets "Path" (reasonably common spelling on Windows) and calls a C program that only checks exactly "PATH". That works with the current code, but would break with this PR. Not sure how we would test this.

The C program would have to be manually parsing the environment string instead of using GetEnvironmentVariable (as well as doing a case sensitive comparison). This would already be broken for keys that canonically have lower case ASCII characters because these are currently impossible to set using std. For example, windir, SystemDrive, ProgramFiles, etc. But I'd agree there is potential (however unlikely) for currently working code to break after this PR. EDIT: In summary, the C program would have to have been designed to only be called by a process that ASCII uppercases environment variables. This why I think it's unlikely, albeit not impossible.

I would also reiterate that the current documentation is written as if this PR were merged. Perhaps at some point Rust did work correctly?

Copy link

Member

yaahc commented 16 days ago

Yea, I'm not really sure how we'd investigate this either. If they have test cases that test that behavior a crater run might catch it, but that seems unlikely. The only other option I can think of is to lookup all crates that use .env on a std::process::Command and audit them to look for likely suspects, but this seems unfairly labor intensive to ask for.

Copy link

Contributor

Author

ChrisDenton commented 16 days ago

To sum up, the current behaviour is flawed because:

  1. It does not match the OS behaviour.
  2. It does not match the documented behaviour.
  3. It's inconsistent. For ASCII it unconditionally upper cases characters a-z. For non-ASCII it preserves the case but does a bit-for-bit compare. Both behaviours are wrong but in different ways.

Realistically, for this PR to break current code this must happen:

  1. The environment variable is canonically ASCII upper case.
  2. The Rust program uses Command and sets the environment variable using one or more ASCII lower case characters.
  3. A child process both manually parses the environment block and does case-sensitive key matching.

Note that the child process would already be broken for any parent process that does not ensure the relevant environment variables are ASCII upper cased. For example, if you run a program using cargo run then all environment variables will be ASCII upper cased but if you run the program directly then the environment variables will preserve their natural case. Either way std::env::var will work the same as it uses the OS function GetEnvironmentVariableW.

In short, I'm not entirely sure how you'd look for likely suspects. I guess if they set a Command var with lower case ASCII? But that's only a problem if the child process absolutely requires ASCII upper case.

Copy link

Member

yaahc commented 16 days ago

edited

To sum up, the current behaviour is flawed because:

Thank you for this summary. In this context I'm much less worried about the potential breakage here.

Based on the second half of my comment here I'm gonna go ahead and start an FCP for this.

@rfcbot merge

Copy link

rfcbot commented 16 days ago

edited by m-ou-se

Team member @yaahc has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

Copy link

rfcbot commented 4 days ago

bellThis is now entering its final comment period, as per the review above. bell


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK