7

Update Windows Argument Parsing by ChrisDenton · Pull Request #87580 · rust-lang...

 3 years ago
source link: https://github.com/rust-lang/rust/pull/87580
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.
neoserver,ios ssh client

Copy link

Contributor

ChrisDenton commented on Jul 29

edited

Fixes #44650

The Windows command line is passed to applications as a single string which the application then parses to get a list of arguments. The standard rules (as used by C/C++) for parsing the command line have slightly changed over the years, most recently in 2008 which added new escaping rules.

This PR implements the new rules as described on MSDN and further detailed here. It has been tested against the behaviour of C++ by calling a C++ program that outputs its raw command line and the contents of argv. See my repo if anyone wants to reproduce my work.

For an overview of how this PR changes argument parsing behavior and why we feel it is warranted see #87580 (comment).

For some examples see: #87580 (comment)

Copy link

Collaborator

rust-highfive commented on Jul 29

r? @dtolnay

(rust-highfive has picked a reviewer for you, use r? to override)

///

/// This function was tested for equivalence to the C/C++ parsing rules using an

/// extensive test suite available at

/// <https://github.com/ChrisDenton/winarg/tree/std>.

yaahc 24 days ago

Member

would it be possible to integrate some or all of this into our testsuite?

ChrisDenton 24 days ago

Author

Contributor

Hm, more extensive testing is slow because of the overhead of starting a process but perhaps some more limited examples can be tested. However, I wasn't sure about compiling and running a C++ within the testsuite. Is that ok?

Copy link

Member

yaahc commented 24 days ago

PR looks great @ChrisDenton. I caught a couple of typos you might want to fix but other than that I'm happy to merge this when you're ready.

@bors delegate+

Copy link

Contributor

bors commented 24 days ago

v@ChrisDenton can now approve this pull request

Copy link

Member

yaahc commented 24 days ago

edited

One follow up, since this changes runtime behavior it would help to have a short overview we can add to the release notes written up on how this changes the runtime behavior and why we feel this breakage is justified.

Copy link

Contributor

Author

ChrisDenton commented 24 days ago

edited

Good point. I'm running local tests now and maybe I can figure out a summary in the meantime. I'm not very good a being concise but this is the gist of it:

Most of the parsing differences are minor and only really affect weird edge-cases. The big change is escaping " when already within a quote. This can be done by using two consecutive double quotes (i.e. "").

The main reason for updating the parsing rules is that it provides more consistent behaviour across modern (since 2008) utilities. This matters both for the user and for shells. E.g. powershell takes a string provided by the user, parses it according to powershell rules (expanding variables, etc), then produces a command line which it passes to the program. It expects the application to parse this command line in a standard way. So both powershell and the application should fully agree on how many arguments there are and what those arguments are.

However, the new double quote escaping rules are mostly for users. This is of particular concern when using the command prompt (aka cmd.exe) which passes the command line directly to the program. Or even when creating a shortcut via the gui which allows directly entering arguments. Microsoft added the new "" escaping rule to make this easier to follow, especially when nesting quotes and what with the fact that the other escape character (\) is also a path separator. This Stack Overflow answer has a few examples of where it can make code more legible. The problem with not applying these new rules is that the application may see different arguments than the user intended.

Copy link

Member

yaahc commented 24 days ago

That looks great, I've gone ahead and added a link to the overview comment in the PR description. Thank you again.

Copy link

Contributor

Author

ChrisDenton commented 24 days ago

I'd also add the small note that since Microsoft were willing to make the changes to their C/C++ argument parsing, they were presumably confident that it wouldn't be too disruptive. Of course that doesn't rule out the possibility of this change breaking something in the Rust ecosystem.

Copy link

Contributor

Author

ChrisDenton commented 24 days ago

@bors r+

Copy link

Contributor

bors commented 24 days ago

pushpin Commit fbe5691 has been approved by ChrisDenton

Copy link

Member

yaahc commented 24 days ago

I just realized I should have FCPed this since it's a change to the stable API even though it's not adding anything new. In this case it's probably okay since we already discussed this a bit and the change seems pretty subtle and straightforward but I'm gonna cc @rust-lang/libs-api just to make sure everyone has a chance to see it.

Copy link

Contributor

Author

ChrisDenton commented 24 days ago

Ah, right. And I think I should have used r= there? Sorry, I'm a bit fuzzy about Bors commands.

Copy link

Member

yaahc commented 24 days ago

edited

Ah, right. And I think I should have used r= there? Sorry, I'm a bit fuzzy about Bors commands.

Nonono, you didn't do anything wrong. I shouldn't have delegated bors before the FCP. This is all me, not you, and not a big deal either way.

Copy link

Contributor

bors commented 24 days ago

hourglass Testing commit fbe5691 with merge f20948d...

Copy link

Contributor

bors commented 24 days ago

boom Test timed out

This comment has been hidden.

Copy link

Member

BurntSushi commented 24 days ago

Are there some examples of what the breaking change here actually looks like?

One idea of a forcing function is to think about folks that maintain CLI applications. At some point, this change is going to get rolled into their release. An end user might then notice a change in how arguments are parsed and file a bug. At that point, if the maintainer isn't aware of a change like this, one might imagine that tracking down where the change stemmed from could be quite difficult. It might be helpful to characterize what the breaking changes look like, mitigations for them (if necessary) and include a mini-write-up for the release notes.

Copy link

Contributor

Author

ChrisDenton commented 24 days ago

edited

@BurntSushi Summary of the user-facing changes:

Command line old parsing new parsing app "Hello""World" 123 456 ["app", "Hello\"World 123 456"] ["app", "Hello\"World", "123", "456"] C:\"Program Files"\app.exe ["C:\\\"Program", "Files\\app.exe"] ["C:\\Program Files\\app.exe"]

The big difference is that two double quotes are taken as one literal quote instead of also ending the block. This makes it more usable for escaping quotes.

The more minor differences are with parsing the zeroth argument (aka the application name). ASCII control characters are no longer considered whitespace (aside from tab). A slightly bigger differences is that a closing quote no longer ends the application name. The first change should have no impact on users because file names can't contain control characters. The second is essentially a bug fix so that both the command prompt and the application will agree on what the application name is.

Note that:

  • The zeroth argument has different parsing rules to the other arguments. This has always been the case but they've been further tweaked here.
  • These rules don't apply to users of shells such as powershell because they do their own parsing prior to passing the command line to the application. However, it does apply to cmd.exe.

Copy link

Member

BurntSushi commented 23 days ago

@ChrisDenton Thanks! And yeah, that change looks quite sensible. Especially since it sounds like Windows itself has also done that change.

@yaahc Is it possible to still do an FCP for this? I don't see why it would be rejected, but this feels like a big enough visible change that it's probably worth going through the formal FCP process for this.

And I do think this should get put into the release notes somehow so that it gets visibility. My hope is that this will ultimately make things more sensible. For example, I know I've had a few issues filed against my CLIs in the past that basically amount to, "how do I quote things in a Windows shell," and I've been unable to give them a good answer. It sounds like a change like this would result in a bunch better user experience! We should advertise that. :-)

Copy link

Member

yaahc commented 23 days ago

@rfcbot merge

Copy link

rfcbot commented 23 days ago

edited by Amanieu

Team member @yaahc has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

It would be great to get a concise description for the release notes (just a comment on this PR is fine; linking it from the PR description would be great).

Copy link

Member

yaahc commented 23 days ago

It would be great to get a concise description for the release notes (just a comment on this PR is fine; linking it from the PR description would be great).

I asked and Chris already added an overview, which I think might be the same as what you mean, and it is linked in the PR description. Does that comment seem sufficient?

No, I don't think so; we usually want something closer to a 1-2 line description at most. This would likely go into the compatibility notes and/or library changes section, and look something like the language or library change descriptions in https://github.com/rust-lang/rust/blob/master/RELEASES.md#version-1530-2021-06-17 (e.g., adding unicode idents or into_iter).

That comment is likely useful if the release team selects this for inclusion in the blog post, though.

Something like this could work; presuming it's accurate:

Update Windows command line argument parsing to match the behavior of the standard libraries for C/C++. The rules have changed slightly over time, and this PR brings us to the latest set of rules (changed in 2008).

Copy link

Contributor

Author

ChrisDenton commented 22 days ago

Something like this could work; presuming it's accurate:

The 2008 date comes from here but I've not personally confirmed it (note it calls these tweaks "undocumented" but that changed in 2020). Whatever the date, I am confident this PR reflects the current C/C++ behavior,

Copy link

Member

BurntSushi commented 22 days ago

@rfcbot reviewed

Copy link

rfcbot commented 12 days ago

bellThis is now entering its final comment period, as per the review above. bell

Copy link

rfcbot commented 2 days ago

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.


Recommend

About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK