2

Github rustdoc: Add unstable option to only emit shared/crate-specific files by...

 3 years ago
source link: https://github.com/rust-lang/rust/pull/83478
Go to the source link to view the article. You can view the picture content, updated content and better typesetting reading experience. If the link is broken, please click the button below to view the snapshot at that time.

Copy link

Member

jyn514 commented 16 days ago

edited

The intended use case is for docs.rs, which can now copy exactly the
files it cares about, rather than having to guess based on whether they
have a resource suffix or not. In particular, some files have a resource
suffix but cannot be shared between crates: rust-lang/docs.rs#1312 (comment)

The end goal is to fix rust-lang/docs.rs#1327 by reverting rust-lang/docs.rs#1324.

This obsoletes --print=unversioned-files, which I plan to remove as
soon as docs.rs stops using it.

I recommend reviewing this one commit at a time.

r? @GuillaumeGomez cc @Nemo157 @pietroalbini

This comment has been hidden.

I admit I'm a bit worried that some of the files labeled as UNVERSIONED here are such -- what do we lose by not versioning them with the toolchain? It seems like we're shipping them alongside Rust, so they're pretty clearly toolchain-specific and can change over time. If docs.rs or similar services want to deduplicate these files for storage purposes, and we want to make that easier, then emitting the files with a hash in their name (i.e., one dependent on the file contents) seems like the right path: then the filename uniquely identifies the file, and as such that file is unversioned in the sense that no matter what toolchain emits it will have the same contents.

Maybe I'm missing some history here or context, though; I'm a bit worried that we're adding (I guess unstable, but still) surface area to support use cases without documenting the design space a bit more -- maybe that's happened somewhere though? A link would be great!


With regards to the testing strategy for this, you asked on Zulip if you can check that a test generates a given file, but that doesn't fit the expectation I'd have? In particular, I'm thinking that what is desired is to make sure the output from --print precisely describes all files emitted, not that some particular ones are included; that would prevent the failure mode of forgetting a file in print, but also the failure mode of emitting a filename that was not actually written to disk. I'd probably use a run-make test for this, still, but it feels like the right thing to me.

Copy link

Member

Author

jyn514 commented 16 days ago

I admit I'm a bit worried that some of the files labeled as UNVERSIONED here are such -- what do we lose by not versioning them with the toolchain? It seems like we're shipping them alongside Rust, so they're pretty clearly toolchain-specific and can change over time.

I would rather not deal with this here - it's not causing bugs for docs.rs currently, and it's a very long-standing bug. This PR doesn't change those filenames at all, it just lets docs.rs list them separately from crate-specific files.

If docs.rs or similar services want to deduplicate these files for storage purposes, and we want to make that easier, then emitting the files with a hash in their name (i.e., one dependent on the file contents) seems like the right path: then the filename uniquely identifies the file, and as such that file is unversioned in the sense that no matter what toolchain emits it will have the same contents.

Honestly I would be ok with this, but it doesn't make docs.rs' life any better, it still has to deal with the complexity from previous builds. We discussed this briefly on discord but I don't think we came to any conclusions.

Maybe I'm missing some history here or context, though; I'm a bit worried that we're adding (I guess unstable, but still) surface area to support use cases without documenting the design space a bit more -- maybe that's happened somewhere though? A link would be great!

It was mostly discussed on Discord, I'll copy/paste the summary here.

I do want to say that I consider this "super unstable" and would probably close bugs about it if they didn't come directly from the docs.rs team. It's not intended to ever be stabilized.


  1. rustdoc added a new crates.js file a few weeks ago, breaking search for all newly built crates rust-lang/docs.rs#1300
  2. dtolnay added crates.js and also some other essential files that were missing rust-lang/docs.rs#1301
  3. it turns out the other essential files had a typo, breaking all builds rust-lang/docs.rs#1305. the typo was fixed by rust-lang/docs.rs#1308
  4. rustdoc wanted to start adding more essential files (#82732, #80705) and I got fed up of dealing with it so I opened rust-lang/docs.rs#1312
  5. #1312 had a bug (rust-lang/docs.rs#1312 (comment)) if we copy more files than necessary. I opened rust-lang/docs.rs#1324 giving precedence to local shared files over global ones which avoids the bug.
  6. #1324 broke docs that weren't using --extern-html-root-url (rust-lang/docs.rs#1327 (comment)).

Some possible solutions are to

  • Have docs.rs hard-code files we know can't be shared, like the search-index. That's an alternative approach to 5 which reverts #1324 and instead only copies the files that are necessary.
  • Have docs.rs serve shared files from subdirectories if the crate was built with one of the broken nightlies. That builds on #1324 and only special cases the affected nightlies.
  • Rebuild all affected crates (over 13000 releases, probably not practical).
  • Instead of using --print=unversioned-files, add a new --print=static-root-files that prints all static root filenames - basically this PR, but using --print instead of --emit and still unconditionally generating all files.
  • Add --emit=static-root-files (this PR).

In particular, I'm thinking that what is desired is to make sure the output from --print precisely describes all files emitted, not that some particular ones are included; that would prevent the failure mode of forgetting a file in print, but also the failure mode of emitting a filename that was not actually written to disk. I'd probably use a run-make test for this, still, but it feels like the right thing to me.

This PR doesn't change --print at all. The idea is that docs.rs can pass --emit=unversioned-shared-resources,toolchain-shared-resources and then copy the whole doc directory unconditionally.

Copy link

Member

Author

jyn514 commented 16 days ago

edited

A summary of the summary: basically the issue is that we can't copy fewer shared files than rustdoc generates, and due to 2 months of broken builds about 3 years ago, we also can't copy more. So we have to know exactly which files should be shared and which shouldn't.

Copy link

Member

GuillaumeGomez left a comment

Code looks good to me. So once the debate with @Mark-Simulacrum is done, you have my approval. ;)

Ok, I have more questions, but ultimately don't feel the need to try to insert myself into all that :)

Is src/test/run-make/emit-shared-files/unversioned-files.txt used somewhere? A simple ctrl+f for the filename didn't seem to turn anything up, so I'm wondering if it's just a stray file or I missed something while looking at the test.

It also feels like the test is sort of likely to miss things in the sense that it's not exhaustively checking the files that were emitted; it appears to only verify that specific files either exist or don't. Is that intentional?

Copy link

Member

Author

jyn514 commented 16 days ago

Is src/test/run-make/emit-shared-files/unversioned-files.txt used somewhere? A simple ctrl+f for the filename didn't seem to turn anything up, so I'm wondering if it's just a stray file or I missed something while looking at the test.

Oops, thanks - I copied this from src/test/run-make-fulldeps/print-unversioned-files and forgot to delete the files I didn't need. Should be fixed now.

It also feels like the test is sort of likely to miss things in the sense that it's not exhaustively checking the files that were emitted; it appears to only verify that specific files either exist or don't. Is that intentional?

Yes, for two reasons:

  1. I don't think it's useful to exhaustively test every file, docs.rs shouldn't be depending on the exact files anyway.
  2. This isn't meant to test those specific files, just that those categories of files are tested. In other words, if rustdoc changes crates.js in the future to only depend on the toolchain, not the crate, that would be a reasonable change and there's no need for the test to block it. Would it be helpful to add some comments with the category the file comes from?

Copy link

Member

Author

jyn514 commented 15 days ago

@Mark-Simulacrum did you want to take another look at this before it merges? You're not assigned to the PR, I just wanted to make sure it's not lost in notifications.

I would, let me assign myself.

I am OK with this proceeding; r=GuillaumeGomez, as you would like, I suppose.

Ultimately I think the tests / implementation here doesn't match what I would expect (or I don't understand it), but I'm also not really a rustdoc maintainer, and so I won't block this further I suppose.

This isn't meant to test those specific files, just that those categories of files are tested. In other words, if rustdoc changes crates.js in the future to only depend on the toolchain, not the crate, that would be a reasonable change and there's no need for the test to block it. Would it be helpful to add some comments with the category the file comes from?

I think my confusion here is that I think the added test isn't checking the "real problem" which is when a file is in the wrong category. For example, the comment in the code about docs.rs not touching default theme settings (I think) is not reflected at all in the test (e.g., as a currently failing test with a fixme comment); that worries me, as it suggests we would easily pick up more such bugs (or they may exist today). It seems like ideally there would be some way to pretty statically detect which category a file is in, e.g., if there's anything other than a cp of it, it's likely crate-specific. If it's cp'd from a include_str! or similar with no modification possible, it's toolchain-specific.

Unversioned files make no sense to me so I can't really comment there. IMO, it's a bug to have them and I'm surprised that this PR can't fix that, as it suggests this framework is insufficiently general - I would expect that with this PR, rustdoc / docs.rs would no longer have to coordinate at all on whether files are versioned or not. Is that not correct?

Copy link

Member

Author

jyn514 commented 10 days ago

For example, the comment in the code about docs.rs not touching default theme settings (I think) is not reflected at all in the test (e.g., as a currently failing test with a fixme comment); that worries me, as it suggests we would easily pick up more such bugs (or they may exist today).

The problem is that custom themes don't fit into the toolchain/crate framework: they depend on the invocation. I don't know what test you'd want: the problem is that if you run

rustdoc --theme x.css lib.rs
mv y.css x.css
rustdoc --theme x.css lib.rs

then the original x.css gets overriden. But docs.rs doesn't do that (it doesn't use --theme at all). And I don't know if that's a bug as such, it only turns into a bug if you expected the two files to be the same. I would rather not make this feature more complicated than necessary since it's only needed for docs.rs.

It seems like ideally there would be some way to pretty statically detect which category a file is in, e.g., if there's anything other than a cp of it, it's likely crate-specific. If it's cp'd from a include_str! or similar with no modification possible, it's toolchain-specific.

I'm not sure what this would look like - do you want me to add a &'static bound to write_minify and write_toolchain, and a generate_content: fn() -> OsString argument to write_crate? That would work for everything except the custom theme, which I don't mind special-casing.

I would expect that with this PR, rustdoc / docs.rs would no longer have to coordinate at all on whether files are versioned or not. Is that not correct?

Oh hmm, this is a good point. I can't remove unversioned files until docs.rs stops using --print=unversioned-files, but I can remove them at the same time I remove --print :)

I don't know whether the 'static bound makes sense, but something like that feels closer to the 'truth' in the sense that the goal I'd personally have is that you can't get it wrong.

The theme CLI arg being 'special' feels a bit off -- surely there are other similar arguments? e.g., the URL root (forget name), which affects other stuff too.

I think my expectation is that the crate-versioned files are actually (crate, toolchain, invocation)-versioned, right? That seems entirely reasonable to me, but isn't really reflected in docs today; with that change args fall into it quite reasonably I think, including theme.

Copy link

Member

Author

jyn514 commented 10 days ago

The theme CLI arg being 'special' feels a bit off -- surely there are other similar arguments? e.g., the URL root (forget name), which affects other stuff too.

Well right but the point I'm trying to make is that it doesn't matter for docs.rs. Docs.rs doesn't need the files to be stable across invocations, it just needs to know when the cache should be invalidated.

I guess it is a good point that someone could add --theme x in their RUSTDOCFLAGS in package.metadata.docs.rs, which makes this "crate-specific" in the sense that docs.rs could pass rustdoc different arguments depending on the crate. But it's not something rustdoc knows about.

I'm trying to understand your comment and I think I'm missing something.

If docs.rs only cared about cache invalidation, then you'd not need 3 categories, you could use the hash of the file to know if it needs invalidation; we are intending to use the categories (AIUI) to avoid such a scheme by assuming stability for some class of inputs (e.g., toolchain files are assumed stable across all crates). If that condition is violated, I assume that docs.rs will see bugs (e.g., broken rendering due to mismatched files).

That to me implies that it is in fact critical that we get these categories right, including in the presence of RUSTDOCFLAGS etc, as set by the user via metadata.

So theme should be a crate-specific file, because it depends on the invocation, right? Same as the docs for that crate are, realistically.

As I see it, the categories are defined as follows:

  • unversioned: these files we "promise" never change, even across toolchain upgrades (will be removed soon)
  • toolchain: these files are independent of invocation (including the crate)
  • crate-specific: these files are dependent on invocation, including crate, and obviously the toolchain

If this is inaccurate or does not match your expectations, it seems good to address that. These categories also imply stability for unchanged right hand sides as dictated per category; if that is not met, the file is in the wrong category IMO.

toolchain-only:

$(RUSTDOC) -Z unstable-options --emit=toolchain-shared-resources --output $(TOOLCHAIN_ONLY) --resource-suffix=-xxx x.rs

[ -e $(TOOLCHAIN_ONLY)/storage-xxx.js ]

jyn514 10 days ago

Author

Member

Docs.rs always uses a resource suffix that only depends on the toolchain, not anything else. Using --resource-suffix with rustflags in package.metadata.docs.rs will massively break things anyway, it's not supported.

let mut themes: Vec<&String> = themes.iter().collect();

themes.sort();

write_minify(

&cx.shared.fs,

cx.path("main.js"),

"main.js",

&static_files::MAIN_JS.replace(

"/* INSERT THEMES HERE */",

Mark-Simulacrum 10 days ago

Member

This also implies this is not toolchain, but rather invocation, specific.

jyn514 10 days ago

Author

Member

Ugh, I guess so. Do you mind if I leave it for later? Docs.rs was caching it before anyway: https://github.com/rust-lang/docs.rs/pull/1312/files#diff-b06b44e6583254ce69af8cedabc04a73a97599d593676e1bbad7ee442240ccf1L35 and it should only matter if someone passes a custom --theme, it seems a shame to copy it every time when it so rarely matters.

This comment has been minimized.

This comment has been hidden.

let mut themes: Vec<&String> = themes.iter().collect();

themes.sort();

write_minify(

&cx.shared.fs,

cx.path("main.js"),

"main.js",

&static_files::MAIN_JS.replace(

"/* INSERT THEMES HERE */",

jyn514 10 days ago

Author

Member

Ugh, I guess so. Do you mind if I leave it for later? Docs.rs was caching it before anyway: https://github.com/rust-lang/docs.rs/pull/1312/files#diff-b06b44e6583254ce69af8cedabc04a73a97599d593676e1bbad7ee442240ccf1L35 and it should only matter if someone passes a custom --theme, it seems a shame to copy it every time when it so rarely matters.

I don't mind leaving things for later on either of the problems, but I do think it'd be worth filing a tracking issue for this feature, and documenting the problems with it there as well as the reason we're adding it.

Copy link

Member

Author

jyn514 commented 10 days ago

I do think it'd be worth filing a tracking issue for this feature, and documenting the problems with it there as well as the reason we're adding it.

A tracking issue seems like a bad fit for something we never plan to stabilize. What do you think about putting it in the unstable book instead?

I think the important bit is that the known limitations are documented in a way that is readily available; imo today tracking issues solve that better. I think it's reasonable to track unstable features even if in there current incarnation they won't get stabilized, particularly when they are sort of intended for third party use (in this case rust project internal, i.e. docs.rs).

Copy link

Member

Author

jyn514 commented 8 days ago

I'm still a little unsure what the tracking issue is for, but the docs have been broken for about 10 days now and I would really like to land this, so I opened #83784.

@rustbot label: -S-waiting-on-author +S-waiting-on-review

Copy link

Contributor

bors commented 8 days ago

pushpin Commit 413938d has been approved by Mark-Simulacrum

bors

merged commit 31f5320 into rust-lang:master 8 days ago

10 checks passed

rustbot

added this to the 1.53.0 milestone

8 days ago

jyn514

deleted the

jyn514:fine-grained-files

branch

7 days ago


About Joyk


Aggregate valuable and interesting links.
Joyk means Joy of geeK