doc comment revisions: headings, lists, and links · Discussion #48305 · golang/g...

 2 years ago
source link: https://github.com/golang/go/discussions/48305

rsc 3 days ago

Maintainer

I am looking into the possibility of revising Go's doc comment syntax, specifically adjusting headings and adding lists and links. This discussion is meant to gather feedback before writing an official proposal.

The current Go doc comment format has served us well since its introduction in 2009. There has only been one significant change, which was the addition of headings in 2011. But there are also a few long-open issues and proposals about doc comments, including:

  • #7349 points out that the headings rule does not work well with non-Roman scripts.
  • #31739 points out that lines ending with double quotes cannot be headings.
  • #34377 points out that lines ending with parens cannot be headings.
  • #7873 asks for list support.
  • #45533 proposes linking of symbols, written as [io.EOF], to make it easier to write good top-level doc comments and cross-reference with other packages.

It makes sense, as we approach a decade of experience, to take what we've learned and make one coherent revision, setting the syntax for the next 10 or so years.

Goals and non-goals

The primary design criterion for Go doc comments was to make them readable as ordinary comments when viewing the source code directly, in contrast to systems like C#'s Xmldoc, Java's Javadoc, and Perl's Perlpod. The goal was to prioritize readability, avoiding syntactic ceremony and complexity. This remains a primary goal.

Another concern, new since 2009, is backwards compatibility. Whatever changes we make, existing doc comments must generally continue to render well. Less important, but still something to keep in mind, is forward compatibility: keeping new doc comments rendering well in older Go versions, for a smoother transition.

Another goal for the revamp is that it include writing a separate, standalone web page explaining how to write Go doc comments. Today that information is squirreled away in the doc.ToHTML comment and is not easily found or widely known.

Within those constraints, the focus I have set for this revamp is to address the issues listed above. Specifically:

  1. Make the header syntax more predictable. The headings rule is clearly difficult to remember and has too many false negatives. But further adjustments of the current rule run the risk of false positives.

  2. Add support for lists. There are many times in documentation when a bullet or numbered list is called for. Those appear in many doc comments today, as indented <pre> blocks.

  3. Add support for links to URLs. Today the only way to link to something is by writing the URL directly, but those can sometimes be quite unreadable and interrupt the text.

  4. Add support for links to Go API documentation, in the current package and in other packages. This would have multiple benefits, but one is the ability in large packages to write top-level doc comments that give a good overview and link directly to the functions and types being described.

I believe it also makes sense to add another goal:

  1. Add formatting of doc comments to gofmt, to promote consistent appearance and create more room for future changes.

It is not a goal to support every possible kind of documentation or markup. For example:

  • Plain text has served us very well so far, and while some might prefer that comments allow font changes, the syntactic ceremony and complexity involved seems not worth the benefit, no matter how it is done.

  • People have asked for support for embedding images in documentation (see #39513), but that adds significant complexity as well: image size hints, different resolutions, image sets, images suitable for both light and dark mode presentation, and so on. It is also difficult (but not impossible) to render them on the command line. Although images clearly have important uses, all this complexity is in direct conflict with the primary goal. For these reasons, images are out of scope. I also note that C#'s Xmldoc and Perl's Perlpod seem not to have image support, although Java's Javadoc does.

Markdown is not the answer, but we can borrow good ideas

An obvious suggestion is to switch to Markdown; this is especially obvious given the discussion being hosted on GitHub where all comments are written in Markdown. I am fairly convinced Markdown is not the answer, for a few reasons.

First, there is no single definition of Markdown, as explained on the CommonMark site. CommonMark is roughly what is used on GitHub, Reddit, and Stack Overflow (although even among those there can be significant variation). Even so, let's define Markdown as CommonMark and continue.

Second, Markdown is not backwards compatible with existing doc comments. Go doc comments require only a single space of indentation to start a <pre> block, while Markdown requires more. Also, it is common for Go doc comments to use Go expressions like `raw strings` or formulas like a*x^2+b*x+c. Markdown would instead interpret those as syntactic markup and render as “raw strings or formulas like ax^2+bx+c”. Existing comments would need to be revised to make them Markdown-safe.

Third, many features in Markdown are not terribly readable. The basics of Markdown can be simple and punctuation-free, but once you get into more advanced uses, there is a surfeit of notation which directly works against the goal of being able to read (and write) program comments in source files without special tooling. Markdown doc comments would end up full of backquotes and underscores and stars, along with backslashes to escape punctuation that would otherwise be interpreted specially. (Here is my favorite recent example of a particularly subtle issue.)

Fourth, Markdown is surprisingly complex. Markdown, befitting its Perl roots, provides more than one way to do just about anything: _i_, *i*, and <em>i</em>; Setext and ATX headings; indented code blocks and fenced code blocks; three different ways to write a link; and so on. There are subtle rules about exactly how many spaces of indentation are required or allowed in different circumstances. All of this harms not just readability but also comprehensibility, learnability, and consistency. The ability to embed arbitrary HTML adds even more complexity. Developers should be spending their time on the code, not on arcane details of documentation formatting.

Of course, Markdown is widely used and therefore familiar to many users. Even though it would be a serious mistake to adopt Markdown in its entirety, it does make sense to look to Markdown for conventions that users would already be familiar with, that we can tailor to Go's needs. If you are a fan of Markdown, you can view this revision as making Go adopt a (very limited) subset of Markdown. If not, you can view it as Go adopting a couple of extra conventions that can be defined separately from any Markdown implementation or spec.

Headings

The current rule is:

A span that consists of a single line, is followed by another paragraph span, begins with a capital letter, and contains no punctuation other than parentheses and commas is formatted as a heading.

I can never remember the details of this exact rule, despite having chosen it. Every time I write a heading, I worry about whether it's going to be recognized as such. Others clearly have the same problem (#7349, #31739, #34377). The rule avoided the need for visible syntax, but in retrospect visible syntax would have been simpler. As Markdown shows us, that syntax can be very lightweight: a single “#” would suffice. Therefore I suggest the following:

New Rule: If a span of non-blank lines is a single line beginning with # followed by a space or tab and then additional text, then that line is a heading.

# This is a heading

Here are some examples of variations that do not satisfy the rule and are therefore not headings:

#This is not a heading, because there is no space.

# This is not a heading,
# because it is multiple lines.

The next span is not a heading, because there is no additional text:

#

In the middle of a span of non-blank lines,
# this is not a heading either.

Transition: The old heading rule will remain valid, which is acceptable since it mainly has false negatives, not false positives.
Gofmt will rewrite old-style headings into new-style headings, so that the fact of being a heading is made clearer to readers.
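
As a rough illustration of the new rule (not the actual go/doc implementation), the check could look like the sketch below. The function name isNewStyleHeading is hypothetical, and the sketch assumes the comment markers have already been stripped and the text has been split into spans of non-blank lines.

```go
package main

import (
	"fmt"
	"strings"
)

// isNewStyleHeading reports whether span, a run of consecutive non-blank
// lines with comment markers already removed, is a heading under the
// proposed rule: exactly one line that starts with '#', followed by a
// space or tab, followed by additional text.
// The name and signature are assumptions made for this sketch.
func isNewStyleHeading(span []string) bool {
	if len(span) != 1 {
		return false // a heading must be a single-line span
	}
	line := span[0]
	if len(line) < 2 || line[0] != '#' || (line[1] != ' ' && line[1] != '\t') {
		return false // no '#', or no space or tab after it
	}
	return strings.TrimSpace(line[2:]) != "" // additional text is required
}

func main() {
	fmt.Println(isNewStyleHeading([]string{"# This is a heading"}))
	fmt.Println(isNewStyleHeading([]string{"#This is not a heading"}))
	fmt.Println(isNewStyleHeading([]string{"# This is not a heading,", "# because it is multiple lines."}))
	fmt.Println(isNewStyleHeading([]string{"#"}))
}
```

The four calls mirror the heading and the first three non-heading variations shown above. The last variation, a # line in the middle of a span, is rejected by the single-line check.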

Lists

There is no support for lists today. As noted before, documentation needing lists uses indented <pre> blocks instead.

For example, here are the docs for cookiejar.PublicSuffixList:

// PublicSuffixList provides the public suffix of a domain. For example:
//      - the public suffix of "example.com" is "com",
//      - the public suffix of "foo1.foo2.foo3.co.uk" is "co.uk", and
//      - the public suffix of "bar.pvt.k12.ma.us" is "pvt.k12.ma.us".
//
// Implementations of PublicSuffixList must be safe for concurrent use by
// multiple goroutines.

And here are the docs for url.URL.String:

// In the second form, the following rules apply:
//      - if u.Scheme is empty, scheme: is omitted.
//      - if u.User is nil, userinfo@ is omitted.
//      - if u.Host is empty, host/ is omitted.
//      - if u.Scheme and u.Host are empty and u.User is nil,
//         the entire scheme://userinfo@host/ is omitted.
//      - if u.Host is non-empty and u.Path begins with a /,
//         the form host/path does not add its own /.
//      - if u.RawQuery is empty, ?query is omitted.
//      - if u.Fragment is empty, #fragment is omitted.

Ideally, we'd like to adopt a rule that makes these into bullet lists without any edits at all. (Markdown's space-counting rules would make these <pre> blocks, not lists.)

Today, a span of lines all indented by one or more spaces or tabs is always a <pre> block.
I suggest the following:

New Rule: In a span of lines all blank or indented by one or more spaces or tabs (which would otherwise be a <pre> block),
if the first indented line begins with a bullet list marker or a numbered list marker,
then that span of indented lines is a bullet list or numbered list.
A bullet list marker is a dash, star, or plus followed by a space or tab and then text.
In a bullet list, each line beginning with a bullet list marker starts a new list item.
A numbered list marker is a decimal number followed by a period or right parenthesis, then a space or tab, and then text.
In a numbered list, each line beginning with a numbered list marker starts a new list item.
Item numbers are left as is, never renumbered (unlike Markdown).

Using this rule, the two doc comments above are both recognized and formatted as bullet lists, not as <pre> blocks.
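
To make the rule concrete, here is a minimal sketch of how a span of blank or indented lines might be classified and split into items. The names listKind, markerKind, and splitList are hypothetical, and the sketch simply drops blank lines instead of recording paragraph breaks inside an item.

```go
package main

import (
	"fmt"
	"strings"
)

type listKind int

const (
	notList listKind = iota
	bulletList
	numberedList
)

func isBlank(c byte) bool   { return c == ' ' || c == '\t' }
func hasText(s string) bool { return strings.TrimSpace(s) != "" }

// markerKind classifies a line whose leading indentation has been removed.
func markerKind(line string) listKind {
	// Bullet marker: dash, star, or plus, then a space or tab, then text.
	if len(line) >= 3 && strings.ContainsRune("-*+", rune(line[0])) &&
		isBlank(line[1]) && hasText(line[2:]) {
		return bulletList
	}
	// Numbered marker: decimal number, '.' or ')', space or tab, then text.
	i := 0
	for i < len(line) && '0' <= line[i] && line[i] <= '9' {
		i++
	}
	if i > 0 && i+1 < len(line) && (line[i] == '.' || line[i] == ')') &&
		isBlank(line[i+1]) && hasText(line[i+2:]) {
		return numberedList
	}
	return notList
}

// splitList applies the rule to a span of blank or indented lines.
// The first indented line decides the kind; every later line that begins
// with a marker of that same kind starts a new item, and any other line
// continues the current item, regardless of its indentation.
func splitList(span []string) (listKind, [][]string) {
	kind := notList
	var items [][]string
	for _, raw := range span {
		line := strings.TrimSpace(raw)
		if line == "" {
			continue // this sketch drops blank lines inside the list
		}
		if len(items) == 0 {
			kind = markerKind(line)
			if kind == notList {
				return notList, nil // the span stays a <pre> block
			}
		}
		if markerKind(line) == kind {
			items = append(items, nil)
		}
		items[len(items)-1] = append(items[len(items)-1], line)
	}
	return kind, items
}

func main() {
	kind, items := splitList([]string{
		"     - if u.Scheme and u.Host are empty and u.User is nil,",
		"        the entire scheme://userinfo@host/ is omitted.",
		"     - if u.RawQuery is empty, ?query is omitted.",
	})
	fmt.Println(kind == bulletList, len(items))
}
```

Because no indentation is counted anywhere, the existing pseudo-lists shown earlier are recognized without any edits.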

Note that the rule means that a list item followed by a blank line followed by additional indented text continues the list item (regardless of comparative indentation level):

// Text.
//
//  - A bullet.
//
//     Another paragraph of that first bullet.
//
//  - A second bullet.

Note also that there are no code blocks inside list items—any indented paragraph following a list item continues the list item, and the list ends at the next unindented line—nor are there nested lists. This avoids all of the space-counting subtlety of Markdown.

To re-emphasize, a critical property of this definition of lists is that it makes existing doc comments written with pseudo-lists turn into doc comments with real lists.

Transition: Gofmt will rewrite recognized bullet and numbered lists to use a standard format. For example, the two doc comments above would reformat to:

// PublicSuffixList provides the public suffix of a domain. For example:
//
//   - the public suffix of "example.com" is "com",
//   - the public suffix of "foo1.foo2.foo3.co.uk" is "co.uk", and
//   - the public suffix of "bar.pvt.k12.ma.us" is "pvt.k12.ma.us".
//
// Implementations of PublicSuffixList must be safe for concurrent use by
// multiple goroutines.

// In the second form, the following rules apply:
//
//   - if u.Scheme is empty, scheme: is omitted.
//   - if u.User is nil, userinfo@ is omitted.
//   - if u.Host is empty, host/ is omitted.
//   - if u.Scheme and u.Host are empty and u.User is nil,
//     the entire scheme://userinfo@host/ is omitted.
//   - if u.Host is non-empty and u.Path begins with a /,
//     the form host/path does not add its own /.
//   - if u.RawQuery is empty, ?query is omitted.
//   - if u.Fragment is empty, #fragment is omitted.

The specific formatting rules are discussed in the Formatting section below.

Markdown recognizes three different bullets: -, *, and +. In the main Go repo, the dash is dominant: in comments of the form `//[ \t]+[-+*] ` (grepping, so some of these may not be in doc comments), 84% use -, 14% use *, and 2% use +. In a now slightly dated corpus of external Go code, the star is dominant: 37.6% -, 61.8% *, 0.7% +.

Markdown also recognizes two different numeric list item suffixes: “1.” and “1)”. In the main Go repo, 66% of comments use “1.” (versus 34% for “1)”). In the external corpus, “1.” is again the dominant choice, 81% to 19%.

We have two conflicting goals: handle existing comments well, and avoid needless variation. To satisfy both, all three bullets and both forms of numbers will be recognized, but gofmt (see below) will rewrite them to a single canonical form: dash for bullets, and “N.” for numbers. (Why dashes and not asterisks? Proper typesetting of bullet lists sometimes does use dashes, but never uses asterisks, so using dashes keeps the comments looking as typographically clean as possible.)
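
A small sketch of that canonicalization, applied to the first line of a single list item, might look as follows. The function name canonicalItem and the regular expressions are assumptions of this example, not the gofmt implementation. The output spacing follows the list marker forms given in the Formatter section.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// A bullet marker: dash, star, or plus, then spaces or tabs, then text.
	bulletRE = regexp.MustCompile(`^[-*+][ \t]+(\S.*)$`)
	// A numbered marker: digits, '.' or ')', then spaces or tabs, then text.
	numberedRE = regexp.MustCompile(`^([0-9]+)[.)][ \t]+(\S.*)$`)
)

// canonicalItem rewrites the first line of a list item (indentation already
// removed) to the canonical form: "  - text" for bullets and " N. text" for
// numbered items. Item numbers are kept as written, never renumbered.
// Lines that do not begin with a list marker are returned unchanged.
func canonicalItem(line string) string {
	if m := bulletRE.FindStringSubmatch(line); m != nil {
		return "  - " + m[1]
	}
	if m := numberedRE.FindStringSubmatch(line); m != nil {
		return " " + m[1] + ". " + m[2]
	}
	return line
}

func main() {
	for _, line := range []string{"* star bullet", "+ plus bullet", "3) third item", "plain text"} {
		fmt.Printf("%q\n", canonicalItem(line))
	}
}
```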

Links to URLs

Documentation is more useful with clear links to other web pages. For example, the encoding/json package doc today says:

// Package json implements encoding and decoding of JSON as defined in
// RFC 7159. The mapping between JSON and Go values is described
// in the documentation for the Marshal and Unmarshal functions.
//
// See "JSON and Go" for an introduction to this package:
// https://golang.org/doc/articles/json_and_go.html

There is no link to the actual RFC 7159, leaving the reader to Google it. And the link to the “JSON and Go” article must be copied and pasted. Loosely following the Markdown shortcut reference link format, I suggest the following:

New Rule: A span of unindented non-blank lines defines link targets when each line is of the form “[Text]: URL”. In other text, “[Text]” represents a link to URL using the given text—in HTML, <a href="URL">Text</a>.

For example:

// Package json implements encoding and decoding of JSON as defined in
// [RFC 7159]. The mapping between JSON and Go values is described
// in the documentation for the Marshal and Unmarshal functions.
//
// For an introduction to this package, see the article
// “[JSON and Go].”
//
// [RFC 7159]: https://tools.ietf.org/html/rfc7159
// [JSON and Go]: https://golang.org/doc/articles/json_and_go.html

Note that the link definitions can only be given in their own “paragraph” (span of non-blank unindented lines), which can contain more than one such definition, one per line. If there is no corresponding URL declaration, then (except for doc links, described in the next section) the text is not a hyperlink, and the square brackets are preserved.
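
One way to picture the two halves of this rule is the sketch below: linkDefs collects targets from a definition span, and renderLinks rewrites matching “[Text]” references as HTML anchors. Both names are hypothetical, and the sketch ignores doc links and any rules about where a reference may appear.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

var (
	defRE = regexp.MustCompile(`^\[([^\]]+)\]: (\S+)$`) // "[Text]: URL"
	refRE = regexp.MustCompile(`\[([^\]]+)\]`)          // "[Text]"
)

// linkDefs returns the link targets defined by span, a run of unindented
// non-blank lines. It returns nil unless every line has the form
// "[Text]: URL", in which case the span is ordinary text.
func linkDefs(span []string) map[string]string {
	if len(span) == 0 {
		return nil
	}
	defs := make(map[string]string)
	for _, line := range span {
		m := defRE.FindStringSubmatch(line)
		if m == nil {
			return nil
		}
		defs[m[1]] = m[2]
	}
	return defs
}

// renderLinks replaces each "[Text]" that has a definition with an HTML
// link. References without a definition keep their square brackets.
func renderLinks(text string, defs map[string]string) string {
	return refRE.ReplaceAllStringFunc(text, func(ref string) string {
		name := ref[1 : len(ref)-1]
		url, ok := defs[name]
		if !ok {
			return ref
		}
		return `<a href="` + html.EscapeString(url) + `">` + html.EscapeString(name) + `</a>`
	})
}

func main() {
	defs := linkDefs([]string{
		"[RFC 7159]: https://tools.ietf.org/html/rfc7159",
		"[JSON and Go]: https://golang.org/doc/articles/json_and_go.html",
	})
	fmt.Println(renderLinks("JSON as defined in [RFC 7159]. See also [Other].", defs))
}
```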

This format only minimally interrupts the flow of the actual text, since the URLs are moved to a separate section. As already noted, it also roughly matches the Markdown shortcut reference link format, without the optional title text.

Transition: Gofmt will move link definitions to the end of the overall doc comment. Go vet will flag unused link targets. Older versions of Go will show the text verbatim, which is fairly readable.

Links to Go API documentation

Documentation is also more useful with clear links to other documentation, whether it's one function linking to another, preferred version, or a top-level doc comment summarizing the overall API of the package, with links to the key types and functions. Today there is no way to do this. Names can be mentioned, of course, but users must find the docs on their own.

Following discussion on #45533, I suggest treating doc links like the links in the previous section, without target definitions. Specifically:

New Rule: Doc links are links of the form “[Name1]” or “[Name1.Name2]” to refer to exported identifiers in the current package, or “[pkg]”, “[pkg.Name1]”, or “[pkg.Name1.Name2]” to refer to identifiers in other packages.
In the second form, “pkg” can be either a full import path or the assumed package name of an existing import.
The assumed package name is either the identifier in a renamed import or else the name assumed by goimports. (Goimports inserts renamings when that assumption is not correct, so this rule should work for essentially all Go code.)
A “pkg” is only assumed to be a full import path if it starts with a domain name (a path element with a dot) or is one of the packages from the standard library (“[os]”, “[encoding/json]”, and so on).
To avoid problems with maps, generics, and array types, doc links must be both preceded and followed by punctuation, spaces, tabs, or the start or end of a line.

For example, if the current package imports encoding/json, then “[json.Decoder]” can be written in place of “[encoding/json.Decoder]” to link to the docs for encoding/json's Decoder.

The implications and potential false positives of this implied URL link are presented by Joe Tsai here. In particular, the false positive rate appears to be low enough not to worry about.

To illustrate the need for the punctuation restriction, consider:

// Constant folding computes the exact constant value ([constant.Value])
// for every expression ([ast.Expr]) that is a compile-time constant.

versus

// The Types field, a map[ast.Expr]TypeAndValue,
// holds type-checking results for all AST expressions.
// A SHA1 hash is a [Size]byte.

Transition: Older versions of Go will show the text verbatim, which is still fairly readable.
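
The punctuation restriction can be sketched as a simple scan over one line of comment text. The function docLinks, the helper isBoundary, and the simplified name pattern are all assumptions of this example. A real implementation would also need to resolve the names against the package's imports and declarations.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"unicode"
	"unicode/utf8"
)

// nameRE is a deliberately simplified pattern for "Name", "pkg.Name",
// "import/path.Name", and similar forms.
var nameRE = regexp.MustCompile(`^[\pL_][\pL\pN_./-]*$`)

// isBoundary reports whether r may appear directly before or after a doc link.
func isBoundary(r rune) bool {
	return r == ' ' || r == '\t' || unicode.IsPunct(r)
}

// docLinks returns the bracketed names in line that qualify as doc links:
// the brackets must be preceded and followed by punctuation, a space, a tab,
// or the start or end of the line.
func docLinks(line string) []string {
	var links []string
	for i := 0; i < len(line); i++ {
		if line[i] != '[' {
			continue
		}
		end := strings.IndexByte(line[i:], ']')
		if end < 0 {
			break // no closing bracket anywhere after this point
		}
		end += i
		before, _ := utf8.DecodeLastRuneInString(line[:i])
		after, _ := utf8.DecodeRuneInString(line[end+1:])
		okBefore := i == 0 || isBoundary(before)
		okAfter := end+1 == len(line) || isBoundary(after)
		if okBefore && okAfter && nameRE.MatchString(line[i+1:end]) {
			links = append(links, line[i+1:end])
			i = end // resume scanning after the closing bracket
		}
	}
	return links
}

func main() {
	fmt.Println(docLinks("computes the exact constant value ([constant.Value])"))
	fmt.Println(docLinks("The Types field, a map[ast.Expr]TypeAndValue,"))
	fmt.Println(docLinks("A SHA1 hash is a [Size]byte."))
}
```

In the map and array examples the bracket is adjacent to a letter on one side, so neither is treated as a link.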

Formatter

Along with the changes, I suggest we add to go/doc a function

func Format(text string) string

that reformats a doc comment in the conventional presentation, and then to have go/printer and gofmt invoke this formatter.

The formatter would canonicalize the input so that it renders exactly as before but has the following properties:

  • All paragraphs are separated by single blank lines.
  • Legacy headings are converted to “#” headings.
  • All code blocks are indented by a single tab.
  • All code blocks have a blank line before and after.
  • All list markers are written as “␠␠-␠” (space space dash space) or “␠N.␠” (space number dot space).
  • All list item continuation text, including additional paragraphs, is indented by four spaces.
  • All lists have a blank line before and after.
  • If there is a blank line anywhere in a list, there are blank lines between all list elements.
  • All bullet list items use - as the bullet (* and + are converted).
  • All numbered list items use the “N.” form (“N)” is converted).
  • The ASCII double-single-quote forms (`` and '') that have always been
    defined to render as “ and ” are replaced with those characters.
  • Link URLs are moved to the end of the doc comment.

The formatter would not reflow paragraphs, so as not to prohibit use of the semantic linefeeds convention.

This canonical formatting has the benefit for Markdown aficionados of being compatible with the Markdown equivalents. The output would still not be exactly Markdown, since various punctuation would not be (and does not need to be) escaped, but the block structure Go doc comments and Markdown have in common would be rendered as valid Markdown.

Additional API

The current doc.ToHTML is given only the comment text and therefore cannot implement import-based links to other identifiers. To address this, we would need to add ToHTML and ToText methods to the Package type, and define the top-level functions to behave as though calling the methods on a zero value of the struct. The ToHTML method will need to take a new Config struct that, at the least, allows specifying the URL prefix of the documentation server (for example, / or https://pkg.go.dev/ or https://golang.org/pkg/). It would also need to specify the HTML tag for headings, which will depend on the surrounding page where the docs will be presented.

There is an accepted proposal to add doc.ToMarkdown for easy conversion of Go doc comments to Markdown, and we would implement and update that as part of this work. It too would be added to the Package type.