Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tracking issue for check_invalid_html_tags #67799

Closed
5 tasks done
Manishearth opened this issue Jan 2, 2020 · 48 comments
Closed
5 tasks done

Tracking issue for check_invalid_html_tags #67799

Manishearth opened this issue Jan 2, 2020 · 48 comments
Labels
A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. needs-fcp This change is insta-stable, so needs a completed FCP to proceed. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.

Comments

@Manishearth
Copy link
Member

Manishearth commented Jan 2, 2020

This is a tracking issue for the rustdoc lint check_invalid_html_tags.
There is no feature gate for this lint, but it is currently nightly-only.

check_invalid_html_tags

This lint warns about unclosed HTML tags and comments. See https://doc.rust-lang.org/nightly/rustdoc/lints.html#invalid_html_tags for more information.

About tracking issues

Tracking issues are used to record the overall progress of implementation.
They are also used as hubs connecting to other relevant issues, e.g., bugs or open design questions.
A tracking issue is however not meant for large scale discussion, questions, or bug reports about a feature.
Instead, open a dedicated issue for the specific matter and add the relevant feature gate label.

Steps

Implementation history

Original issue

Rustdoc currently allows HTML in markdown, which is fine. The problem is that it doesn't sanitize the HTML at all, which means you can do wacky stuff like:

/// This is basically a Vec<Script>
pub enum ScriptExtension { /* ... */ }


impl ScriptExtension {
   /// docs
   pub fn some_func() {}
}

in which the Vec<Script> parses as an opening script tag, and the rest of the docs just ... disappear

image

This isn't great, and it's kind of surprising. Perhaps we should sanitize these, or at least warn when it happens? This may not be easy or cheap, though.

cc @GuillaumeGomez @kinnison

@Manishearth Manishearth added the T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue. label Jan 2, 2020
@steveklabnik
Copy link
Member

Historically, the decision was to do nothing. That doesn't mean that maybe changing it isn't the right choice, but this feels like internals thread / RFC territorry rather than a bug, personally.

@Manishearth Manishearth changed the title Should rustdoc sanitize unclosed HTML tags Should rustdoc sanitize unclosed HTML tags? Jan 2, 2020
@Manishearth
Copy link
Member Author

@steveklabnik my understanding is that the historical discussion has been mostly about whether rustdoc should allow arbitrary HTML at all, whereas this is about a narrower set of behavior -- allowing unclosed tags (which basically will never do what you want, and are often the result of mistyped generics).

There are really only a few unclosed tags that can cause problems (namely, script, video, object, audio, and some other similar ones), a simpler check would be one that just scans for <script with no corresponding </script> or something

@GuillaumeGomez
Copy link
Member

I'd personally do nothing. From my point of view, this is up to users to master what they're doing when using HTML since we're following markdown commonmark spec.

@Manishearth
Copy link
Member Author

Again, the problem is not users "using HTML", it's users accidentally introducing HTML into their code. I am okay with users intentionally using HTML in rustdoc. I know that has been a common complaint, but I am not making that complaint. I'm talking about something different.

Typing Vec<Script> without backticks is enough to break the docs. I've seen plenty of rustdocs that avoid religiously putting backticks on code, and I've made that mistake myself a bunch of times. Unclosed tags are an easy metric for this.

@GuillaumeGomez
Copy link
Member

The problem is that a lot of HTML5 tags can be unclosed. Keeping this list up-to-date might be tricky. We can do it but in case a user is using an unclosed HTML5 tag, what should we do in this case? If it's a type we're supposed to not convert it to HTML but what if it's intended to be HTML? This is where I have an issue.

@Manishearth
Copy link
Member Author

I also covered that in a previous comment. We can do this for specific tags only, as a start perhaps only the tags that will make the following content completely disappear (script, object, audio, video, etc).

We can also keep a list of all closeable HTML tags. If it's out of date, it's no big deal, this can just be a lint.

@kornelski
Copy link
Contributor

Yes, it should. This is not the same as forbidding HTML or sanitizing what HTML features are allowed.

Unclosed tags can break the entire page, which is highly undesirable.

It might be used intentionally to open a tag in one doc comment and close it in another, but such uses would couple docs very tightly to a particular ordering of items and specific markup used by rustdoc itself. Allowing to rely on such a hacky thing seems like a bad idea.

Rustdoc could parse and reserialize the markup. It would guarantee the markup is syntactically correct without changing its meaning.

@GuillaumeGomez
Copy link
Member

If we keep this as a warning, that could be a big plus. Does it sound OK?

@Manishearth: Do you mind listing the tags which need to be checked please?

@Manishearth
Copy link
Member Author

Yes, I already suggested a warning, I'm fine with that.

script, object, video, audio are the most problematic. We can get an implementation in and figure out the full list later, it should be something that can be found in the HTML spec somewhere.

@GuillaumeGomez GuillaumeGomez self-assigned this Jan 8, 2020
@GuillaumeGomez
Copy link
Member

Ok, I'll try to send something in as soon as possible.

@jyn514
Copy link
Member

jyn514 commented Sep 22, 2020

Mentoring instructions: In src/librustdoc/passes add a new pass (maybe named lint_unclosed_tags). For an example of a pass that looks at doc-comments, see unindent_comments. For an example of a pass that lints, see doc_test_lints.

The pass should check for the following unclosed tags:

  • script
  • video
  • object
  • audio

As an extension, you could also look for other unclosed tags; any tag that isn't self-closing should warn here.

You can see the documentation by calling item.attrs.doc_strings.as_str().

For an initial implementation, you don't have to worry about <scri\npt> across newlines. However, you should definitely allow multi-line tags, such as

/// <script>
/// </script>

@jyn514 jyn514 added E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Sep 22, 2020
@jyn514
Copy link
Member

jyn514 commented Sep 22, 2020

Unassigning @GuillaumeGomez since it's been 9 months.

@camelid
Copy link
Member

camelid commented Sep 25, 2020

Reassigning @GuillaumeGomez since he's working on it :)

bors added a commit to rust-lang-ci/rust that referenced this issue Oct 7, 2020
…, r=jyn514

Unclosed html tag lint

Part of rust-lang#67799.

I think `@ollie27` will be interested (`@Manishearth` too since they opened the issue ;) ).

r? `@jyn514`
@jyn514 jyn514 changed the title Should rustdoc sanitize unclosed HTML tags? Tracking issue for check_invalid_html_tags Oct 9, 2020
@jyn514
Copy link
Member

jyn514 commented Oct 9, 2020

I repurposed this to be a tracking issue for the lint.

@jyn514
Copy link
Member

jyn514 commented Oct 9, 2020

@rust-lang/rustdoc after #77753 lands, what do you think about making this warn by default? I think it would be really helpful for things like tokio-rs/tracing#881.

@jyn514 jyn514 removed E-medium Call for participation: Medium difficulty. Experience needed to fix: Intermediate. E-mentor Call for participation: This issue has a mentor. Use #t-compiler/help on Zulip for discussion. labels Oct 9, 2020
@jyn514
Copy link
Member

jyn514 commented Sep 12, 2022

I think this should go through FCP (not just a team ping).

@jyn514 jyn514 added the needs-fcp This change is insta-stable, so needs a completed FCP to proceed. label Sep 12, 2022
@GuillaumeGomez
Copy link
Member

I'll start the FCP once #101720 is merged then.

@Nemo157
Copy link
Member

Nemo157 commented Sep 12, 2022

It doesn't seem like it should be enabled by default until stabilized. Otherwise users that compile on nightly will start receiving the warning already (since it doesn't appear to be feature-gated, and instead just works already when you use the nightly compiler).

@jyn514
Copy link
Member

jyn514 commented Sep 12, 2022

It doesn't seem like it should be enabled by default until stabilized. Otherwise users that compile on nightly will start receiving the warning already (since it doesn't appear to be feature-gated, and instead just works already when you use the nightly compiler).

BTW, I don't think this is the correct way to make things unstable - either they should need a feature gate, or they should be enabled for anyone, the defaults shouldn't be different between nightly and stable. In general, I don't think I've seen the compiler or other tools make lints unstable, only allow-by-default.

@Nemo157
Copy link
Member

Nemo157 commented Sep 12, 2022

There have been unstable lints in the past, I was trying to find one to test the behavior of but the only one I could think of was stabilized already (#79208).

@GuillaumeGomez
Copy link
Member

It doesn't seem like it should be enabled by default until stabilized. Otherwise users that compile on nightly will start receiving the warning already (since it doesn't appear to be feature-gated, and instead just works already when you use the nightly compiler).

How are we supposed to handle this stabilization then? First we stabilize and then we change it as a warn by default? Seems to be kinda the same in the end no?

@Nemo157
Copy link
Member

Nemo157 commented Sep 12, 2022

I would stabilize and change the default level in the same PR that goes through the FCP process.

@GuillaumeGomez
Copy link
Member

Then let's start the FCP process to hear everyone's opinion about this. I think it should be at warn level by default before the stabilization because after would finally be the same problem as the new warning would appear from nowhere for users.

Nominating for discussion. Also kicking off an FCP: shall we make it a warning before or after stabilization?

@rfcbot merge

@rfcbot
Copy link

rfcbot commented Sep 12, 2022

Team member @GuillaumeGomez has proposed to merge this. The next step is review by the rest of the tagged team members:

No concerns currently listed.

Once a majority of reviewers approve (and at most 2 approvals are outstanding), this will enter its final comment period. If you spot a major issue that hasn't been raised at any point in this process, please speak up!

See this document for info about what commands tagged team members can give me.

@rfcbot rfcbot added proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Sep 12, 2022
@GuillaumeGomez
Copy link
Member

Just realized I should put it on the PR directly and not here...

@rfcbot cancel

@rfcbot
Copy link

rfcbot commented Sep 12, 2022

@GuillaumeGomez proposal cancelled.

@rfcbot rfcbot removed proposed-final-comment-period Proposed to merge/close by relevant subteam, see T-<team> label. Will enter FCP once signed off. disposition-merge This issue / PR is in PFCP or FCP with a disposition to merge it. labels Sep 12, 2022
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Oct 8, 2022
…AGS, r=notriddle

Change default level of INVALID_HTML_TAGS to warning and stabilize it

Fixes of rust-lang#67799.

cc `@Nemo157`
r? `@notriddle`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 8, 2022
…AGS, r=notriddle

Change default level of INVALID_HTML_TAGS to warning and stabilize it

Fixes of rust-lang#67799.

cc ``@Nemo157``
r? ``@notriddle``
Dylan-DPC added a commit to Dylan-DPC/rust that referenced this issue Oct 8, 2022
…AGS, r=notriddle

Change default level of INVALID_HTML_TAGS to warning and stabilize it

Fixes of rust-lang#67799.

cc ```@Nemo157```
r? ```@notriddle```
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 8, 2022
…AGS, r=notriddle

Change default level of INVALID_HTML_TAGS to warning and stabilize it

Fixes of rust-lang#67799.

cc `@Nemo157`
r? `@notriddle`
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 8, 2022
…AGS, r=notriddle

Change default level of INVALID_HTML_TAGS to warning and stabilize it

Fixes of rust-lang#67799.

cc ``@Nemo157``
r? ``@notriddle``
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Oct 8, 2022
…AGS, r=notriddle

Change default level of INVALID_HTML_TAGS to warning and stabilize it

Fixes of rust-lang#67799.

cc ```@Nemo157```
r? ```@notriddle```
bors added a commit to rust-lang-ci/rust that referenced this issue Oct 10, 2022
…S, r=notriddle

Change default level of INVALID_HTML_TAGS to warning and stabilize it

Fixes of rust-lang#67799.

cc `@Nemo157`
r? `@notriddle`
@GuillaumeGomez
Copy link
Member

Seems like this is done. Closing then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-lint Area: Lints (warnings about flaws in source code) such as unused_mut. C-tracking-issue Category: A tracking issue for an RFC or an unstable feature. needs-fcp This change is insta-stable, so needs a completed FCP to proceed. T-rustdoc Relevant to the rustdoc team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

9 participants