Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implicit race with frame loading vs pushState/hash change #1191

Closed
avidrissman opened this issue May 4, 2016 · 23 comments · Fixed by #6315
Closed

Implicit race with frame loading vs pushState/hash change #1191

avidrissman opened this issue May 4, 2016 · 23 comments · Fixed by #6315

Comments

@avidrissman
Copy link

Suppose we have this scenario:

  1. Be at a site A with a new iframe in it, still at the initial about:blank URL.
  2. A loads the iframe by setting the url attribute of the frame.
  3. A does a pushState or document.location.hash change to make a new history entry.
  4. The user returns to the original history entry via the back button.

From what I can tell, the spec considers steps 2 and 3 to be a race. pushState gets the URLs of the iframes from their Documents, but switching the iframe's Document is the very last step of an iframe load. Therefore, when the user returns to the original history entry in step 4, one of two things can happen. If the load from step 2 completed before the pushState snapshotted the state of the page in step 3, then the user will see the iframe loaded. Otherwise (and more likely), the pushState completed before the iframe load completed, and the navigation in step 4 will leave a blank iframe.

The race doesn't need to be this explicit; a simple page load can encounter this.

I bring this up because I made a change to Chromium that fixed a bug that was masking this issue. This caused major issues with several big Google apps. I bring up a bug in Gmail chat as an example, and quote from the repro:

  1. switch to another label (like "starred")
  2. open a second chat to someone
  3. switch to another label (like "drafts")
  4. open a third chat to someone
  5. hit back twice, then forward twice

It's obvious what's going on here: opening a chat opens iframes, but switching labels triggers a document.location.hash change. And if the chat iframe hasn't fully loaded, then the iframe's location is captured as about:blank and things break.

The obvious solution according to the spec is to wait until all the onload events fire for the loading iframes. The problems with that solution are 1) that means that every part of the page has to know about all loading iframes, to make sure that they are all loaded before moving on, and 2) if the pushState or hash change is user-initiated, the web page can't wait.

And finally, in discussions with web page authors, when I mention this race to them, it is a surprise. None are expecting this to be the case.

This is a problem today in Chromium because I removed a bug that was masking this race. What is surprising to me is Firefox did not have this bug (see my analysis of Firefox in #546), but Firefox does not appear to have this race. Perhaps they're capturing loads in progress? I'm familiar with Chromium navigation code, so I can state why we're hitting this issue, but I am not familiar with Firefox code.

tl;dr:
The fact that [load iframe; pushState] is a race is surprising to users and has no easy workarounds. Chromium is being bitten by this. What do other browsers do? Should we fix the spec so that this is not a race?

@avidrissman
Copy link
Author

@domenic As we discussed.

@domenic
Copy link
Member

domenic commented May 4, 2016

@bzbarsky @annevk would love your input here. It sounds likely that browsers do not follow the spec here---and that what browsers do is better for user experience. I'm just not sure what model the browsers actually use. (Do they serialize the "this iframe is loading this URL" state somehow?) So if we were to update the spec, I'm not sure exactly what the best thing to do would be.

@bzbarsky
Copy link
Contributor

bzbarsky commented May 4, 2016

I... don't think Gecko does anything weird here in terms of capturing loads or whatnot. I wrote two simple testcases at http://web.mit.edu/bzbarsky/www/testcases/navigation/pushstate-after-iframe-append-1.html and http://web.mit.edu/bzbarsky/www/testcases/navigation/pushstate-after-iframe-append-2.html and both show the back button going to the state before the iframe navigation.

Or is the idea that the pushState should be happening on the parent page instead of on the iframe? A testcase that actually shows the problem would be quite helpful here.

@avidrissman
Copy link
Author

I described the repro in four steps because it's a race, and my argument (whatever Gecko does) is that it's developer-hostile.

My point is that I fixed a bug in Chromium so that it actually behaves according to the spec, and it broke multiple large HTML apps that were behaving in reasonable ways. I think that illustrates a problem with the spec.

@bzbarsky
Copy link
Contributor

bzbarsky commented May 5, 2016

I understand your argument. I'm trying to figure out whether these apps are breaking in other browsers too, if not why not, and what the exact scenario you're talking about is.

So that's my first question: do my testcases capture the scenario you're talking about? That is, is pushState happening on the window that's being navigated, or on some other window?

@avidrissman
Copy link
Author

OK.

My version of the test page is http://avidrissman.github.io/htmltests/race.html , where the top-level page has the pushState.

@bzbarsky
Copy link
Contributor

bzbarsky commented May 5, 2016

OK. That's a quite different situation. It seems like it should be possible to make that work, while making my testcases "work" would be ... very hard at best.

@bzbarsky
Copy link
Contributor

bzbarsky commented May 5, 2016

So hmm. In that particular case, once the load and pushstate have both completed we have three history positions, right? The first one is before both the load and the pushstate. The second one is after either the load or the pushtate, whichever one completed first. The third one is after both the load and the pushstate have happened.

Clicking back from the third position will therefore either undo the pushstate or undo the subframe load. Clicking back a second time will go back to before the subframe loaded or the pushstate happened.

At least that's the thing that seems like it would happen with a naive implementation of session history; I have no cross-checked this against what the spec does.

What do you think the behavior should be in this situation?

@avidrissman
Copy link
Author

Having three history positions is what you get in Firefox :) In Chromium you only have two, as the initial load of an iframe elides the "about:blank" state. See #546 for that discussion.

I'm comparing it to http://avidrissman.github.io/htmltests/race2.html where the iframe isn't loaded at the moment of the pushState, but it seems that Chromium's state snapshot includes the loaded frame even though the load wasn't complete.

What do you think the behavior should be in this situation?

My opinion is informed by what I see web authors doing and expecting to work (and filing bugs against me when they don't). In the first bug (Google-only, sorry), the URL was of the form "google.com/some-special-state-url" which the page's JavaScript rewrote into "google.com/baseservice" and then did a pushState to "google.com/baseservice/state". It assumed that it could load a bunch of iframes before the pushState, but then when the user clicked the back button the frames were reverted to being "about:blank" and nothing worked.

In the second bug Gmail is trying to use location.hash changes to maintain user-friendly URLs, but the fact that the user opens chat moles causes it to start loading iframes. Again, they're assuming that they can do a sequence of actions of [load iframe, pushState] and have the load of the iframe be captured by the pushState.

So if I had to pick a desired behavior, it would be if an iframe load is started before the pushState, that it would be captured in the state of the previous history item. I haven't any idea how to accomplish that, or even if that would break more websites than it would fix. 😬

@bzbarsky
Copy link
Contributor

bzbarsky commented May 5, 2016

as the initial load of an iframe elides the "about:blank" state

This is with your recent changes? Because my experiments in #1154 (note in particular #1154 (comment) and #1154 (comment)) shows something different in Chrome, at least in terms of window reuse; not sure about session history....

This stuff is all so broken. :(

@avidrissman
Copy link
Author

as the initial load of an iframe elides the "about:blank" state

This is with your recent changes?

No, this has been true forever in Chromium. Open http://avidrissman.github.io/htmltests/aboutblank.html in Chromium and click the "Load the frame" button. I mean session history; I don't know what you mean by "window reuse".

(My change was the one I mentioned in a comment on 546, which involves restoring frames; it didn't change any elision.)

This stuff is all so broken. :(

If this were perfectly functioning, we wouldn't have jobs! Cheer up! 😂

MXEBot pushed a commit to mirror/chromium that referenced this issue May 7, 2016
This reverts r371031.

This reverts a genuine bugfix, but the bug is needed to mask a much deeper and severe problem. Discussion is ongoing there.

BUG=598043, 542299, 600534, whatwg/html#1191
TEST=disabling
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_site_isolation

Review-Url: https://codereview.chromium.org/1949213002
Cr-Commit-Position: refs/heads/master@{#391906}
(cherry picked from commit b48cb31)

Review URL: https://codereview.chromium.org/1954213003 .

Cr-Commit-Position: refs/branch-heads/2661@{#662}
Cr-Branched-From: ef6f6ae-refs/heads/master@{#378081}
MXEBot pushed a commit to mirror/chromium that referenced this issue May 11, 2016
This reverts r371031.

This reverts a genuine bugfix, but the bug is needed to mask a much deeper and severe problem. Discussion is ongoing there.

BUG=598043, 542299, 600534, whatwg/html#1191
TEST=disabling
CQ_INCLUDE_TRYBOTS=tryserver.chromium.linux:linux_site_isolation

Review-Url: https://codereview.chromium.org/1949213002
Cr-Commit-Position: refs/heads/master@{#391906}
(cherry picked from commit b48cb31)

Review URL: https://codereview.chromium.org/1965003002 .

Cr-Commit-Position: refs/branch-heads/2704@{#479}
Cr-Branched-From: 6e53600-refs/heads/master@{#386251}
@avidrissman
Copy link
Author

@bzbarsky

I'm trying to work out whether Firefox is susceptible to the race, but I'm hitting the same weirdness that I saw in #546.

Go to http://avidrissman.github.io/htmltests/race3.html and click the "Start the race" button. It will start loading the frame, and then will do a pushState. Then the load to the frame will commit, and we'll get a third history entry.

Our history list will look like:

[1] Blank iframe
[2] Blank iframe, new history entry caused by pushState
[3] Filled iframe, new history entry caused by navigation of the iframe

That's fine. Now single-step back from [3] to [1] using the back arrow. The iframe gets blanked out on the move from [3] to [2] and stays blank when we move to [1].

Now single-step back from [1] to [3] using the forward arrow. The iframe fills on the move from [2] to [3], which is expected.

But we're now on [3]. Click and hold the back arrow to jump to [1] with a single navigation. The iframe stays filled. This is inconsistent with the behavior we've seen so far.

So it seems like Firefox is obeying the standard here by snapshotting a blank frame on the move from [1] to [2], but the weirdness might be masking any issues that come up because of it.

@bzbarsky
Copy link
Contributor

I'm entirely willing to believe there is weirdness. Note that the actual implementation of history in Gecko doesn't look much like the spec at all, so there can be all sorts of weird mismatches in behavior...

@annevk
Copy link
Member

annevk commented Jun 21, 2016

@avidrissman "window reuse" refers to the fact whether or not the global object remains identical (it's further explained in the references @bzbarsky made).

@jakearchibald
Copy link
Contributor

Sorry to raise this from the dead, but I'm currently looking at session history stuff.

@avidrissman

  • Be at a site A with a new iframe in it, still at the initial about:blank URL.
  • A loads the iframe by setting the url attribute of the frame.
  • A does a pushState or document.location.hash change to make a new history entry.
  • The user returns to the original history entry via the back button.

My understanding is that changing an iframe from about:blank to anything else will force the navigation to be a 'replace', so the race is avoided since it doesn't create an additional history entry.

Maybe that change happened since 2016.

Even if the iframe wasn't on about:blank, if the iframe url is set in the same task as the pushState call, the pushState call should always win, since it's sync.

@avidrissman
Copy link
Author

I’m not up to speed enough on the spec to disagree. All I know is that whenever the pushState won the race, and captured a blank frame, that broke Hangouts-in-Gmail, and that wasn’t something I could do.

If you want to take this on, please read the big comment in Chromium’s NavigationControllerImpl::FindFramesToNavigate(). I’m happy to consult, but this theoretically correct behavior is something that AFAICT web authors don’t necessarily expect, and a deeper look into how to handle this is needed (hence this bug).

@jakearchibald
Copy link
Contributor

Did the steps in your OP result in 3 history entries at the time?

  1. Initial page & about:blank iframe
  2. New iframe URL
  3. Top-level pushState / hash change.

Because now it only seems to create two, as the iframe's initial history entry is replaced.

@avidrissman
Copy link
Author

I cannot say; my notes don’t indicate, and this was nearly five years ago and my recollection isn’t that good. Again, you’re welcome to investigate, and I’m happy to help to the extent that I’m able.

@jakearchibald
Copy link
Contributor

No problem! Thanks

@domenic
Copy link
Member

domenic commented May 19, 2021

@jakearchibald, @natechapin and I were looking at this today, and at least I got confused. It seems like according to comments like #1191 (comment) Chromium has always had about:blank replacement behavior in this case, with a total of only two history entries. (While Gecko still doesn't, if you test http://avidrissman.github.io/htmltests/race.html !) So I'm not sure about:blank replacement saves us.

I think this instead might have to do with the iframe restoration that happens when navigating back. (Which you are not yet tackling in the session history rewrite.) The OP mentions

If the load from step 2 completed before the pushState snapshotted the state of the page in step 3, then the user will see the iframe loaded. Otherwise (and more likely), the pushState completed before the iframe load completed, and the navigation in step 4 will leave a blank iframe.

i.e. I think the idea is that step 1 will either contain a snapshot saying iframe URL to restore = about:blank, or it will contain a snapshot saying iframe URL to restore = some real URL.

@natechapin's suggestion for fixing this, while also allowing us to fix the NavigationControllerImpl::FindFramesToNavigate() issue, is that if any frames have a navigation in progress at the time we snapshot their "restore URLs", we should use the destination URL, instead of the current URL. And this seems to match what Firefox does. So we might try that...

@jakearchibald
Copy link
Contributor

jakearchibald commented May 25, 2021

I think this instead might have to do with the iframe restoration that happens when navigating back. (Which you are not yet tackling in the session history rewrite.)

I don't think so, but I could be misinterpreting the issue. iframe restoration only comes into play when navigating in a way that destroys the iframes, like a cross document navigation, but that doesn't seem to be involved in the OP.

Here's my interpretation:

Screenshot 2021-05-25 at 10 30 05

We start with our page and our iframe. We start navigating the iframe, but in the meantime, we hash-navigate the top level:

Screenshot 2021-05-25 at 10 31 25

Then, the fetch for the iframe completes, so we have a new history entry read to go, but since the current history item in that navigable is about:blank, we do a replacement:

Screenshot 2021-05-25 at 10 31 47

I get the same result even if the iframe wins the race, so there's something about my mental model that differs from Chrome's implementation, and I'm not sure what it is.

It would be a race if it was "add" rather than "replace", as the history entries would have different steps, but I don't see the race with "replace".

@domenic
Copy link
Member

domenic commented May 25, 2021

@natechapin explained the problem to me. In Chrome's current implementation, the replacement splits the iframe's history entry into two, one for step 1 (which remains on about:blank) and one for step 2 (which gets updated to //yo).

The band-aid fix that Chrome currently implements is then that navigating back only navigates the parent frame, leaving the iframe on //yo.

The correct fix which @natechapin is working on in https://chromium-review.googlesource.com/c/chromium/src/+/2910529 is to properly share the history entries between step 1 and step 2 for the iframe, updating both steps (as a single entry) to //yo, instead of letting the replacement split them into about:blank + //yo.

@jakearchibald
Copy link
Contributor

Ah cool! So it feels like what I'm specing is in the right direction, as this "sharing" is part of the data model. There's no way that I'm aware of that the iframe could end up with a step 2 entry while retaining the top level step 2 entry.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

Successfully merging a pull request may close this issue.

5 participants