Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix TextDecoder exception on decoding SharedArrayBuffer #16994

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

jiulongw
Copy link
Contributor

Fixes #15217

This patch is used in our product to fix a problem described in #15217.

I haven't run a full browser compatibility tests other than Chrome. But I see this pattern (type checking based on Symbol.toStringTag) already being used so I guess it is acceptable.

As for performance, benchmark shows Object.prototype.toString.call is slightly faster than instanceof.

return `${heap}.buffer instanceof SharedArrayBuffer ? ${shared} : ${unshared}`;
// In some cases, `instanceof SharedArrayBuffer` returns false even though buffer is an SAB.
// See: https:/emscripten-core/emscripten/issues/15217
return `Object.prototype.toString.call(${heap}.buffer) === "[object SharedArrayBuffer]" ? ${shared} : ${unshared}`;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems reasonable to me. Thanks for linking the bug. I wonder if we could add a test to test_browser.py?

Copy link
Collaborator

@juj juj Sep 1, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This reads madness to me. 😅

The iframe gotcha with instanceof is something that I learned new now, and it makes sense, however that raises a few questions:

  1. instanceof fails if x instanceof y is called with x and y that split iframe domains. In this case, why are ${heap}.buffer and SharedArrayBuffer from two different iframe domains? This code should be all emitted inside the same JS domain, so the values should be the same. (obviously that is not the case since people are hitting this issue, but we should look at a more detailed repro first)
  2. I am suspicious whether this call to .toString() would be standardized to require returning a string in that format? Is that confirmed? Or does it just happen to return the same value on current environments we care about?
  3. We have 75 other uses of instanceof under JS files in src/. Which uses of instanceof should we ban there too?
  4. The reason for the earlier fast type check was that opportunistically if the conditions match, we can avoid duplicating the string. However, this new check looks really slow, as it will need to do a string compare operation. Most strings that are converted are typically relatively short, so it is likely that just unconditionally performing the copy-to-unshared view operation will be much faster.

I think before we land a fix like this, we should take a look at point 1 first: why is it possible that we get values from splitting domains here when people are building into iframes? Is such data flow usage sensible, or is that something that should be avoided?

And if it is sensible, then look at the point 4 second: should we just unconditionally return unshared; there for better performance - the opportunistic "avoid-a-copy" optimization should not be turned into a slowdown.

And also if this kind of change is sensible, we do then have those 75 other call sites that can be suspect as well - can data in them split iframe domains?

I think we really should place a test for this, one that we accept that the use case is sound. If there are other places that are needed to be fixed for in-iframe execution, then we might even look if that would warrant a compile time flag -sENVIRONMENT=web,worker,...,iframe?

@jiulongw
Copy link
Contributor Author

Update: It seems this bug only affects Google Chrome when WASM is loaded by localhost. Not a high priority issue.

@mattmone
Copy link

Update: It seems this bug only affects Google Chrome when WASM is loaded by localhost. Not a high priority issue.

I have experienced this not on localhost and it breaks some future functionality of our app quite consistently

@tthef
Copy link

tthef commented Sep 1, 2022

This issue is not limited to localhost.

@sbc100
Copy link
Collaborator

sbc100 commented Sep 1, 2022

This issue is not limited to localhost.

Can you describe how/when you are able to reproduce the issue? Any idea what set of circumstances cause it?

@sbc100 sbc100 requested a review from juj September 1, 2022 08:16
Copy link
Collaborator

@sbc100 sbc100 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adding @juj who wrote this code originally I believe.

@tthef
Copy link

tthef commented Sep 1, 2022

This issue is not limited to localhost.

Can you describe how/when you are able to reproduce the issue? Any idea what set of circumstances cause it?

Use of embind and iframes, as per the original description in the linked issue #15217.

@jiulongw
Copy link
Contributor Author

This issue is not limited to localhost.

Can you describe how/when you are able to reproduce the issue? Any idea what set of circumstances cause it?

Use of embind and iframes, as per the original description in the linked issue #15217.

Right. This not only happen in localhost, but in our production environment as well, in latest Chrome (m109). In our case, it happens when a tab is being duplicated. Using -s TEXTDECODER=0 can work around this problem with a bit speed penalty, as mentioned in #18034.

@tthef
Copy link

tthef commented Jan 18, 2023

We have been patching our toolchain with this PR's patch for some time now, works well.

@kripken
Copy link
Member

kripken commented Jan 18, 2023

If this is useful for people then we should think about landing it. But first I'd like to know if this is a workaround for a current bug or if it is known behavior. A link to a chrome/firefox/safari issue would be useful, or to some other relevant docs on MDN or such. Or a testcase so that we can file an issue.

@tthef
Copy link

tthef commented Jan 19, 2023

If this is useful for people then we should think about landing it. But first I'd like to know if this is a workaround for a current bug or if it is known behavior. A link to a chrome/firefox/safari issue would be useful, or to some other relevant docs on MDN or such. Or a testcase so that we can file an issue.

It's not a workaround, it's a bug in emscripten due to incorrect assumptions about the instanceOf operator, all the relevant information, including a minimal code sample, is in the original ticket, #15217.

@juj
Copy link
Collaborator

juj commented Jan 19, 2023

It's not a workaround, it's a bug in emscripten

One person's bug is another person's unsupported use case.

I agree we should add support for these types of scenarios. This PR is still pending addressing the review points 1. - 4. in #16994 (comment) , which to summarize has the following concerns:

  • we don't have testing for this use case, so it is clear that this use case has gone unnoticed,
  • there are more instances of "incorrect assumptions" in Emscripten that also need to be modified for splitting iframes like this to work - or to be reasoned that those conditions cannot occur anywhere else? (if so, why not?) In the absence, this PR seems partial to fix one code flow path in one program, not to add general support for this kind of split iframe use case, (TextDecoder sometimes fails when used through an iframe #15217 (comment) suggests that there are other call sites as well that need tending to)
  • The standardization over .toString() format is still a question: in the past I have seen .toString() calls return different syntax on some types in different browsers/shells. Would e.g. using x.constructor.name serve the same purpose, which is guaranteed to have a unified result?

Ultimately I think we should add this as a compile-time feature and do something like {{{ isInstanceOf(x, y) }}} which would get conditionally compiled to a fast instanceOf() call for people who are building binaries that don't split iframe domains, and then the more general check for people who opt in to it. (to apply zero-overhead principle)

For all that to happen, there would have to be someone who cares about this use case to provide such an improvement. In the current form this PR would regress users who don't utilize this kind of deployment, and there is no testing visibility to ensure that it wouldn't break in the future.

@arsnyder16
Copy link
Contributor

@sbc100 @juj @kripken Any update here I am encountering this as well when using localhost. Don't believe i have seen this in any production but its a pain point when trying to locally debug without needing to specific options (-s TEXTDECODER=0) just for local that aren't used in production

@sbc100
Copy link
Collaborator

sbc100 commented Mar 13, 2024

@sbc100 @juj @kripken Any update here I am encountering this as well when using localhost. Don't believe i have seen this in any production but its a pain point when trying to locally debug without needing to specific options (-s TEXTDECODER=0) just for local that aren't used in production

@arsnyder16 do you know what you are doing that triggers this bug when testing locally? I'm trying to figure out why we don't see this issue in any of our testing.

@arsnyder16
Copy link
Contributor

@sbc100 Nothing obvious, we have been using emscripten for a few years, and part of our CI process runs GTests through emrun that run quite a bit of code and i don't think we ever saw this in this environment. I just happened to be debugging one of our tests locally today and i started to hit this.

If i see a pattern or find explanation/theory i will certainly keep you updated.

@sbc100
Copy link
Collaborator

sbc100 commented Mar 13, 2024

@sbc100 Nothing obvious, we have been using emscripten for a few years, and part of our CI process runs GTests through emrun that run quite a bit of code and i don't think we ever saw this in this environment. I just happened to be debugging one of our tests locally today and i started to hit this.

If i see a pattern or find explanation/theory i will certainly keep you updated.

Can you share the full set of link flags you use?

@arsnyder16
Copy link
Contributor

Sure

-sINITIAL_MEMORY=100MB 
-sSTACK_SIZE=2MB 
-fexceptions 
-sWASM_BIGINT 
-sALLOW_MEMORY_GROWTH 
-sMALLOC=mimalloc 
-sEXIT_RUNTIME 
--post-js forward-env.js 
-pthread 
-sPROXY_TO_PTHREAD 
-Os 
--emrun 
-sENVIRONMENT=web,worker 
-sLZ4 
-sFETCH 
--extern-post-js bulk-console-print.js 
--preload-file=Resources@Resources 
--preload-file=Testing@Testing

I don't think they have any influence here, but the custom js we are adding with extern-post-js and post-js is to workaround a few issues here they are

forward-env.js

#16517

if(ENVIRONMENT_IS_NODE) {
  ENV = Object.create(process.env);
} else if(ENVIRONMENT_IS_WEB)  {
  ENV = Module['arguments'].reduce((prev, curr) => {
    if (curr.startsWith('--env=')) {
      const results = curr.split('=');
      prev[results[1]] = results[2];
    }
    return prev
  }, {
    "TZ": Intl.DateTimeFormat().resolvedOptions().timeZone // https:/emscripten-core/emscripten/pull/16517 
  });
}
bulk-console-print.js

#15839

// https:/emscripten-core/emscripten/issues/15839
if (!Module["ENVIRONMENT_IS_PTHREAD"] && (arguments_.includes('--gtest_list_tests') || !arguments_.includes('--no-console-buffer'))) {
   var flushs = [];
   var flushConsole = () => flushs.forEach(x => x());
   var flushInterval;
   var resetInterval = () => {
      clearInterval(flushInterval);
      flushInterval = setInterval(flushConsole, 60 * 1000);
   };
   resetInterval();
   var processConsoleLine = cb => {
      var lines = [];
      var limit = 1000;
      var flush = () => {
         resetInterval();
         if (!lines.length) {
            return;
         }
         cb(lines.join('\n'));
         lines = [];
      };
      flushs.push(flush);
      return (text) => {
         lines.push(text);
         lines.length > limit && flush();
      };
   }
   out = processConsoleLine(out);
   err = out;
   addOnExit(flushConsole);
}

@sbc100
Copy link
Collaborator

sbc100 commented Mar 13, 2024

Sure

-sINITIAL_MEMORY=100MB 
-sSTACK_SIZE=2MB 
-fexceptions 
-sWASM_BIGINT 
-sALLOW_MEMORY_GROWTH 
-sMALLOC=mimalloc 
-sEXIT_RUNTIME 
--post-js forward-env.js 
-pthread 
-sPROXY_TO_PTHREAD 
-Os 
--emrun 
-sENVIRONMENT=web,worker 
-sLZ4 
-sFETCH 
--extern-post-js bulk-console-print.js 
--preload-file=Resources@Resources 
--preload-file=Testing@Testing

I don't think they have any influence here, but the custom js we are adding with extern-post-js and post-js is to workaround a few issues here they are

forward-env.js
bulk-console-print.js

If there is some way you could reduce this something smaller, perhaps using the simple test/hello_world.c program that would be great. Or perhaps you could try to figure out which of those settings is triggering this issue?

What browser are you using BTW?

@arsnyder16
Copy link
Contributor

If there is some way you could reduce this something smaller, perhaps using the simple test/hello_world.c program that would be great. Or perhaps you could try to figure out which of those settings is triggering this issue?

What browser are you using BTW?

Chrome 122.

One random theory. Since we are using memory growth and pthreads. Is it possible that during a UTF8ArrayToString another thread is causing a wasmMemory.grow to happen ?

  • Can a grow call happen on a worker thread?
    • if so will other thread experience the instanceof issue?
    • What happens when that thread is destoryed?

I tested this theory very loosely by changing -sINITIAL_MEMORY=100MB to -sINITIAL_MEMORY=1500MB in my setup to try and avoid an memory growing, and i no longer see the problem. So it seems related

My guess is an example with multiple threads that are requiring more memory to be allocated and interleaved with some threads triggering UTF8ArrayToString may simulate the problem

pseudo related to dealing with GROWABLE_HEAP etc

@sbc100
Copy link
Collaborator

sbc100 commented Mar 13, 2024

If there is some way you could reduce this something smaller, perhaps using the simple test/hello_world.c program that would be great. Or perhaps you could try to figure out which of those settings is triggering this issue?
What browser are you using BTW?

Chrome 122.

One random theory. Since we are using memory growth and pthreads. Is it possible that during a UTF8ArrayToString another thread is causing a wasmMemory.grow to happen ?

  • Can a grow call happen on a worker thread?

    • if so will other thread experience the instanceof issue?
    • What happens when that thread is destoryed?

I tested this theory very loosely by changing -sINITIAL_MEMORY=100MB to -sINITIAL_MEMORY=1500MB in my setup to try and avoid an memory growing, and i no longer see the problem. So it seems related

My guess is an example with multiple threads that are requiring more memory to be allocated and interleaved with some threads triggering UTF8ArrayToString may simulate the problem

pseudo related to dealing with GROWABLE_HEAP etc

This seems like it might be an unrelated issue then.

Can you catch the exception in a debugger and see what ${heap}.buffer is at the point failure?

Can you confirm is this patch fixes your issue or not?

@arsnyder16
Copy link
Contributor

arsnyder16 commented Mar 13, 2024

I am certain this PR would fix the crash based on the output below from the browser console. An explanation of the exact behavior or reason instanceof stops working is still unclear but i think my theory of growing etc is related to producing the situation. It doesn't seem strictly related to iframes.

image
image

@kripken
Copy link
Member

kripken commented May 7, 2024

@arsnyder16

My guess is an example with multiple threads that are requiring more memory to be allocated and interleaved with some threads triggering UTF8ArrayToString may simulate the problem

I tried to test that theory but can't seem to hit the problem:

https://gist.github.com/kripken/170c8cac2660264e6563104a2fb3f63f

$ emcc a.cpp -pthread -O3 -sPROXY_TO_PTHREAD -sALLOW_MEMORY_GROWTH -sEXIT_RUNTIME --profiling -o a.html
$ emrun a.html

This seems to be a browser bug of some kind so it would really be good to get a reproducer so we can file an issue. Even if we want to land this workaround for now until it is fixed - which I am ok with in principle - it would be good to confirm the problem exists and is fixed by it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

TextDecoder sometimes fails when used through an iframe
7 participants