
Replace GROWABLE_HEAPS with native functions #18589

Open
arsnyder16 opened this issue Jan 24, 2023 · 11 comments

Comments

@arsnyder16
Contributor

I was playing around with your benchmark, and I think I came up with an option to reduce the overhead behind the warning "USE_PTHREADS + ALLOW_MEMORY_GROWTH may run non-wasm code slowly".

Instead of always relying on the JS array views, have a native function to call into, which casts the memory address on the wasm side; since the access happens inside wasm, there is no need to check whether the underlying buffer has changed.

Here is the new benchmark.cpp
#include <iostream>
#include <string>
#include <chrono>
#include <cstdint>
#include <emscripten.h>

extern "C" {
// Read a byte directly on the wasm side, so JS never needs a heap view for it.
EMSCRIPTEN_KEEPALIVE uint8_t view_uint8(void* address) { return reinterpret_cast<uint8_t*>(address)[0]; }
}

int main() {
  auto start = std::chrono::steady_clock::now();
  size_t total = 0;
  for (const char* str : { "hello", "world", "!", "test", "ing", "strings", "many", "of", "them" }) {
    total += EM_ASM_INT({
        var total = 0;
        var value;
        for (var i = 0; i < 1000000; ++i) {
          value = AsciiToString($0);
          total += value.length;
        }
        console.log(value);
        return total;
      }, str);
  }
  auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(std::chrono::steady_clock::now() - start).count();
  std::cout << total << " in " << ms << " ms " << total/ms << " /ms \n";
}
NO_ALLOW_MEMORY_GROWTH
~/emscripten/benchmark ((3.1.7))# em++ -Os -pthread -sNO_ALLOW_MEMORY_GROWTH -o benchmark.js benchmark.cpp 
~/emscripten/benchmark ((3.1.7))# node --experimental-wasm-threads --experimental-wasm-bulk-memory benchmark.js
hello
world
!
test
ing
strings
many
of
them
35000000 in 293 ms 119453 /ms 
ALLOW_MEMORY_GROWTH
~/emscripten/benchmark ((3.1.7))# em++ -Os -pthread -sALLOW_MEMORY_GROWTH -o benchmark.js benchmark.cpp 
em++: warning: USE_PTHREADS + ALLOW_MEMORY_GROWTH may run non-wasm code slowly, see https://github.com/WebAssembly/design/issues/1271 [-Wpthreads-mem-growth]
~/emscripten/benchmark ((3.1.7))# node --experimental-wasm-threads --experimental-wasm-bulk-memory benchmark.js
hello
world
!
test
ing
strings
many
of
them
35000000 in 1328 ms 26355 /ms 

Swapping out AsciiToString with:

function AsciiToString(ptr) {
  var str = "";
  while (1) {
    // Call into the exported wasm helper instead of reading through a JS heap view.
    var ch = Module['_view_uint8'](ptr++ >> 0) >>> 0;
    if (!ch) return str;
    str += String.fromCharCode(ch);
  }
}
New Results!
~/emscripten/benchmark ((3.1.7))# node --experimental-wasm-threads --experimental-wasm-bulk-memory benchmark.js
hello
world
!
test
ing
strings
many
of
them
35000000 in 408 ms 85784 /ms 
@arsnyder16
Contributor Author

@kripken I was just looking at this again. My benchmark is less relevant to the latest Emscripten, since my example uses AsciiToString, which is a LEGACY_RUNTIME function and not emitted by default on 3.1.21+.

I think the same principle applies, though.

Interestingly, I noticed that the new runtime has a function UTF8ArrayToString, which is called like this:

function UTF8ToString(ptr, maxBytesToRead) {
  return ptr ? UTF8ArrayToString(GROWABLE_HEAP_U8(), ptr, maxBytesToRead) : "";
}

Isn't the whole idea behind the GROWABLE_HEAP_ functions thread safety, i.e. that if the wasm memory grows, the existing array views become stale? Doesn't that mean UTF8ArrayToString contains that race condition?

@arsnyder16
Contributor Author

WebAssembly/design#1271

@kripken
Member

kripken commented Jan 24, 2023

I'm not sure what you mean by a race condition in UTF8ToString? I think that code is valid, and is not special unless I'm missing something. That is, the overall issue is that

  1. JS is passed a C pointer to memory.
  2. That memory was only recently grown.
  3. JS has not yet updated its memory views for a size big enough to see that C pointer.

Having GROWABLE_HEAP_U8() in there will update the view if necessary.
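
For reference, the generated GROWABLE_HEAP_U8 helper is roughly the following sketch (helper names such as updateGlobalBufferAndViews vary between Emscripten versions and are assumptions here, not verbatim output): it compares the live wasmMemory.buffer against the cached buffer and rebuilds the views only when the memory has grown.

function GROWABLE_HEAP_U8() {
  // If another thread grew the memory, wasmMemory.buffer refers to a new,
  // larger buffer and the cached views are stale.
  if (wasmMemory.buffer != buffer) {
    updateGlobalBufferAndViews(wasmMemory.buffer); // refresh HEAP8/HEAPU8/...
  }
  return HEAPU8;
}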

@arsnyder16
Contributor Author

I see. Maybe I misunderstood the exact reason why GROWABLE_HEAP was necessary; I thought the other issue was that the underlying buffer could be relocated, which could leave the view's ArrayBuffer detached (freed).

So my concern with UTF8ToString was related to the use of the view inside the function:

while (heapOrArray[endPtr] && !(endPtr >= endIdx)) ++endPtr;

My thought was that during those iterations the buffer could be invalidated.

If that isn't actually a concern, then the original performance issue in AsciiToString could have been fixed much more easily by just mimicking how UTF8ToString works now.

It looks like:

function AsciiToString(ptr) {
  var str = '';
  while (1) {
    var ch = GROWABLE_HEAP_U8()[((ptr++)>>0)];
    if (!ch) return str;
    str += String.fromCharCode(ch);
  }
}

The main overhead here is the call to GROWABLE_HEAP_U8() on each iteration of the loop, which performs the comparison if (wasmMemory.buffer != buffer).
If GROWABLE_HEAP_U8 doesn't need to be called on each iteration of the loop, as is the case in UTF8ToString, then the solution would have been more along the lines of:

function AsciiToString(ptr) {
  var str = '';
  var heap =  GROWABLE_HEAP_U8();
  while (1) {
    var ch = heap[((ptr++)>>0)];
    if (!ch) return str;
    str += String.fromCharCode(ch);
  }
}

With that said, if there are still places invoking these "GROWABLE" functions in a tight loop, they could simply be pulled out of the loop.

If for some reason they cannot be removed from the loop, then deferring to a native function to resolve the value at the address does appear to be faster than the if (wasmMemory.buffer != buffer) check.

@kripken
Member

kripken commented Jan 25, 2023

my thought was during those iterations buffer could be invalidated.

Ok, then that is not a problem. If we got a pointer to a string, we can assume the string has been allocated before us. So we just need to update the view once. I would hope the JS VM will hoist such code out automatically, but if we see benefits from manually hoisting it then that can be useful.

(In theory a multithreaded application could also have a bad data race in which the string is overwritten or resized, but that would be an error even without memory growth.)

@arsnyder16
Contributor Author

OK, then we can probably close this, since it appears the only code that has these issues is in the "legacy" functions.

I can rerun my benchmarks if you would like, to show the difference between manual hoisting and the current code. In my experience, hoping that a VM or compiler will automatically fix performance problems is typically a pipe dream, especially when you know a manual solution solves the problem.

@kripken
Member

kripken commented Jan 25, 2023

Yeah, in this case I think you're right. I was also talking with @sbc100 offline, and it makes sense that this can't be hoisted: the VM can't tell that the allocation has already been done, so since we iterate upward, we might race with the growth of the data.

In general though I am hoping that spec and VM improvements will make this problem go away by itself. I'm not opposed to improvements in the meanwhile, though, if they are helpful.

@sbc100
Collaborator

sbc100 commented Jan 25, 2023

One possible improvement would be to update the current acorn pass so that the check is done exactly once per JS function (or at least per JS function that contains a heap access), rather than on every access.

It is conceivable that one could write a multi-threaded app that gets a new pointer from a background thread during a JS function, but it's very unlikely anyone would do that.
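
As a hypothetical illustration of that transformation (not the actual output of the pass; sumBytes is an invented example), the rewrite would hoist the helper call to the top of the function and reuse the resulting view:

// Before: every heap access re-checks whether the buffer has changed.
function sumBytes(ptr, len) {
  var total = 0;
  for (var i = 0; i < len; i++) {
    total += GROWABLE_HEAP_U8()[ptr + i];
  }
  return total;
}

// After (hypothetical pass output): the check runs once per function.
function sumBytes(ptr, len) {
  var heap = GROWABLE_HEAP_U8(); // single refresh at function entry
  var total = 0;
  for (var i = 0; i < len; i++) {
    total += heap[ptr + i];
  }
  return total;
}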

@arsnyder16
Contributor Author

@sbc100 That seems like a reasonable idea. Based on a quick scan of something built with a recent version of Emscripten, without the legacy functions there are only a few places that call GROWABLE_HEAP multiple times, and they don't seem like they would be overly expensive. Cleaning them up, and making sure any new ones don't turn into gotchas, would still be an improvement though.

@mehagar

mehagar commented Sep 11, 2023

Would the Growable Array Buffer proposal solve this: https://github.com/tc39/proposal-resizablearraybuffer#sync-up-capability-with-webassembly-memorygrow (coupled with the corresponding WebAssembly proposal: WebAssembly/spec#1292)?

If I understand that proposal correctly, then once the ArrayBuffer that WebAssembly memory uses can grow in place, there is no need to worry about refreshing views in JS.
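
A minimal sketch of what that proposal enables (plain resizable ArrayBuffer shown for illustration; the WebAssembly proposal would expose wasm memory as a similarly in-place-growable buffer): length-tracking typed-array views follow the buffer as it grows, so nothing needs to be recreated after growth.

// Per the TC39 resizable ArrayBuffer proposal; illustration only.
const buf = new ArrayBuffer(16, { maxByteLength: 1024 });
const view = new Uint8Array(buf);   // no explicit length, so the view is length-tracking

buf.resize(64);                     // grows in place; same ArrayBuffer identity
console.log(view.length);           // 64, without recreating the view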

@sbc100
Collaborator

sbc100 commented Sep 11, 2023

Yes, once we have growable array buffers there would be no need for any of this. However, it might be a while before folks want to drop support for engines without growable array buffers.
