diff --git a/posts/2024-08-26-webassembly-targets-and-new-on-by-default-features.md b/posts/2024-08-26-webassembly-targets-and-new-on-by-default-features.md new file mode 100644 index 000000000..38dcf955c --- /dev/null +++ b/posts/2024-08-26-webassembly-targets-and-new-on-by-default-features.md @@ -0,0 +1,191 @@ +--- +layout: post +title: "WebAssembly targets and new on-by-default features" +author: Alex Crichton +--- + +The Rust compiler has [recently upgraded to using LLVM 19][llvm19] and this +change accompanies some updates to WebAssembly targets of the Rust compiler. +Nightly Rust, which will become Rust 1.82 on 2024-10-17, reflects all of these +changes and can be used for testing. + +WebAssembly is an evolving standard where new features are being added over time +through a [proposals process][proposals]. As WebAssembly proposals reach +maturity, get merged into the specification itself, get implemented in engines, +and remain this way for quite some time, producer toolchains (e.g. LLVM) +are going to update to include these new proposals by default. In LLVM 19 this +has happened with the [multi-value and reference-types proposals][llvmenable]. +These are now enabled by default in LLVM, which transitively means that they're +enabled by default for Rust as well. + +WebAssembly targets for Rust now [have improved +documentation](https://github.com/rust-lang/rust/pull/128511) about WebAssembly +features and disabling them, and this post is going to review these changes and +go into depth about what's changing in LLVM. + +## Enabling Reference Types by Default + +The [reference-types proposal to +WebAssembly](https://github.com/webAssembly/reference-types) introduced a few +new concepts to WebAssembly, notably the `externref` type which is a +host-defined GC resource that WebAssembly cannot access but can pass around. +Rust does not have support for the WebAssembly `externref` type and LLVM 19 does +not change that. 
WebAssembly modules produced from Rust will continue to not use +the `externref` type nor have a means of being able to do so. + +Also included in the reference-types proposal, however, was the ability to have +multiple WebAssembly tables in a single module. In the original version of the +WebAssembly specification only a single table was allowed and this restriction +was relaxed with the reference-types proposal. WebAssembly tables are used by +LLVM and Rust to implement indirect function calls. For example function +pointers in WebAssembly are actually table indices and indirect function calls +are a WebAssembly `call_indirect` instruction with this table index. + +With the reference-types proposal the binary encoding of `call_indirect` +instructions was updated. Prior to the reference-types proposal `call_indirect` +was encoded with a fixed zero byte in its instruction (required to be exactly +0x00). This fixed zero byte was relaxed to a 32-bit [LEB] to indicate which +table the `call_indirect` instruction was using. For those unfamiliar [LEB] is a +way of encoding multi-byte integers in a smaller number of bytes for smaller +integers. For example the integer 0 can be encoded as `0x00` with a [LEB]. +[LEB]s are flexible to additionally allow "overlong" encodings so the integer 0 +can additionally be encoded as `0x80 0x00`. + +LLVM's support of separate compilation of source code to a WebAssembly binary +means that when an object file is emitted it does not know the final index of +the table that is going to be used in the final binary. Before reference-types +there was only one option, table 0, so `0x00` was always used when encoding +`call_indirect` instructions. After reference-types, however, LLVM will emit an +over-long [LEB] of the form `0x80 0x80 0x80 0x80 0x00` which is the maximal +length of a 32-bit [LEB]. This [LEB] is then filled in by the linker with a +relocation to the actual table index that is used by the final module. 
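The difference between a minimal and an over-long [LEB] can be sketched in a few lines of Rust. This is only an illustration of the encoding described above, not the code LLVM or LLD actually use; the function names `encode_uleb128`, `encode_uleb128_padded`, and `decode_uleb128` are hypothetical, and the decoder is deliberately lenient (it does not reject out-of-range bits in the fifth byte).

```rust
/// Encode `value` as an unsigned LEB128 using as few bytes as possible.
/// (Hypothetical helper for illustration only.)
fn encode_uleb128(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        // Take the low 7 bits of the remaining value.
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: continuation bit (0x80) is left clear.
            out.push(byte);
            return out;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Encode `value` as a fixed-width, 5-byte unsigned LEB128. The first four
/// bytes always carry the continuation bit, so a linker can later patch in
/// any 32-bit index without changing the size of the instruction.
fn encode_uleb128_padded(value: u32) -> [u8; 5] {
    let mut out = [0u8; 5];
    for (i, slot) in out.iter_mut().enumerate() {
        let byte = ((value >> (7 * i)) & 0x7f) as u8;
        *slot = if i < 4 { byte | 0x80 } else { byte };
    }
    out
}

/// Decode an unsigned 32-bit LEB128 from the start of `bytes`, returning the
/// value and the number of bytes consumed. Accepts over-long encodings.
fn decode_uleb128(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        result |= ((byte & 0x7f) as u32) << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    // Ran out of input, or more than 5 bytes had the continuation bit set.
    None
}
```

With these helpers, `encode_uleb128(0)` yields the single byte `0x00`, while `encode_uleb128_padded(0)` yields `0x80 0x80 0x80 0x80 0x00`; both decode to the same table index 0. An engine that predates reference-types expects exactly `0x00` at this position and therefore cannot decode the padded form.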
+ +When putting all of this together it means that with LLVM 19, which has +reference-types enabled by default, compiling any WebAssembly module with an indirect +function call (which is almost always the case for Rust code) will produce a +WebAssembly binary that cannot be decoded by engines and tooling that do not +support the reference-types proposal. It is expected that this change will have +a low impact due to the age of the reference-types proposal and breadth of +implementation in engines. Given the multitude of WebAssembly engines, however, +it's recommended that any WebAssembly users test out Nightly Rust and see if +the produced module still runs on the engine of choice. + +### LLVM, Rust, and Multiple Tables + +One interesting point worth mentioning is that despite reference-types enabling +multiple tables in WebAssembly modules this is not actually taken advantage of +at this time by either LLVM or Rust. WebAssembly modules emitted will still have +at most one table of functions. This means that the over-long 5-byte encoding of +index 0 as `0x80 0x80 0x80 0x80 0x00` is not actually necessary at this time. +LLD, LLVM's linker for WebAssembly, wants to process all [LEB] relocations in a +similar manner which currently forces this 5-byte encoding of zero. For example +when a function calls another function the `call` instruction encodes the target +function index as a 5-byte [LEB] which is filled in by the linker. There is +quite often more than one function so the 5-byte encoding enables all possible +function indices to be encoded. + +In the future LLVM might start using multiple tables as well. For example LLVM +may have a mode in the future where there's a table-per-function type instead of +a single heterogeneous table. This can enable engines to implement +`call_indirect` more efficiently. This is not implemented at this time, however. + +For users who want a minimally-sized WebAssembly module (e.g. 
if you're in a web +context and sending bytes over the wire) it's recommended to use an optimization +tool such as [`wasm-opt`] to shrink the size of the output of LLVM. Even before +this change with reference-types it's recommended to do this as [`wasm-opt`] can +typically optimize LLVM's default output even further. When optimizing a module +through [`wasm-opt`] these 5-byte encodings of index 0 are all shrunk to a +single byte. + +## Enabling Multi-Value by Default + +The second feature enabled by default in LLVM 19 is multi-value. The +[multi-value proposal to WebAssembly][multi-value] enables functions to have +more than one return value for example. WebAssembly instructions are +additionally allowed to have more than one return value as well. This proposal +is one of the first to get merged into the WebAssembly specification after the +original MVP and has been implemented in many engines for quite some time. + +The consequences of enabling this feature by default in LLVM are more minor for +Rust, however, than enabling reference-types by default. LLVM's default ABI for +WebAssembly code is not changing even when multi-value is enabled. Additionally +Rust's ABI is not changing either and continues to match LLVM's. Despite this +though the change has the possibility of still affecting Nightly users of Rust. + +Rust for some time has supported an `extern "wasm"` ABI on Nightly which was an +experimental means of exposing the ability of defining a function in Rust which +returned multiple values (e.g. used the multi-value proposal). Due to +infrastructural changes and refactorings in LLVM itself this feature of Rust has +[been removed](https://github.com/rust-lang/rust/pull/127605) and is no longer +supported on Nightly at all. As a result there is no longer any possible method +of writing a function in Rust that returns multiple values at the WebAssembly +function type level. 
+ +In summary this change is expected to not affect any Rust code in the wild +unless you were using the Nightly feature of `extern "wasm"` in which case +you'll be forced to drop support for that and use `extern "C"` instead. +Supporting WebAssembly multi-return functions in Rust is a broader topic than +this post can cover, but at this time it's an area that's ripe for contribution +from suitably motivated contributors. + +## Enabling Future Proposals to WebAssembly + +This is not the first time that a WebAssembly proposal has gone from +off-by-default to on-by-default in LLVM, nor will it be the last. For example +LLVM already enables the [sign-extension proposal][sign-ext] by default which +MVP WebAssembly did not have. It's expected that in the not-too-distant future +the +[nontrapping-fp-to-int](https://github.com/WebAssembly/nontrapping-float-to-int-conversions) +proposal will likely be enabled by default. These changes are currently not made +with strict criteria in mind (e.g. N engines must have this implemented for M +years), and there may be breakage that happens. + +If you're using a WebAssembly engine that does not support the modules emitted +by Nightly Rust and LLVM 19 then your options are: + +* Try seeing if the engine you're using has any updates available to it. You + might be using an older version which didn't support a feature but a newer + version supports the feature. +* Open an issue to raise awareness that a change is causing breakage. This could + either be done on your engine's repository, the Rust repository, or the + WebAssembly + [tool-conventions](https://github.com/WebAssembly/tool-conventions) + repository. +* Recompile your code with features disabled, more on this in the next section. + +The general assumption behind enabling new features by default is that it's a +relatively hassle-free operation for end users while bringing performance +benefits for everyone (e.g. 
nontrapping-fp-to-int will make float-to-int +conversions more efficient). If updates end up causing hassle it's best to flag +that early on so rollout plans can be adjusted if needed. + +## Disabling on-by-default WebAssembly proposals + +For a variety of reasons you might be motivated to disable on-by-default +WebAssembly features: for example maybe your engine is difficult to update or +doesn't support a new feature. Disabling on-by-default features is unfortunately +not the easiest task. It is notably not sufficient to use +`-Ctarget-feature=-foo` to disable features for just your own project's +compilation because the Rust standard library, shipped in precompiled form, is +compiled with these features enabled. + +To disable on-by-default WebAssembly proposals it's required that you use Cargo's +[`-Zbuild-std`](https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std) +feature. For example: + +```shell +$ export RUSTFLAGS=-Ctarget-cpu=mvp +$ cargo +nightly build -Zbuild-std=panic_abort,std --target wasm32-unknown-unknown +``` + +This will recompile the Rust standard library in addition to your own code with +the "MVP CPU" which is LLVM's placeholder for all WebAssembly proposals +disabled. This will disable sign-ext, reference-types, multi-value, etc. + +[llvm19]: https://github.com/rust-lang/rust/pull/127513 +[proposals]: https://github.com/WebAssembly/proposals +[llvmenable]: https://github.com/llvm/llvm-project/pull/80923 +[LEB]: https://en.wikipedia.org/wiki/LEB128 +[`wasm-opt`]: https://github.com/WebAssembly/binaryen +[multi-value]: https://github.com/webAssembly/multi-value +[sign-ext]: https://github.com/webAssembly/sign-extension-ops