Add a post about changes to WebAssembly targets
This post is intended to be a summary of the changes and impact to users
after discussion in rust-lang/rust#127513,
rust-lang/rust#128511, and some surrounding
issues.
alexcrichton committed Aug 23, 2024
1 parent ea60f92 commit 9ddd798
Showing 1 changed file with 191 additions and 0 deletions.
191 changes: 191 additions & 0 deletions posts/2024-08-26-webassembly-targets-and-new-on-by-default-features.md
@@ -0,0 +1,191 @@
---
layout: post
title: "WebAssembly targets and new on-by-default features"
author: Alex Crichton
---

The Rust compiler has [recently upgraded to using LLVM 19][llvm19] and this
change accompanies some updates to WebAssembly targets of the Rust compiler.
Nightly Rust, which will become Rust 1.82 on 2024-10-17, reflects all of these
changes and can be used for testing.

WebAssembly is an evolving standard where new features are being added over time
through a [proposals process][proposals]. Once a WebAssembly proposal reaches
maturity, is merged into the specification itself, is implemented in engines,
and has remained that way for quite some time, producer toolchains (e.g. LLVM)
update to enable the proposal by default. In LLVM 19 this
has happened with the [multi-value and reference-types proposals][llvmenable].
These are now enabled by default in LLVM, which transitively means that they're
enabled by default for Rust as well.

WebAssembly targets for Rust now [have improved
documentation](https://github.com/rust-lang/rust/pull/128511) about WebAssembly
features and disabling them, and this post is going to review these changes and
go into depth about what's changing in LLVM.

## Enabling Reference Types by Default

The [reference-types proposal to
WebAssembly](https://github.com/WebAssembly/reference-types) introduced a few
new concepts to WebAssembly, notably the `externref` type which is a
host-defined GC resource that WebAssembly cannot access but can pass around.
Rust does not have support for the WebAssembly `externref` type and LLVM 19 does
not change that. WebAssembly modules produced from Rust will continue to not use
the `externref` type nor have a means of being able to do so.

Also included in the reference-types proposal, however, was the ability to have
multiple WebAssembly tables in a single module. In the original version of the
WebAssembly specification only a single table was allowed and this restriction
was relaxed with the reference-types proposal. WebAssembly tables are used by
LLVM and Rust to implement indirect function calls. For example function
pointers in WebAssembly are actually table indices and indirect function calls
are a WebAssembly `call_indirect` instruction with this table index.
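
As a small illustration of the kind of Rust code this affects, the sketch below calls through a function pointer. The function names are made up for this example. On a WebAssembly target a call like `f(x)` is typically lowered to `call_indirect`, unless the optimizer can resolve the callee statically and turn it into a direct call.

```rust
// Two ordinary functions whose addresses are taken below.
fn double(x: i32) -> i32 {
    x * 2
}

fn triple(x: i32) -> i32 {
    x * 3
}

// `f` is a function pointer. On WebAssembly targets its runtime value is an
// index into the module's function table, and calling it is an indirect call.
fn apply(f: fn(i32) -> i32, x: i32) -> i32 {
    f(x)
}

fn main() {
    // Storing the pointers in an array makes it harder (though not
    // impossible) for the optimizer to devirtualize the calls.
    let ops: [fn(i32) -> i32; 2] = [double, triple];
    for op in ops {
        println!("{}", apply(op, 7));
    }
}
```

Because almost every non-trivial Rust program uses trait objects, closures behind pointers, or plain function pointers somewhere, nearly every module ends up containing at least one such call.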

With the reference-types proposal the binary encoding of `call_indirect`
instructions was updated. Prior to the reference-types proposal `call_indirect`
was encoded with a fixed zero byte in its instruction (required to be exactly
0x00). This fixed zero byte was relaxed to a 32-bit [LEB] to indicate which
table the `call_indirect` instruction was using. For those unfamiliar, [LEB] is a
variable-length integer encoding that uses fewer bytes for smaller
values. For example the integer 0 can be encoded as `0x00` with a [LEB].
[LEB]s additionally allow "overlong" encodings, so the integer 0
can also be encoded as `0x80 0x00`.
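
To make the encoding concrete, here is a minimal sketch of unsigned LEB128 for 32-bit values. The function names are chosen for this example and are not taken from LLVM or any WebAssembly library. The decoder accepts overlong encodings of up to five bytes, which is the longest form a 32-bit value may take.

```rust
/// Encode `value` as an unsigned LEB128 using the fewest bytes possible.
fn encode_leb_u32(mut value: u32) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        // Take the low 7 bits of what remains.
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            // Last byte: the high (continuation) bit stays clear.
            out.push(byte);
            return out;
        }
        // More bytes follow: set the continuation bit.
        out.push(byte | 0x80);
    }
}

/// Decode an unsigned LEB128, returning the value and the number of bytes
/// consumed. Overlong encodings are accepted as long as they fit in 5 bytes.
fn decode_leb_u32(bytes: &[u8]) -> Option<(u32, usize)> {
    let mut result: u32 = 0;
    for (i, &byte) in bytes.iter().enumerate().take(5) {
        let low = (byte & 0x7f) as u32;
        // The fifth byte may only carry the top 4 bits of a 32-bit value.
        if i == 4 && low > 0x0f {
            return None;
        }
        result |= low << (7 * i);
        if byte & 0x80 == 0 {
            return Some((result, i + 1));
        }
    }
    // Ran out of input, or the encoding is longer than 5 bytes.
    None
}

fn main() {
    println!("{:02x?}", encode_leb_u32(0));
    println!("{:?}", decode_leb_u32(&[0x80, 0x00]));
    println!("{:02x?}", encode_leb_u32(u32::MAX));
}
```

Both `0x00` and `0x80 0x00` decode to 0 here; they differ only in how many bytes they occupy.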

LLVM's support of separate compilation of source code to a WebAssembly binary
means that when an object file is emitted it does not know the final index of
the table that is going to be used in the final binary. Before reference-types
there was only one option, table 0, so `0x00` was always used when encoding
`call_indirect` instructions. After reference-types, however, LLVM will emit an
over-long [LEB] of the form `0x80 0x80 0x80 0x80 0x00` which is the maximal
length of a 32-bit [LEB]. This [LEB] is then filled in by the linker with a
relocation to the actual table index that is used by the final module.
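
The idea of a fixed-width placeholder that is patched later can be sketched as follows. This is an illustration only, not LLD's actual implementation: `encode_padded_leb_u32` and `apply_relocation` are hypothetical names, and the byte buffer stands in for an object file's code section.

```rust
/// Encode `value` as a fixed-width, 5-byte padded unsigned LEB128.
/// Every 32-bit value fits, so the slot never needs to grow or shrink.
fn encode_padded_leb_u32(value: u32) -> [u8; 5] {
    let mut out = [0u8; 5];
    for (i, slot) in out.iter_mut().enumerate() {
        let group = ((value >> (7 * i)) & 0x7f) as u8;
        // All bytes except the last carry the continuation bit.
        *slot = if i < 4 { group | 0x80 } else { group };
    }
    out
}

/// Hypothetical relocation step: overwrite the 5-byte placeholder starting
/// at `offset` with the final index, leaving all other bytes untouched.
fn apply_relocation(code: &mut [u8], offset: usize, index: u32) {
    code[offset..offset + 5].copy_from_slice(&encode_padded_leb_u32(index));
}

fn main() {
    // 0xaa and 0xbb stand in for surrounding instruction bytes; the middle
    // five bytes are the placeholder emitted by the compiler.
    let mut code = vec![0xaa, 0x80, 0x80, 0x80, 0x80, 0x00, 0xbb];
    apply_relocation(&mut code, 1, 3);
    println!("{:02x?}", code);
}
```

Because the slot has a fixed width, patching never shifts the surrounding bytes, which is what lets a linker fill in indices without re-laying-out the code.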

Putting all of this together: with LLVM 19, which has reference-types enabled
by default, any WebAssembly module with an indirect
function call (which is almost always the case for Rust code) will be emitted as a
WebAssembly binary that cannot be decoded by engines and tooling that do not
support the reference-types proposal. It is expected that this change will have
a low impact due to the age of the reference-types proposal and breadth of
implementation in engines. Given the multitude of WebAssembly engines, however,
it's recommended that any WebAssembly users test out Nightly Rust and see if
the produced module still runs on the engine of choice.

### LLVM, Rust, and Multiple Tables

One interesting point worth mentioning is that despite reference-types enabling
multiple tables in WebAssembly modules this is not actually taken advantage of
at this time by either LLVM or Rust. WebAssembly modules emitted will still have
at most one table of functions. This means that the over-long 5-byte encoding of
index 0 as `0x80 0x80 0x80 0x80 0x00` is not actually necessary at this time.
LLD, LLVM's linker for WebAssembly, wants to process all [LEB] relocations in a
similar manner which currently forces this 5-byte encoding of zero. For example
when a function calls another function the `call` instruction encodes the target
function index as a 5-byte [LEB] which is filled in by the linker. There is
quite often more than one function so the 5-byte encoding enables all possible
function indices to be encoded.

In the future LLVM might start using multiple tables as well. For example LLVM
may have a mode in the future where there's a table-per-function type instead of
a single heterogeneous table. This can enable engines to implement
`call_indirect` more efficiently. This is not implemented at this time, however.

For users who want a minimally-sized WebAssembly module (e.g. if you're in a web
context and sending bytes over the wire) it's recommended to use an optimization
tool such as [`wasm-opt`] to shrink the size of the output of LLVM. Even before
this change with reference-types it's recommended to do this as [`wasm-opt`] can
typically optimize LLVM's default output even further. When optimizing a module
through [`wasm-opt`] these 5-byte encodings of index 0 are all shrunk to a
single byte.

## Enabling Multi-Value by Default

The second feature enabled by default in LLVM 19 is multi-value. The
[multi-value proposal to WebAssembly][multi-value] enables, for example,
functions to have more than one return value. WebAssembly instructions are
also allowed to have more than one return value. This proposal
was one of the first to be merged into the WebAssembly specification after the
original MVP and has been implemented in many engines for quite some time.

The consequences of enabling this feature by default in LLVM are more minor for
Rust, however, than enabling reference-types by default. LLVM's default ABI for
WebAssembly code is not changing even when multi-value is enabled. Additionally
Rust's ABI is not changing either and continues to match LLVM's. Despite this
though the change has the possibility of still affecting Nightly users of Rust.

Rust for some time has supported an `extern "wasm"` ABI on Nightly, which was an
experimental means of defining a function in Rust that
returned multiple values (i.e. used the multi-value proposal). Due to
infrastructural changes and refactorings in LLVM itself this feature of Rust has
[been removed](https://github.com/rust-lang/rust/pull/127605) and is no longer
supported on Nightly at all. As a result there is no longer any possible method
of writing a function in Rust that returns multiple values at the WebAssembly
function type level.

In summary this change is expected to not affect any Rust code in the wild
unless you were using the Nightly feature of `extern "wasm"` in which case
you'll be forced to drop support for that and use `extern "C"` instead.
Supporting WebAssembly multi-return functions in Rust is a broader topic than
this post can cover, but at this time it's an area that's ripe for contribution
from suitably motivated contributors.
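
For code that previously relied on `extern "wasm"` to return several values, one possible migration is sketched below. The `DivRem` type and both functions are hypothetical names for this example. It assumes that bundling the results in a `#[repr(C)]` struct, or writing them through an explicit out-pointer, is acceptable to the caller; in neither case does the WebAssembly function type gain multiple results.

```rust
/// Two results bundled into one C-compatible struct (hypothetical type).
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct DivRem {
    pub quotient: u32,
    pub remainder: u32,
}

/// Returns both values in a struct. How the struct is lowered depends on the
/// target's C ABI; it is not returned as multiple WebAssembly results.
/// `b` must be non-zero, otherwise the division panics.
pub extern "C" fn div_rem(a: u32, b: u32) -> DivRem {
    DivRem {
        quotient: a / b,
        remainder: a % b,
    }
}

/// Alternative with an explicit out-pointer, which makes the memory traffic
/// visible in the signature.
///
/// # Safety
/// `out` must be non-null, properly aligned, and valid for writes.
pub unsafe extern "C" fn div_rem_out(a: u32, b: u32, out: *mut DivRem) {
    // SAFETY: the caller guarantees `out` is valid for writes.
    unsafe { out.write(div_rem(a, b)) };
}

fn main() {
    println!("{:?}", div_rem(17, 5));

    let mut slot = DivRem { quotient: 0, remainder: 0 };
    // SAFETY: `slot` is a live, aligned local.
    unsafe { div_rem_out(17, 5, &mut slot) };
    println!("{:?}", slot);
}
```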

## Enabling Future Proposals to WebAssembly

This is not the first time that a WebAssembly proposal has gone from
off-by-default to on-by-default in LLVM, nor will it be the last. For example
LLVM already enables the [sign-extension proposal][sign-ext] by default which
MVP WebAssembly did not have. It's expected that in the not-too-distant future
the
[nontrapping-fp-to-int](https://github.com/WebAssembly/nontrapping-float-to-int-conversions)
proposal will be enabled by default as well. These changes are currently not made
with strict criteria in mind (e.g. N engines must have this implemented for M
years), so breakage may happen.

If you're using a WebAssembly engine that does not support the modules emitted
by Nightly Rust and LLVM 19 then your options are:

* Try seeing if the engine you're using has any updates available to it. You
might be using an older version which didn't support a feature but a newer
version supports the feature.
* Open an issue to raise awareness that a change is causing breakage. This could
either be done on your engine's repository, the Rust repository, or the
WebAssembly
[tool-conventions](https://github.com/WebAssembly/tool-conventions)
repository.
* Recompile your code with features disabled, more on this in the next section.

The general assumption behind enabling new features by default is that it's a
relatively hassle-free operation for end users while bringing performance
benefits for everyone (e.g. nontrapping-fp-to-int will make float-to-int
conversions more optimal). If updates end up causing hassle it's best to flag
that early on so rollout plans can be adjusted if needed.

## Disabling on-by-default WebAssembly proposals

For a variety of reasons you might be motivated to disable on-by-default
WebAssembly features: for example maybe your engine is difficult to update or
doesn't support a new feature. Disabling on-by-default features is unfortunately
not the easiest task. It is notably not sufficient to use
`-Ctarget-feature=-foo` to disable features for just your own project's
compilation because the Rust standard library, shipped in precompiled form, is
compiled with these features enabled.

To disable on-by-default WebAssembly proposals it's required that you use Cargo's
[`-Zbuild-std`](https://doc.rust-lang.org/nightly/cargo/reference/unstable.html#build-std)
feature. For example:

```shell
$ export RUSTFLAGS=-Ctarget-cpu=mvp
$ cargo +nightly build -Zbuild-std=panic_abort,std --target wasm32-unknown-unknown
```

This will recompile the Rust standard library in addition to your own code with
the "MVP CPU" which is LLVM's placeholder for all WebAssembly proposals
disabled. This will disable sign-ext, reference-types, multi-value, etc.
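
One way to sanity-check which features your own crate was compiled with is to query `cfg!(target_feature = ...)`. The sketch below is an assumption-laden aid rather than an official recipe: the feature names `sign-ext`, `reference-types`, and `multivalue` follow the spellings used by rustc and LLVM as far as this example is aware, and a given toolchain may not expose every feature through `cfg`, in which case the list under-reports. It also says nothing about how the precompiled standard library was built.

```rust
/// Report which of a few WebAssembly target features were enabled when this
/// crate was compiled. The feature names are assumptions; see the note above.
fn enabled_wasm_features() -> Vec<&'static str> {
    let mut features = Vec::new();
    if cfg!(target_feature = "sign-ext") {
        features.push("sign-ext");
    }
    if cfg!(target_feature = "reference-types") {
        features.push("reference-types");
    }
    if cfg!(target_feature = "multivalue") {
        features.push("multivalue");
    }
    features
}

fn main() {
    // With `-Ctarget-cpu=mvp` the expectation is that nothing is listed.
    println!("enabled features: {:?}", enabled_wasm_features());
}
```

Inspecting the final `.wasm` binary with a WebAssembly validator configured for the features your engine supports remains the more reliable check.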

[llvm19]: https://github.com/rust-lang/rust/pull/127513
[proposals]: https://github.com/WebAssembly/proposals
[llvmenable]: https://github.com/llvm/llvm-project/pull/80923
[LEB]: https://en.wikipedia.org/wiki/LEB128
[`wasm-opt`]: https://github.com/WebAssembly/binaryen
[multi-value]: https://github.com/WebAssembly/multi-value
[sign-ext]: https://github.com/WebAssembly/sign-extension-ops
