-
Notifications
You must be signed in to change notification settings - Fork 12.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add ASCII fast-path for char::is_grapheme_extended
#121138
Conversation
I discovered that `impl Debug for str` is quite slow because it ends up doing a `unicode_data::grapheme_extend::lookup` for each char, which ends up doing a binary search. This introduces a fast-path for ASCII chars which do not have this property. The `lookup` is thus completely gone from profiles.
Add a fast-path to `Debug` ASCII `&str` Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping. --- This is an alternative / a companion to rust-lang#121138. The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
Add ASCII fast-path for `char::is_grapheme_extended` I discovered that `impl Debug for str` is quite slow because it ends up doing a `unicode_data::grapheme_extend::lookup` for each char, which ends up doing a binary search. This introduces a fast-path for ASCII chars which do not have this property. The `lookup` is thus completely gone from profiles. --- As a followup, maybe it’s worth implementing this fast path directly in `unicode_data` so that it can check for the lower bound directly before going to a potentially expensive binary search.
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
Finished benchmarking commit (2d650c6): comparison URL. Overall result: ✅ improvements - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 639.883s -> 641.472s (0.25%) |
-54.08% on the |
@@ -927,7 +927,7 @@ impl char { | |||
#[must_use] | |||
#[inline] | |||
pub(crate) fn is_grapheme_extended(self) -> bool { | |||
unicode::Grapheme_Extend(self) | |||
self > '\x7f' && unicode::Grapheme_Extend(self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason not to instead update unicode-table-generator
so this becomes a part of Grapheme_Extend
(unicode::unicode_data::grapheme_extended::lookup
)? So as to help anywhere else this call might wind up getting used
Doesn't have to be part of this PR, maybe later refactoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We really should have a const ASCII_MAX: char = '\x7f'
somewhere, maybe even public. We throw that magic number around a lot
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generator doesn't know specifics about particular properties yet, AFAICS, but I could imagine it could generically optimize lookups outside of a property's min/max.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I actually did this in the unicode-table-generator
(and fun fact: this was the only properly with a min > ASCII). Let me open that up as a PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here you go: #122013
@tgross35: 🔑 Insufficient privileges: Not in reviewers |
☀️ Test successful - checks-actions |
Finished benchmarking commit (bdde2a8): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesThis benchmark run did not return any relevant results for this metric. Binary sizeThis benchmark run did not return any relevant results for this metric. Bootstrap: 642.994s -> 643.644s (0.10%) |
Add a fast-path to `Debug` ASCII `&str` Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping. --- This is an alternative / a companion to rust-lang#121138. The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
Pkgsrc changes: * Adapt checksums and patches, some have beene intregrated upstream. Upstream chnages: Version 1.78.0 (2024-05-02) =========================== Language -------- - [Stabilize `#[cfg(target_abi = ...)]`] (rust-lang/rust#119590) - [Stabilize the `#[diagnostic]` namespace and `#[diagnostic::on_unimplemented]` attribute] (rust-lang/rust#119888) - [Make async-fn-in-trait implementable with concrete signatures] (rust-lang/rust#120103) - [Make matching on NaN a hard error, and remove the rest of `illegal_floating_point_literal_pattern`] (rust-lang/rust#116284) - [static mut: allow mutable reference to arbitrary types, not just slices and arrays] (rust-lang/rust#117614) - [Extend `invalid_reference_casting` to include references casting to bigger memory layout] (rust-lang/rust#118983) - [Add `non_contiguous_range_endpoints` lint for singleton gaps after exclusive ranges] (rust-lang/rust#118879) - [Add `wasm_c_abi` lint for use of older wasm-bindgen versions] (rust-lang/rust#117918) This lint currently only works when using Cargo. - [Update `indirect_structural_match` and `pointer_structural_match` lints to match RFC] (rust-lang/rust#120423) - [Make non-`PartialEq`-typed consts as patterns a hard error] (rust-lang/rust#120805) - [Split `refining_impl_trait` lint into `_reachable`, `_internal` variants] (rust-lang/rust#121720) - [Remove unnecessary type inference when using associated types inside of higher ranked `where`-bounds] (rust-lang/rust#119849) - [Weaken eager detection of cyclic types during type inference] (rust-lang/rust#119989) - [`trait Trait: Auto {}`: allow upcasting from `dyn Trait` to `dyn Auto`] (rust-lang/rust#119338) Compiler -------- - [Made `INVALID_DOC_ATTRIBUTES` lint deny by default] (rust-lang/rust#111505) - [Increase accuracy of redundant `use` checking] (rust-lang/rust#117772) - [Suggest moving definition if non-found macro_rules! is defined later] (rust-lang/rust#121130) - [Lower transmutes from int to pointer type as gep on null] (rust-lang/rust#121282) Target changes: - [Windows tier 1 targets now require at least Windows 10] (rust-lang/rust#115141) - [Enable CMPXCHG16B, SSE3, SAHF/LAHF and 128-bit Atomics in tier 1 Windows] (rust-lang/rust#120820) - [Add `wasm32-wasip1` tier 2 (without host tools) target] (rust-lang/rust#120468) - [Add `wasm32-wasip2` tier 3 target] (rust-lang/rust#119616) - [Rename `wasm32-wasi-preview1-threads` to `wasm32-wasip1-threads`] (rust-lang/rust#122170) - [Add `arm64ec-pc-windows-msvc` tier 3 target] (rust-lang/rust#119199) - [Add `armv8r-none-eabihf` tier 3 target for the Cortex-R52] (rust-lang/rust#110482) - [Add `loongarch64-unknown-linux-musl` tier 3 target] (rust-lang/rust#121832) Refer to Rust's [platform support page][platform-support-doc] for more information on Rust's tiered platform support. Libraries --------- - [Bump Unicode to version 15.1.0, regenerate tables] (rust-lang/rust#120777) - [Make align_offset, align_to well-behaved in all cases] (rust-lang/rust#121201) - [PartialEq, PartialOrd: document expectations for transitive chains] (rust-lang/rust#115386) - [Optimize away poison guards when std is built with panic=abort] (rust-lang/rust#100603) - [Replace pthread `RwLock` with custom implementation] (rust-lang/rust#110211) - [Implement unwind safety for Condvar on all platforms] (rust-lang/rust#121768) - [Add ASCII fast-path for `char::is_grapheme_extended`] (rust-lang/rust#121138) Stabilized APIs --------------- - [`impl Read for &Stdin`] (https://doc.rust-lang.org/stable/std/io/struct.Stdin.html#impl-Read-for-%26Stdin) - [Accept non `'static` lifetimes for several `std::error::Error` related implementations] (rust-lang/rust#113833) - [Make `impl<Fd: AsFd>` impl take `?Sized`] (rust-lang/rust#114655) - [`impl From<TryReserveError> for io::Error`] (https://doc.rust-lang.org/stable/std/io/struct.Error.html#impl-From%3CTryReserveError%3E-for-Error) These APIs are now stable in const contexts: - [`Barrier::new()`] (https://doc.rust-lang.org/stable/std/sync/struct.Barrier.html#method.new) Cargo ----- - [Stabilize lockfile v4](rust-lang/cargo#12852) - [Respect `rust-version` when generating lockfile] (rust-lang/cargo#12861) - [Control `--charset` via auto-detecting config value] (rust-lang/cargo#13337) - [Support `target.<triple>.rustdocflags` officially] (rust-lang/cargo#13197) - [Stabilize global cache data tracking] (rust-lang/cargo#13492) Misc ---- - [rustdoc: add `--test-builder-wrapper` arg to support wrappers such as RUSTC_WRAPPER when building doctests] (rust-lang/rust#114651) Compatibility Notes ------------------- - [Many unsafe precondition checks now run for user code with debug assertions enabled] (rust-lang/rust#120594) This change helps users catch undefined behavior in their code, though the details of how much is checked are generally not stable. - [riscv only supports split_debuginfo=off for now] (rust-lang/rust#120518) - [Consistently check bounds on hidden types of `impl Trait`] (rust-lang/rust#121679) - [Change equality of higher ranked types to not rely on subtyping] (rust-lang/rust#118247) - [When called, additionally check bounds on normalized function return type] (rust-lang/rust#118882) - [Expand coverage for `arithmetic_overflow` lint] (rust-lang/rust#119432) Internal Changes ---------------- These changes do not affect any public interfaces of Rust, but they represent significant improvements to the performance or internals of rustc and related tools. - [Update to LLVM 18](rust-lang/rust#120055) - [Build `rustc` with 1CGU on `x86_64-pc-windows-msvc`] (rust-lang/rust#112267) - [Build `rustc` with 1CGU on `x86_64-apple-darwin`] (rust-lang/rust#112268) - [Introduce `run-make` V2 infrastructure, a `run_make_support` library and port over 2 tests as example] (rust-lang/rust#113026) - [Windows: Implement condvar, mutex and rwlock using futex] (rust-lang/rust#121956)
Add a fast-path to `Debug` ASCII `&str` Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping. --- This is an alternative / a companion to rust-lang#121138. The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
Add a fast-path to `Debug` ASCII `&str` Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping. --- This is an alternative / a companion to rust-lang/rust#121138. The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
Add a fast-path to `Debug` ASCII `&str` Instead of going through the `EscapeDebug` machinery, we can just skip over ASCII chars that don’t need any escaping. --- This is an alternative / a companion to rust-lang/rust#121138. The other PR is adding the fast path deep within `EscapeDebug`, whereas this skips as early as possible.
I discovered that
impl Debug for str
is quite slow because it ends up doing aunicode_data::grapheme_extend::lookup
for each char, which ends up doing a binary search.This introduces a fast-path for ASCII chars which do not have this property.
The
lookup
is thus completely gone from profiles.As a followup, maybe it’s worth implementing this fast path directly in
unicode_data
so that it can check for the lower bound directly before going to a potentially expensive binary search.