Optimize format_args placeholders without options: Display::simple_fmt #104525

m-ou-se · 2022-11-17T10:08:16Z

This is part of #99012

format_args!("{}", "hello") pulls in the entire Display for str implementation, including the code needed to support formatting options like padding (e.g. "{:^50}"). That code is unnecessary for the basic case of formatting with no options ("{}").

This adds a new method to the format trait: Display::simple_fmt. It is implemented by forwarding to the regular Display::fmt method, but is overridden in the Display impl of String and str to just call write_str directly, avoiding pulling in any code related to padding.
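As a rough illustration of the idea (not the actual core::fmt change), the forwarding default and the str override can be sketched on a stand-in trait. The names ToyDisplay, render_simple and render_padded are hypothetical, and the width parameter is only a simplified placeholder for the real formatting options:

```rust
use std::fmt::{self, Write};

// Hypothetical stand-in for `Display`; the real trait lives in `core::fmt`.
trait ToyDisplay {
    // Full path: honors a (simplified) width option by left-padding with spaces.
    fn fmt(&self, out: &mut dyn Write, width: usize) -> fmt::Result;

    // Option-free path. By default it just forwards to `fmt`.
    fn simple_fmt(&self, out: &mut dyn Write) -> fmt::Result {
        self.fmt(out, 0)
    }
}

impl ToyDisplay for str {
    fn fmt(&self, out: &mut dyn Write, width: usize) -> fmt::Result {
        // Padding needs a char count, which is the expensive part to pull in.
        for _ in self.chars().count()..width {
            out.write_char(' ')?;
        }
        out.write_str(self)
    }

    // Override: writes the string directly, so no padding code is reachable.
    fn simple_fmt(&self, out: &mut dyn Write) -> fmt::Result {
        out.write_str(self)
    }
}

impl ToyDisplay for u8 {
    // No override here, so `simple_fmt` falls back to the forwarding default.
    fn fmt(&self, out: &mut dyn Write, width: usize) -> fmt::Result {
        ToyDisplay::fmt(self.to_string().as_str(), out, width)
    }
}

// Renders a value through the option-free path.
fn render_simple<T: ToyDisplay + ?Sized>(value: &T) -> String {
    let mut s = String::new();
    value.simple_fmt(&mut s).unwrap();
    s
}

// Renders a value through the full path with a minimum width.
fn render_padded<T: ToyDisplay + ?Sized>(value: &T, width: usize) -> String {
    let mut s = String::new();
    value.fmt(&mut s, width).unwrap();
    s
}
```

A program that only ever calls render_simple on strings never references the padding loop, which is the effect the PR is after for "{}" placeholders.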

m-ou-se · 2022-11-17T10:13:22Z

This basic test program shows great results with this change:

#![feature(rustc_private)]
#![feature(lang_items)]
#![feature(start)]
#![no_std]

extern crate libc;

use core::fmt::{self, Write};

struct Stdout;

impl Write for Stdout {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        unsafe { libc::write(1, s.as_ptr().cast(), s.len()) };
        Ok(())
    }
}

#[start]
fn main(_: isize, _: *const *const u8) -> isize {
    let s = "world";
    writeln!(Stdout, "Hello, {}!", s).is_err() as isize
}

#[lang = "eh_personality"]
extern "C" fn eh_personality() {}

#[panic_handler]
fn panic(_info: &core::panic::PanicInfo) -> ! {
    unsafe { libc::abort() };
}

Before: A .text section of 6175 bytes

After: A .text section of 2666 bytes (A 57% reduction! ✨)

Most of the difference is that the binary no longer includes core::fmt::Formatter::pad (1064 bytes) and core::str::count::do_count_chars (1663 bytes).

m-ou-se · 2022-11-17T10:15:04Z

For binaries that use string formatting with options anywhere, there won't be any benefit in binary size. There might still be a tiny tiny performance gain, although that's also unlikely.

But for small (e.g. embedded) programs, this can make a great difference, as the test program above shows.

Amanieu · 2022-11-17T12:24:40Z

Wouldn't it be simpler to have a default method on Display instead of a separate trait? This could then be extended to work with other traits such as Debug and UpperHex.

m-ou-se · 2022-11-28T19:32:56Z

Wouldn't it be simpler to have a default method on Display instead of a separate trait? This could then be extended to work with other traits such as Debug and UpperHex.

Yes, definitely. (Depending on your definition of "simpler".) Doing it as a default method on the traits doesn't require specialization and is a better solution in various ways, but does require a bit more work to implement, as it requires more changes to the format_args!() macro and ArgumentV1 type, so I didn't do that in my draft implementation.

Now that we've verified this PR can provide a big improvement in some cases, I'll update it to do it the better way. :)

m-ou-se · 2022-11-29T14:43:59Z

@bors try @rust-timer queue

bors · 2022-11-29T14:44:10Z

⌛ Trying commit 3d68c2139e8fb9a37960638feae986f02c4721a7 with merge c85ca3b2e89f3b73daa36b9976c48d69adbdd0cf...

m-ou-se · 2022-11-29T14:45:48Z

It's now implemented for all format traits (Display, Debug, Binary, Octal, and so on), but that does increase the size of the vtable for those traits. Let's wait for the test results to see if that has any significant impact.

m-ou-se · 2022-11-29T15:37:27Z

@aDotInTheVoid this PR is failing on [rustdoc-json] src/test/rustdoc-json/traits/uses_extern_trait.rs, which says "FIXME(adotinthevoid): Theses shouldn't be here":

rust/src/test/rustdoc-json/traits/uses_extern_trait.rs

Line 4 in c372b14

// FIXME(adotinthevoid): Theses shouldn't be here

What is that test for?

bors · 2022-11-29T16:53:27Z

☀️ Try build successful - checks-actions
Build commit: c85ca3b2e89f3b73daa36b9976c48d69adbdd0cf

aDotInTheVoid · 2022-11-29T18:07:41Z

TLDR: The test is fragile, and assumes Debug only has one item, #105063 fixes this, sorry.

What is that test for?

The test is checking that when rustdoc-json adds a foreign trait to the index, it also adds the methods of the trait. The fixme is because long term we don't want to add foreign traits to the index, and have the index only have local items.

The test is failing because it tries to get the fmt method from the Debug trait, and then checks that there is a method called fmt with that name. But the way it gets the fmt method is by assuming that the Debug trait only has one method. When it encounters two methods, it hits an assertion and fails.

rust-timer · 2022-11-29T23:04:50Z

Finished benchmarking commit (c85ca3b2e89f3b73daa36b9976c48d69adbdd0cf): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.0%	[0.2%, 2.4%]	58
Regressions ❌ (secondary)	1.5%	[0.3%, 2.4%]	36
Improvements ✅ (primary)	-0.3%	[-1.0%, -0.2%]	24
Improvements ✅ (secondary)	-0.8%	[-2.7%, -0.2%]	32
All ❌✅ (primary)	0.6%	[-1.0%, 2.4%]	82

Max RSS (memory usage)

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	2.2%	[1.6%, 3.0%]	5
Regressions ❌ (secondary)	3.1%	[1.3%, 5.1%]	10
Improvements ✅ (primary)	-3.4%	[-4.7%, -2.1%]	2
Improvements ✅ (secondary)	-1.6%	[-1.6%, -1.6%]	1
All ❌✅ (primary)	0.6%	[-4.7%, 3.0%]	7

Cycles

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	1.7%	[1.0%, 2.0%]	13
Regressions ❌ (secondary)	1.6%	[1.0%, 2.0%]	17
Improvements ✅ (primary)	-1.0%	[-1.2%, -0.8%]	2
Improvements ✅ (secondary)	-2.3%	[-2.7%, -2.1%]	6
All ❌✅ (primary)	1.3%	[-1.2%, 2.0%]	15

m-ou-se · 2022-11-30T10:18:07Z

TLDR: The test is fragile, and assumes Debug only has one item, #105063 fixes this, sorry.

Thanks!

m-ou-se · 2022-11-30T11:24:54Z

Regressions

There are quite a few ~1% regressions in compilation time. They're not that significant, but it would be nice to fix them.

My theory is that it takes LLVM a somewhat significant amount of time to basically optimize the default simple_fmt implementations to: simple_fmt = fmt, to use the same function (pointer) for both.

It would be nice if we had a feature for this in the language, so we can literally write = Self::fmt; as the default implementation, to make it an alias from the start. But we currently don't have that.

Maybe someone from the compiler team has some good ideas here. :)

Amanieu · 2022-11-30T13:07:31Z

I think this could be done as a MIR optimization when one function forwards directly to another. In that case we could just emit an LLVM symbol alias instead of emitting an inline function.

…, r=notriddle Rustdoc Json Tests: Don't assume that core::fmt::Debug will always have one item. See rust-lang#104525 (comment) and rust-lang#104525 (comment) for motivation. This still assumes that `fmt` is the first method, but that's a lot less brittle than assuming it will be the only method. Sadly, we can't use an aux crate to insulate the tests from core changes, because core is special, so all we can do is try not to depend on things that may change.

nnethercote · 2022-11-30T21:13:07Z

The binary size results show lots of 1-2% regressions, which is unfortunate for this PR which is all about reducing binary sizes :(

nnethercote · 2023-04-24T22:31:44Z

Much better perf results now :) Still some sub-1% regressions in binary size. Not a showstopper, but a brief investigation would be good to see if they can be easily avoided.

m-ou-se · 2023-04-25T10:24:52Z

It's a bit unnecessary that the vtable used by &dyn Display now contains both fmt and simple_fmt. If it contains fmt anyway, there's no need for simple_fmt to be included as well. Adding where Self: Sized is the usual way of excluding something from the vtable, but that won't work in this case.

Putting it in a separate trait (so, SimpleDisplay::fmt instead of Display::simple_fmt) could avoid that, but that adds other complexity and requires specialization, making it less flexible.
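For readers unfamiliar with the `where Self: Sized` technique mentioned above, here is a minimal sketch on an unrelated, hypothetical trait (Shape, Square and area_via_dyn are illustration names only). It shows the general mechanism, not why it fails for the format traits:

```rust
trait Shape {
    fn area(&self) -> f64;

    // The `Self: Sized` bound keeps this method out of the vtable,
    // so it cannot be called through `dyn Shape`.
    fn doubled_area(&self) -> f64
    where
        Self: Sized,
    {
        2.0 * self.area()
    }
}

struct Square(f64);

impl Shape for Square {
    fn area(&self) -> f64 {
        self.0 * self.0
    }
}

fn area_via_dyn(shape: &dyn Shape) -> f64 {
    // `shape.doubled_area()` would be rejected by the compiler here,
    // because `dyn Shape` is not `Sized`.
    shape.area()
}
```

The cost of the technique is visible in area_via_dyn: the bounded method becomes unavailable on the trait object.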

m-ou-se · 2023-04-27T11:11:55Z

r? compiler

petrochenkov · 2023-04-27T17:46:22Z

Putting it in a separate trait (so, SimpleDisplay::fmt instead of Display::simple_fmt) could avoid that, but that adds other complexity and requires specialization, making it less flexible.

How much complexity are we talking about?
And what is the flexibility needed for?

The optimization slightly pessimizes the common case to make a difference in very specific cases like #104525 (comment).
If the complexity is not large then maybe it makes sense to not pessimize the common case.
@rustbot author

Kobzol · 2023-07-07T16:26:14Z

Are there any situations where simple_fmt does something other than just call write_str with a dynamically produced string (e.g. String/&str) or a 'static string (e.g. bool, enums with only fieldless variants)?

I wonder if we could instead add fn as_str(&self) -> Option<&str> to Display, similar to what Arguments already exposes.

With as_str, we could better optimize preallocation of the result string when using format! or to_string, because we would know the sizes of placeholders that implement as_str. This would not be possible with simple_fmt, I think.

The lowering would then have to match on as_str for simple placeholders, and either call write_str if it returns Some, or call the normal formatting machinery if it returns None.
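A rough sketch of what that could look like, assuming a hypothetical extension trait DisplayAsStr in place of a new method on Display itself (write_simple and literal_only are also made-up names). Only Arguments::as_str is an existing API here:

```rust
use std::fmt::{self, Display, Write};

// Hypothetical trait; `Display` has no `as_str` method today.
trait DisplayAsStr: Display {
    // Types that cannot expose a ready-made string keep the default.
    fn as_str(&self) -> Option<&str> {
        None
    }
}

impl DisplayAsStr for str {
    fn as_str(&self) -> Option<&str> {
        Some(self)
    }
}

impl DisplayAsStr for bool {
    fn as_str(&self) -> Option<&str> {
        // Both results are 'static string literals.
        Some(if *self { "true" } else { "false" })
    }
}

// Integers have no stored string, so they use the `None` default.
impl DisplayAsStr for u32 {}

// What the lowering of a simple `{}` placeholder might do under this proposal.
fn write_simple<T: DisplayAsStr + ?Sized>(out: &mut String, value: &T) -> fmt::Result {
    match value.as_str() {
        // The length is known up front, so the string is appended directly.
        Some(s) => {
            out.push_str(s);
            Ok(())
        }
        // Otherwise fall back to the normal formatting machinery.
        None => write!(out, "{}", value),
    }
}

// The existing `Arguments::as_str` returns the literal when there are no arguments.
fn literal_only() -> Option<&'static str> {
    format_args!("hello").as_str()
}
```

Because the Some branch knows the string length before writing, a caller could sum those lengths to size the output buffer in advance.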

bjorn3 · 2023-07-07T19:53:44Z

.as_str() isn't possible for most types. Only strings can implement it.

The lowering would then have to match on as_str for simple placeholders, and either call write_str if it returns Some, or call the normal formatting machinery if it returns None.

That will make binary bloat even worse as the if condition can't be optimized away due to the formatting machinery using dynamic dispatch everywhere.

Kobzol · 2023-07-07T20:21:58Z

.as_str() isn't possible for most types. Only strings can implement it.

Not just strings, but also e.g. bools or enums with only fieldless variants - because the returned lifetime of Option<&str> can also be 'static (AFAIK). My guess is that a similar set of types that can implement simple_fmt can also implement as_str - that's why I would be interested to see situations where simple_fmt != write_str/as_str.

But even if this was only applicable to strings, I think that it is an important case to optimize. I have seen many situations where strings are formatted, and now that's sadly suboptimal. The flattening/inlining of format_args has helped, but still it's enough to have a string in a variable (or in a const), and then it has to go through the whole fmt machinery needlessly.

That will make binary bloat even worse as the if condition can't be optimized away due to the formatting machinery using dynamic dispatch everywhere.

Yeah, optimizing the branch would be problematic with dynamic dispatch. But I guess that this could be solved with calling a function through virtual dispatch that would do the condition inside, so that the condition is only codegened once. It would mean double indirection, but that would also happen for types having the default implementation of simple_fmt.

Something like

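use core::fmt::{self, Display, Formatter};

// `as_str` is the hypothetical `Display` method proposed above; it does not exist today.
fn simple_fmt(obj: &dyn Display, f: &mut Formatter<'_>) -> fmt::Result {
    match obj.as_str() {
        Some(s) => f.write_str(s),
        None => obj.fmt(f),
    }
}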

bjorn3 · 2023-07-07T20:26:50Z

But I guess that this could be solved with calling a function through virtual dispatch that would do the condition inside, so that the condition is only codegened once.

That still doesn't allow eliminating the big Display::fmt implementations entirely for embedded systems where there are no uses of non-simple formatting, right?

Kobzol · 2023-07-07T20:34:49Z

That still doesn't allow eliminating the big Display::fmt implementations entirely for embedded systems where there are no uses of non-simple formatting, right?

Probably not, and that is also not the point of the proposed as_str method, my motivation was to better optimize the runtime performance of formatting (amongst other things by better preallocating strings when doing format!/to_string).

Regarding binary size, I don't have much experience with Rust embedded, so this is just an uninformed opinion, but I think that the code size wins (although they are great!) here might not translate that well into real world programs. If having fmt in the binary or not having fmt in the binary is a fundamental difference for an embedded program, then it seems quite brittle to base this on an optimization that basically breaks once you use {:?} or format something other than a string anywhere in the program. But maybe I'm wrong and there are use-cases for only formatting strings with {} everywhere in embedded programs, I'm not sure :)

bjorn3 · 2023-07-07T22:57:16Z

amongst other things by better preallocating strings when doing format!/to_string

Maybe add a size_hint method for that?

Kobzol · 2023-07-08T06:17:22Z

Nice, that would also work, and could be even more general, e.g. integers could return their log10 number of digits.
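To make the size_hint idea concrete, here is a small sketch under the assumption of a hypothetical FmtSizeHint trait (nothing like it exists in core::fmt today; greet is also an illustration name). Integers report their decimal digit count and strings report their byte length:

```rust
use std::fmt::Write;

// Hypothetical trait: the number of bytes a plain `{}` would write.
trait FmtSizeHint {
    fn size_hint(&self) -> usize;
}

impl FmtSizeHint for str {
    fn size_hint(&self) -> usize {
        self.len()
    }
}

impl FmtSizeHint for u32 {
    fn size_hint(&self) -> usize {
        // floor(log10(n)) + 1 digits; `checked_ilog10` returns `None` for 0,
        // which still prints as one digit.
        self.checked_ilog10().map_or(1, |d| d as usize + 1)
    }
}

// Preallocates the output from the literal pieces plus the hints.
fn greet(name: &str, id: u32) -> String {
    let capacity = "Hello, ".len() + name.size_hint() + " #".len() + id.size_hint();
    let mut out = String::with_capacity(capacity);
    write!(out, "Hello, {} #{}", name, id).unwrap();
    out
}
```

With exact hints the buffer never needs to grow while the pieces are written.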

JohnCSimon · 2023-12-17T21:29:42Z

@m-ou-se
ping from triage - can you post your status on this PR? There hasn't been an update in a few months. Thanks!

Dylan-DPC · 2024-03-12T05:52:39Z

Closing this as inactive. Feel free to reöpen this pr or create a new pr if you get the time to work on this. Thanks

m-ou-se added T-libs Relevant to the library team, which will review and decide on the PR/issue. A-fmt Area: `std::fmt` labels Nov 17, 2022

m-ou-se self-assigned this Nov 17, 2022

rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 17, 2022

m-ou-se mentioned this pull request Nov 29, 2022

Tracking Issue for simple_fmt #105054

Open

6 tasks

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 29, 2022

aDotInTheVoid mentioned this pull request Nov 29, 2022

Rustdoc Json Tests: Don't assume that core::fmt::Debug will always have one item. #105063

Merged

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Nov 29, 2022

m-ou-se changed the title ~~Optimize format_args!("{}") for str and String.~~ Optimize format_args placeholders without options: {Display, Debug, ..}::simple_fmt Nov 30, 2022

m-ou-se removed the I-compiler-nominated Nominated for discussion during a compiler team meeting. label Apr 24, 2023

m-ou-se marked this pull request as ready for review April 27, 2023 11:11

rustbot assigned petrochenkov and unassigned m-ou-se Apr 27, 2023

m-ou-se added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-experimental Status: Ongoing experiment that does not require reviewing and won't be merged in its current state. labels Apr 27, 2023

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 27, 2023

dead-claudia mentioned this pull request Oct 10, 2023

New fmt::Arguments representation. #115129

Closed

Dylan-DPC closed this Mar 12, 2024

Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Mar 12, 2024

GKFX mentioned this pull request May 3, 2024

impl fmt::Display for u32 compiles to large binary #118940

Open

m-ou-se reopened this Jul 1, 2024

m-ou-se mentioned this pull request Jul 1, 2024

Add an impl Format for PanicInfo knurling-rs/defmt#856

Merged

m-ou-se closed this Jul 3, 2024
