
New fmt::Arguments representation. #115129

Closed
wants to merge 7 commits

Conversation

m-ou-se
Member

@m-ou-se m-ou-se commented Aug 23, 2023

This changes the internal representation of fmt::Arguments to reduce its size and to make it possible to convert a &'static str to a fmt::Arguments without additional indirection.

Part of #99012

fmt::Arguments stores three slices:

  1. The slice of string literals between the placeholders.
  2. The details of the placeholders (format options, which argument to display there, etc.).
  3. The arguments.

The second is omitted when the placeholders are simply all of the arguments, once each, in order, with default options.

Before this change, fmt::Arguments stored each of these as a &[] (or an Option of one), for a total size of six words/pointers.

However, we don't need to store the length of each slice separately: we can (unsafely) assume that the indices in a fmt::Arguments generated by format_args!() are never out of bounds, and that the number of placeholders and the number of string pieces differ by at most one.

This PR changes the struct to store three pointers without length, and a single 'number of parts' counter that is the sum of the number of placeholders and string pieces.

Additionally, this PR adds a special case for 1 part (that is, one string piece and no placeholders) to store the string pointer directly instead of through a slice of length 1.

This makes it possible for fmt::Arguments::new_str(&'static str) to exist, which unlike before (new_const(&[&'static str])) doesn't need to borrow anything other than the string itself.
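To make the word counts concrete, here is a minimal layout sketch, with all type and field names invented for illustration (this is not the actual PR code), contrasting the six-word layout of three slices with the proposed four-word layout of three thin pointers plus a combined part counter:

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

// Stand-ins for rt::Placeholder and rt::Argument; their contents don't matter for layout.
struct Placeholder;
struct Argument;

// Roughly the pre-PR layout: three fat slice references, six words total.
// (Option<&[T]> is niche-optimized, so the Option costs no extra word.)
struct OldArguments<'a> {
    pieces: &'a [&'static str],
    fmt: Option<&'a [Placeholder]>,
    args: &'a [Argument],
}

// Roughly the proposed layout: three thin pointers plus one combined
// "number of parts" counter, four words total.
struct NewArguments {
    num_parts: NonZeroUsize,
    strings: *const &'static str,
    placeholders: *const Placeholder,
    args: *const Argument,
}

fn main() {
    let word = size_of::<usize>();
    assert_eq!(size_of::<OldArguments<'static>>(), 6 * word);
    assert_eq!(size_of::<NewArguments>(), 4 * word);
    println!("old: 6 words, new: 4 words");
}
```

The one-part special case then only needs the `strings` pointer to point directly at a single `&'static str`, which is what enables a borrow-free `new_str`.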

@m-ou-se m-ou-se added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. A-fmt Area: `std::fmt` labels Aug 23, 2023
@m-ou-se m-ou-se self-assigned this Aug 23, 2023
@rustbot rustbot added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 23, 2023

@m-ou-se
Member Author

m-ou-se commented Aug 23, 2023

@bors try @rust-timer queue


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 23, 2023
@bors
Contributor

bors commented Aug 23, 2023

⌛ Trying commit 6091d4dbd9e0cae96c550bd08ed860d3c76df120 with merge 0650fbf1bf480ad163fba30602ccdf6013aacd39...

@bors
Contributor

bors commented Aug 23, 2023

☀️ Try build successful - checks-actions
Build commit: 0650fbf1bf480ad163fba30602ccdf6013aacd39 (0650fbf1bf480ad163fba30602ccdf6013aacd39)


@rust-timer
Collaborator

Finished benchmarking commit (0650fbf1bf480ad163fba30602ccdf6013aacd39): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 1.0% | [0.3%, 4.5%] | 10 |
| Regressions ❌ (secondary) | 0.3% | [0.2%, 0.6%] | 4 |
| Improvements ✅ (primary) | -0.5% | [-1.6%, -0.2%] | 40 |
| Improvements ✅ (secondary) | -0.7% | [-1.0%, -0.2%] | 19 |
| All ❌✅ (primary) | -0.2% | [-1.6%, 4.5%] | 50 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 4.9% | [0.2%, 11.8%] | 7 |
| Regressions ❌ (secondary) | 7.4% | [2.7%, 13.2%] | 6 |
| Improvements ✅ (primary) | -3.6% | [-5.2%, -0.5%] | 6 |
| Improvements ✅ (secondary) | -2.8% | [-3.8%, -1.8%] | 2 |
| All ❌✅ (primary) | 1.0% | [-5.2%, 11.8%] | 13 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 2.2% | [1.4%, 3.3%] | 3 |
| Regressions ❌ (secondary) | 1.0% | [1.0%, 1.0%] | 1 |
| Improvements ✅ (primary) | -2.5% | [-5.0%, -0.8%] | 27 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -2.0% | [-5.0%, 3.3%] | 30 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 0.5% | [0.1%, 1.2%] | 67 |
| Regressions ❌ (secondary) | 0.7% | [0.2%, 1.3%] | 11 |
| Improvements ✅ (primary) | -0.2% | [-1.4%, -0.0%] | 27 |
| Improvements ✅ (secondary) | -0.4% | [-1.3%, -0.1%] | 8 |
| All ❌✅ (primary) | 0.3% | [-1.4%, 1.2%] | 94 |

Bootstrap: 635.819s -> 633.195s (-0.41%)
Artifact size: 347.15 MiB -> 346.99 MiB (-0.05%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Aug 24, 2023
@m-ou-se
Member Author

m-ou-se commented Aug 24, 2023

That looks pretty good, but I'm a bit surprised to see a few things like helloworld incr-unchanged go up (by 0.55%).

@m-ou-se
Member Author

m-ou-se commented Aug 24, 2023

Let's see what happens if I add a few #[inline(always)]...

@bors try @rust-timer queue

@bors
Contributor

bors commented Aug 24, 2023

⌛ Trying commit d1922d5ad206618355b837e688e857dd38e5dffc with merge 0208b67f46c3dc86f3df10c3e01fb70047717ff6...

@bors
Contributor

bors commented Aug 30, 2023

☀️ Try build successful - checks-actions
Build commit: 3373eb1ec6feda72af1b09f19aba78a9e3a88f39 (3373eb1ec6feda72af1b09f19aba78a9e3a88f39)


@rust-timer
Collaborator

Finished benchmarking commit (3373eb1ec6feda72af1b09f19aba78a9e3a88f39): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 0.8% | [0.3%, 4.6%] | 14 |
| Regressions ❌ (secondary) | 0.5% | [0.2%, 0.8%] | 6 |
| Improvements ✅ (primary) | -0.7% | [-1.1%, -0.3%] | 16 |
| Improvements ✅ (secondary) | -0.3% | [-0.3%, -0.3%] | 7 |
| All ❌✅ (primary) | -0.0% | [-1.1%, 4.6%] | 30 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 2.9% | [0.1%, 7.6%] | 8 |
| Regressions ❌ (secondary) | 1.7% | [0.8%, 2.4%] | 5 |
| Improvements ✅ (primary) | -2.9% | [-7.3%, -0.7%] | 10 |
| Improvements ✅ (secondary) | -3.7% | [-6.2%, -1.7%] | 5 |
| All ❌✅ (primary) | -0.4% | [-7.3%, 7.6%] | 18 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 4.1% | [4.1%, 4.1%] | 1 |
| Regressions ❌ (secondary) | 2.6% | [2.2%, 2.9%] | 2 |
| Improvements ✅ (primary) | -2.6% | [-4.9%, -1.2%] | 6 |
| Improvements ✅ (secondary) | - | - | 0 |
| All ❌✅ (primary) | -1.7% | [-4.9%, 4.1%] | 7 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 0.2% | [0.0%, 1.1%] | 60 |
| Regressions ❌ (secondary) | 0.3% | [0.1%, 0.7%] | 10 |
| Improvements ✅ (primary) | -0.2% | [-0.6%, -0.0%] | 28 |
| Improvements ✅ (secondary) | -0.3% | [-0.4%, -0.1%] | 7 |
| All ❌✅ (primary) | 0.1% | [-0.6%, 1.1%] | 88 |

Bootstrap: 630.718s -> 629.174s (-0.24%)
Artifact size: 316.56 MiB -> 316.53 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 30, 2023
@m-ou-se
Member Author

m-ou-se commented Aug 30, 2023

Well that's quite a mixed bag of perf results. (: Also a few runtime benchmarks degraded quite a bit. If anyone feels like helping out: I could use some help figuring out what is causing these results. ^^

@Kobzol
Contributor

Kobzol commented Sep 14, 2023

GitHub is now somehow super eager to delete old try commits. I'll rerun perf so that I can get a baseline commit SHA to compare against.

@bors try @rust-timer queue


@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 14, 2023
@Kobzol
Contributor

Kobzol commented Sep 15, 2023

@bors try

@bors
Contributor

bors commented Sep 15, 2023

⌛ Trying commit 453eba0 with merge cf92f86...

bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 15, 2023
New fmt::Arguments representation.

@bors
Contributor

bors commented Sep 15, 2023

☀️ Try build successful - checks-actions
Build commit: cf92f86 (cf92f86bcc15ebf3a31a9467a890c989b7665512)


@rust-timer
Collaborator

Finished benchmarking commit (cf92f86): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 1.8% | [0.3%, 4.5%] | 3 |
| Regressions ❌ (secondary) | 0.5% | [0.5%, 0.5%] | 1 |
| Improvements ✅ (primary) | -0.8% | [-1.6%, -0.4%] | 24 |
| Improvements ✅ (secondary) | -1.1% | [-1.6%, -0.6%] | 11 |
| All ❌✅ (primary) | -0.5% | [-1.6%, 4.5%] | 27 |

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 3.9% | [0.2%, 5.8%] | 5 |
| Regressions ❌ (secondary) | 2.1% | [2.1%, 2.1%] | 1 |
| Improvements ✅ (primary) | -2.6% | [-5.6%, -0.6%] | 6 |
| Improvements ✅ (secondary) | -3.1% | [-4.7%, -2.1%] | 4 |
| All ❌✅ (primary) | 0.4% | [-5.6%, 5.8%] | 11 |

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 3.6% | [3.6%, 3.6%] | 1 |
| Regressions ❌ (secondary) | 3.1% | [2.9%, 3.5%] | 4 |
| Improvements ✅ (primary) | -2.3% | [-4.9%, -1.1%] | 8 |
| Improvements ✅ (secondary) | -1.3% | [-1.3%, -1.3%] | 1 |
| All ❌✅ (primary) | -1.7% | [-4.9%, 3.6%] | 9 |

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

|  | mean | range | count |
|:---|:---:|:---:|:---:|
| Regressions ❌ (primary) | 0.5% | [0.0%, 1.3%] | 67 |
| Regressions ❌ (secondary) | 0.4% | [0.2%, 0.7%] | 10 |
| Improvements ✅ (primary) | -0.2% | [-0.8%, -0.0%] | 26 |
| Improvements ✅ (secondary) | -0.3% | [-1.3%, -0.1%] | 16 |
| All ❌✅ (primary) | 0.3% | [-0.8%, 1.3%] | 93 |

Bootstrap: 633.484s -> 629.614s (-0.61%)
Artifact size: 318.18 MiB -> 317.59 MiB (-0.19%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Sep 15, 2023
@Kobzol
Contributor

Kobzol commented Sep 17, 2023

I analyzed the fmt runtime benchmark using the following commands:

# baseline commit
$ rustup-toolchain-install-master e7f9f48d767c1e17e125d83e3102d568f87b868d
# PR commit
$ rustup-toolchain-install-master cf92f86bcc15ebf3a31a9467a890c989b7665512

# analyse
$ RUST_LOG=debug cargo run --bin collector codegen_diff [asm/llvm/mir] fmt `rustup +e7f9f48d767c1e17e125d83e3102d568f87b868d which rustc` `rustup +cf92f86bcc15ebf3a31a9467a890c989b7665512 which rustc` > diff.txt

I haven't really found any interesting differences in the ASM or LLVM IR between the two commits. The only thing I noticed is that the MIR got bigger for one function, probably because of different MIR inlining (?).

MIR diff
Function size stats:
fn main::{closure#0}(_1: &mut [closure@src/main.rs:17:14: 17:17], _2: u64) -> Person: 5848 -> 9140 (+56.29%)

Diff for fn main::{closure#0}(_1: &mut [closure@src/main.rs:17:14: 17:17], _2: u64) -> Person
-----------------------------
 fn main::{closure#0}(_1: &mut [closure@src/main.rs:17:14: 17:17], _2: u64) -> Person {
     debug i => _2;
     let mut _0: Person;
     let mut _3: std::string::String;
     let _4: std::string::String;
     let mut _5: std::fmt::Arguments<'_>;
     let mut _6: &[&str];
     let mut _7: &[core::fmt::rt::Argument<'_>];
     let mut _8: &[core::fmt::rt::Argument<'_>; 1];
     let _9: [core::fmt::rt::Argument<'_>; 1];
     let mut _10: core::fmt::rt::Argument<'_>;
     let mut _11: &u64;
     let mut _12: std::string::String;
     let _13: std::string::String;
     let mut _14: std::fmt::Arguments<'_>;
     let mut _15: &[&str];
     let mut _16: &[core::fmt::rt::Argument<'_>];
     let mut _17: &[core::fmt::rt::Argument<'_>; 1];
     let _18: [core::fmt::rt::Argument<'_>; 1];
     let mut _19: core::fmt::rt::Argument<'_>;
     let mut _20: &u64;
     let mut _21: u8;
     let mut _22: u64;
     let mut _23: u64;
     let mut _24: &[&str; 1];
     let mut _25: &[&str; 1];
     scope 1 {
         debug res => _4;
     }
     scope 2 {
         debug res => _13;
     }
     scope 3 (inlined core::fmt::rt::Argument::<'_>::new_display::<u64>) {
         debug x => _11;
         let mut _26: for<'a, 'b, 'c> fn(&'a u64, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error>;
         scope 4 (inlined core::fmt::rt::Argument::<'_>::new::<u64>) {
             debug x => _11;
             debug f => _26;
             let mut _27: for<'a, 'b, 'c> fn(&'a core::fmt::rt::Opaque, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error>;
             let mut _28: &core::fmt::rt::Opaque;
             scope 5 {
             }
         }
     }
-    scope 6 (inlined core::fmt::rt::Argument::<'_>::new_display::<u64>) {
+    scope 6 (inlined format) {
+        debug args => _5;
+        let mut _29: std::option::Option<&str>;
+        let mut _30: &std::fmt::Arguments<'_>;
+        let mut _31: [closure@std::fmt::format::{closure#0}];
+        let mut _32: &std::fmt::Arguments<'_>;
+        scope 7 (inlined Arguments::<'_>::as_str) {
+            debug self => _30;
+            let mut _33: usize;
+            let mut _34: std::num::NonZeroUsize;
+            let mut _35: &str;
+            scope 8 {
+            }
+            scope 9 (inlined NonZeroUsize::get) {
+                debug self => _34;
+            }
+        }
+    }
+    scope 10 (inlined core::fmt::rt::Argument::<'_>::new_display::<u64>) {
         debug x => _20;
-        let mut _29: for<'a, 'b, 'c> fn(&'a u64, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error>;
-        scope 7 (inlined core::fmt::rt::Argument::<'_>::new::<u64>) {
+        let mut _36: for<'a, 'b, 'c> fn(&'a u64, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error>;
+        scope 11 (inlined core::fmt::rt::Argument::<'_>::new::<u64>) {
             debug x => _20;
-            debug f => _29;
-            let mut _30: for<'a, 'b, 'c> fn(&'a core::fmt::rt::Opaque, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error>;
-            let mut _31: &core::fmt::rt::Opaque;
-            scope 8 {
+            debug f => _36;
+            let mut _37: for<'a, 'b, 'c> fn(&'a core::fmt::rt::Opaque, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error>;
+            let mut _38: &core::fmt::rt::Opaque;
+            scope 12 {
+            }
+        }
+    }
+    scope 13 (inlined format) {
+        debug args => _14;
+        let mut _39: std::option::Option<&str>;
+        let mut _40: &std::fmt::Arguments<'_>;
+        let mut _41: [closure@std::fmt::format::{closure#0}];
+        let mut _42: &std::fmt::Arguments<'_>;
+        scope 14 (inlined Arguments::<'_>::as_str) {
+            debug self => _40;
+            let mut _43: usize;
+            let mut _44: std::num::NonZeroUsize;
+            let mut _45: &str;
+            scope 15 {
+            }
+            scope 16 (inlined NonZeroUsize::get) {
+                debug self => _44;
             }
         }
     }
 
     bb0: {
         StorageLive(_3);
         StorageLive(_4);
         StorageLive(_5);
         StorageLive(_6);
         _25 = const _;
         _6 = _25 as &[&str] (PointerCoercion(Unsize));
         StorageLive(_7);
         StorageLive(_8);
         StorageLive(_9);
         StorageLive(_10);
         StorageLive(_11);
         _11 = &_2;
         StorageLive(_26);
         _26 = <u64 as std::fmt::Display>::fmt as for<'a, 'b, 'c> fn(&'a u64, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error> (PointerCoercion(ReifyFnPointer));
         StorageLive(_27);
         _27 = _26 as for<'a, 'b, 'c> fn(&'a core::fmt::rt::Opaque, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error> (Transmute);
         StorageLive(_28);
         _28 = _11 as &core::fmt::rt::Opaque (Transmute);
         _10 = core::fmt::rt::Argument::<'_> { value: move _28, formatter: move _27 };
         StorageDead(_28);
         StorageDead(_27);
         StorageDead(_26);
         StorageDead(_11);
         _9 = [move _10];
         StorageDead(_10);
         _8 = &_9;
         _7 = move _8 as &[core::fmt::rt::Argument<'_>] (PointerCoercion(Unsize));
         StorageDead(_8);
         _5 = Arguments::<'_>::new_v1(move _6, move _7) -> [return: bb1, unwind continue];
     }
 
     bb1: {
         StorageDead(_7);
         StorageDead(_6);
-        _4 = format(move _5) -> [return: bb2, unwind continue];
+        StorageLive(_29);
+        StorageLive(_30);
+        StorageLive(_33);
+        StorageLive(_34);
+        _34 = (_5.0: std::num::NonZeroUsize);
+        _33 = (_34.0: usize);
+        StorageDead(_34);
+        switchInt(move _33) -> [1: bb7, otherwise: bb8];
     }
 
     bb2: {
+        StorageDead(_16);
+        StorageDead(_15);
+        StorageLive(_39);
+        StorageLive(_40);
+        StorageLive(_43);
+        StorageLive(_44);
+        _44 = (_14.0: std::num::NonZeroUsize);
+        _43 = (_44.0: usize);
+        StorageDead(_44);
+        switchInt(move _43) -> [1: bb11, otherwise: bb12];
+    }
+
+    bb3 (cleanup): {
+        drop(_3) -> [return: bb4, unwind terminate(cleanup)];
+    }
+
+    bb4 (cleanup): {
+        resume;
+    }
+
+    bb5: {
+        StorageDead(_30);
+        StorageLive(_31);
+        StorageLive(_32);
+        _32 = &_5;
+        _31 = [closure@/rustc/cf92f86bcc15ebf3a31a9467a890c989b7665512/library/alloc/src/fmt.rs:614:31: 614:33] { 0: move _32 };
+        StorageDead(_32);
+        _4 = Option::<&str>::map_or_else::<String, [closure@format::{closure#0}], for<'a> fn(&'a str) -> <str as ToOwned>::Owned {<str as ToOwned>::to_owned}>(move _29, move _31, <str as ToOwned>::to_owned) -> [return: bb6, unwind continue];
+    }
+
+    bb6: {
+        StorageDead(_31);
+        StorageDead(_29);
         StorageDead(_5);
         StorageDead(_9);
         _3 = move _4;
         StorageDead(_4);
         StorageLive(_12);
         StorageLive(_13);
         StorageLive(_14);
         StorageLive(_15);
         _24 = const _;
         _15 = _24 as &[&str] (PointerCoercion(Unsize));
         StorageLive(_16);
         StorageLive(_17);
         StorageLive(_18);
         StorageLive(_19);
         StorageLive(_20);
         _20 = &_2;
-        StorageLive(_29);
-        _29 = <u64 as std::fmt::Display>::fmt as for<'a, 'b, 'c> fn(&'a u64, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error> (PointerCoercion(ReifyFnPointer));
-        StorageLive(_30);
-        _30 = _29 as for<'a, 'b, 'c> fn(&'a core::fmt::rt::Opaque, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error> (Transmute);
-        StorageLive(_31);
-        _31 = _20 as &core::fmt::rt::Opaque (Transmute);
-        _19 = core::fmt::rt::Argument::<'_> { value: move _31, formatter: move _30 };
-        StorageDead(_31);
-        StorageDead(_30);
-        StorageDead(_29);
+        StorageLive(_36);
+        _36 = <u64 as std::fmt::Display>::fmt as for<'a, 'b, 'c> fn(&'a u64, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error> (PointerCoercion(ReifyFnPointer));
+        StorageLive(_37);
+        _37 = _36 as for<'a, 'b, 'c> fn(&'a core::fmt::rt::Opaque, &'b mut std::fmt::Formatter<'c>) -> std::result::Result<(), std::fmt::Error> (Transmute);
+        StorageLive(_38);
+        _38 = _20 as &core::fmt::rt::Opaque (Transmute);
+        _19 = core::fmt::rt::Argument::<'_> { value: move _38, formatter: move _37 };
+        StorageDead(_38);
+        StorageDead(_37);
+        StorageDead(_36);
         StorageDead(_20);
         _18 = [move _19];
         StorageDead(_19);
         _17 = &_18;
         _16 = move _17 as &[core::fmt::rt::Argument<'_>] (PointerCoercion(Unsize));
         StorageDead(_17);
-        _14 = Arguments::<'_>::new_v1(move _15, move _16) -> [return: bb3, unwind: bb5];
+        _14 = Arguments::<'_>::new_v1(move _15, move _16) -> [return: bb2, unwind: bb3];
+    }
+
+    bb7: {
+        StorageDead(_33);
+        _35 = deref_copy ((_5.1: core::fmt::Parts).0: &str);
+        _29 = Option::<&str>::Some(_35);
+        goto -> bb5;
     }
 
-    bb3: {
-        StorageDead(_16);
-        StorageDead(_15);
-        _13 = format(move _14) -> [return: bb4, unwind: bb5];
+    bb8: {
+        StorageDead(_33);
+        _29 = Option::<&str>::None;
+        goto -> bb5;
     }
 
-    bb4: {
+    bb9: {
+        StorageDead(_40);
+        StorageLive(_41);
+        StorageLive(_42);
+        _42 = &_14;
+        _41 = [closure@/rustc/cf92f86bcc15ebf3a31a9467a890c989b7665512/library/alloc/src/fmt.rs:614:31: 614:33] { 0: move _42 };
+        StorageDead(_42);
+        _13 = Option::<&str>::map_or_else::<String, [closure@format::{closure#0}], for<'a> fn(&'a str) -> <str as ToOwned>::Owned {<str as ToOwned>::to_owned}>(move _39, move _41, <str as ToOwned>::to_owned) -> [return: bb10, unwind: bb3];
+    }
+
+    bb10: {
+        StorageDead(_41);
+        StorageDead(_39);
         StorageDead(_14);
         StorageDead(_18);
         _12 = move _13;
         StorageDead(_13);
         StorageLive(_21);
         StorageLive(_22);
         StorageLive(_23);
         _23 = _2;
         _22 = Rem(move _23, const 100_u64);
         StorageDead(_23);
         _21 = move _22 as u8 (IntToInt);
         StorageDead(_22);
         _0 = Person { first_name: move _3, last_name: move _12, age: move _21 };
         StorageDead(_21);
         StorageDead(_12);
         StorageDead(_3);
         return;
     }
 
-    bb5 (cleanup): {
-        drop(_3) -> [return: bb6, unwind terminate(cleanup)];
+    bb11: {
+        StorageDead(_43);
+        _45 = deref_copy ((_14.1: core::fmt::Parts).0: &str);
+        _39 = Option::<&str>::Some(_45);
+        goto -> bb9;
     }
 
-    bb6 (cleanup): {
-        resume;
+    bb12: {
+        StorageDead(_43);
+        _39 = Option::<&str>::None;
+        goto -> bb9;
     }
 }
-----------------------------

///
/// The placeholders pointer can be null for default placeholders:
/// a placeholder for each argument once, in order, with default formatting options.
strings_and_placeholders: (*const &'static str, *const rt::Placeholder),
}
Contributor

Why not use an enum? A variant for the num_parts == 1 case, and the other for the general case.
That's how the type is used AFAIU.
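A minimal sketch of that suggestion (all names invented here; this is not the PR's actual code): the single-string case becomes a dedicated variant, and `as_str` falls out of a plain match instead of a `num_parts == 1` check.

```rust
use std::num::NonZeroUsize;

// Stand-ins for rt::Placeholder and rt::Argument.
struct Placeholder;
struct Argument;

enum ArgumentsRepr {
    /// Exactly one string piece and no placeholders.
    Str(&'static str),
    /// The general case: thin pointers plus a combined part counter.
    General {
        num_parts: NonZeroUsize,
        strings: *const &'static str,
        placeholders: *const Placeholder,
        args: *const Argument,
    },
}

impl ArgumentsRepr {
    // as_str becomes a match on the variant rather than a count check.
    fn as_str(&self) -> Option<&'static str> {
        match self {
            ArgumentsRepr::Str(s) => Some(*s),
            ArgumentsRepr::General { .. } => None,
        }
    }
}

fn main() {
    assert_eq!(ArgumentsRepr::Str("hello").as_str(), Some("hello"));
    println!("ok");
}
```

One trade-off to weigh: the compiler may be able to store the enum tag in the `NonZeroUsize` niche, but that layout is not guaranteed, whereas encoding the case in `num_parts` by hand pins the representation down exactly.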

pub fn new_v1(pieces: &'a [&'static str], args: &'a [rt::Argument<'a>]) -> Arguments<'a> {
    if pieces.len() < args.len() || pieces.len() > args.len() + 1 {
        panic!("invalid args");
pub fn new_v1(strings: &'a [&'static str], args: &'a [rt::Argument<'a>]) -> Self {
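The bounds check quoted above encodes the invariant that makes dropping the stored slice lengths sound. Stated standalone (with a hypothetical helper name): the number of literal string pieces must equal the number of arguments, or exceed it by exactly one, which format_args!() always guarantees.

```rust
// Hypothetical helper expressing new_v1's invariant on its own.
fn pieces_args_valid(pieces: usize, args: usize) -> bool {
    pieces == args || pieces == args + 1
}

fn main() {
    assert!(pieces_args_valid(2, 1));  // e.g. "a{}b": pieces ["a", "b"], one argument
    assert!(pieces_args_valid(1, 1));  // e.g. "a{}": pieces ["a"], one argument
    assert!(!pieces_args_valid(3, 1)); // rejected by the check above
    assert!(!pieces_args_valid(1, 3)); // rejected by the check above
    println!("invariant holds");
}
```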
Contributor

From the posted MIR snippet, would it be worth making new_v1 easier for the compiler to optimize? For instance, by checking more invariants in rustc?

Member Author

by checking more invariants in rustc

@cjgillot What do you mean?

@dead-claudia

dead-claudia commented Oct 10, 2023

Edit: bring up an optimization opportunity.

@m-ou-se

Have you considered doing a hybrid of structured arrays and bytecode, like this? Consider it an alternative to this PR, as it's mutually exclusive with some of the functions proposed here.

  • Bytecode string is byte-aligned

  • Count type:

    • 0 = implied/not specified
    • 1 = immediate, expect value
    • 2 = parameter, expect index
  • Part segment:

    • Metadata byte
      • Bit 0: next length is 32-bit
      • Bit 1: has non-default fill char and/or flags
      • Bit 2-3: precision count type
      • Bit 4-5: width count type
      • Bit 6-7: alignment
    • Next length u8/u32
    • 32-bit field for fill char (Unicode values are 21-bit) + flags, if non-default
      • Bit 21: SignPlus
      • Bit 22: SignMinus
      • Bit 23: Alternate
      • Bit 24: SignAwareZeroPad
      • Bit 25: DebugLowerHex
      • Bit 26: DebugUpperHex
      • Bit 30: precision is 32-bit
      • Bit 31: width is 32-bit
    • Precision u8/u32, if precision count type is not "implied"
    • Width u8/u32, if width count type is not "implied"
    • Next data bytes
  • Arguments static data:

    • Static metadata byte:
      • Bit 0: part length is 32-bit
      • Bit 1: start length is 32-bit
    • Part count u8/u32 (before start count to simplify certain tests)
    • Start length u8/u32
    • Start data bytes
    • Part segment sequence
  • Arguments consists of two pointers: bytecode template and arguments start (may be null if no arguments needed). Arguments end is implicitly read from the template.
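The part-segment metadata byte proposed above can be sketched as a pack/unpack pair. This is purely illustrative (struct and function names are invented); the bit layout follows the list in the comment: bit 0 for the 32-bit length flag, bit 1 for fill/flags presence, bits 2-3 for the precision count type, bits 4-5 for the width count type, and bits 6-7 for alignment.

```rust
#[derive(Debug, PartialEq)]
struct Meta {
    len_is_32bit: bool,      // bit 0
    has_fill_or_flags: bool, // bit 1
    precision_kind: u8,      // bits 2-3: 0 implied, 1 immediate, 2 parameter
    width_kind: u8,          // bits 4-5: same encoding
    alignment: u8,           // bits 6-7
}

fn encode(m: &Meta) -> u8 {
    (m.len_is_32bit as u8)
        | (m.has_fill_or_flags as u8) << 1
        | (m.precision_kind & 0b11) << 2
        | (m.width_kind & 0b11) << 4
        | (m.alignment & 0b11) << 6
}

fn decode(b: u8) -> Meta {
    Meta {
        len_is_32bit: b & 1 != 0,
        has_fill_or_flags: b & 2 != 0,
        precision_kind: (b >> 2) & 0b11,
        width_kind: (b >> 4) & 0b11,
        alignment: (b >> 6) & 0b11,
    }
}

fn main() {
    let m = Meta {
        len_is_32bit: false,
        has_fill_or_flags: true,
        precision_kind: 1, // immediate precision
        width_kind: 0,     // implied width
        alignment: 2,
    };
    assert_eq!(decode(encode(&m)), m);
    println!("round-trip ok");
}
```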

Multiple things are being suggested here:

  1. The static array is a thin pointer to a dynamically sized byte-aligned static array that holds everything. This would be iterated as a byte sequence, using unaligned reads for everything.

  2. Most flags are stuffed into a couple fields, to save a bunch of space. Notably, the fill character's spare space (Unicode code points only take up 21 bits) is also used for flags.

  3. Default-option placeholders are optimized down to a single 0 byte, which is only slightly larger than complete omission. This avoids the catastrophic placeholder growth issue and makes the bytecode size adaptive.

  4. Arguments would be stored in actual use order and iterated in one pass, without special indexing. This greatly simplifies iteration (it's just a pointer add) while adding only slight overhead for multiply-used arguments (which would need to be saved to multiple stack slots instead of just one).

  5. Lengths are variably encoded with a very simple prefix bit (who here is creating 2^32-byte segments?), cutting size down a lot for the most common case of very small segments.

  6. Lengths are stored inline in the template, relying on cache locality to stay relatively fast.

Emitting is considerably more complicated, but it not only trims the runtime size to two pointers plus the stack array of arguments, it also shrinks the static data so much that it's about the size of a C format string (but much simpler to process).

Here's how it compares with purely static strings:

  • "a{}b{}c" would require 10 bytes static on all platforms.

    • Currently, it's 24 bytes on 32-bit platforms, 48 bytes on 64-bit platforms, assuming pieces and args are all stored statically.
    • The C format string equivalent "a%sb%sc" requires 8 bytes.
  • "a{}b{:.5}c" would require 11 bytes static on all platforms.

    • Currently, it's 88 bytes on 32-bit platforms, 160 bytes on 64-bit platforms, assuming pieces and args are all stored statically.
    • The C format string equivalent "a%sb%.5fc" requires 10 bytes.
  • For both, 16 bytes of stack would be used on 32-bit platforms and 32 bytes on 64-bit platforms for the arguments struct.

    • This is the same as it currently is, ignoring argument passing overhead.
    • C variadic arguments require similar stack overhead, but indirectly; they can't truly escape it.
  • For both, there would only be two pointers total: placeholder and arguments. No other indirection would exist, and no extra length argument needed (as that's encoded in the source string).

This would help in three areas:

  • Code generation for the function call: it's just write args to stack, load template pointer, call format function. Very simple.
  • The fmt::write function implementation itself is so simple, it could be feasibly written in assembly.
  • The format string is very compact, reducing binary size quite a bit.
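
The second point can be made concrete with a hedged sketch of the interpreter loop, here for the no-options case only: copy the start segment, then alternate "format next argument, copy next segment" until the part count runs out. The byte layout (u8 counts, default formatting for every placeholder) and all names are simplifications for illustration, not the actual proposal:

```rust
use std::fmt::{self, Write};

// Advance the template cursor past n bytes and return them.
fn take<'a>(t: &mut &'a [u8], n: usize) -> &'a [u8] {
    let (head, rest) = t.split_at(n);
    *t = rest;
    head
}

// Template layout assumed here: [part count][start len][start bytes],
// then per part: [segment len][segment bytes].
fn write_template(
    out: &mut impl Write,
    template: &[u8],
    args: &[&dyn fmt::Display],
) -> fmt::Result {
    let mut t = template;
    let parts = take(&mut t, 1)[0] as usize;
    let n = take(&mut t, 1)[0] as usize;
    out.write_str(std::str::from_utf8(take(&mut t, n)).unwrap())?;
    for arg in &args[..parts] {
        write!(out, "{arg}")?; // default options: just delegate to Display
        let n = take(&mut t, 1)[0] as usize;
        out.write_str(std::str::from_utf8(take(&mut t, n)).unwrap())?;
    }
    Ok(())
}
```

The full version would additionally decode the options metadata per part, but the control flow stays this flat.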

As for why I'm not also inlining function references, those generally lack stable runtime positions anyways, and so they can't be fully static. One could lazily resolve it, but it's not worth it for such a small number of local functions.

Simple Display/etc. delegation (ex: #104525) could be done by just making the template optional and, if missing, invoking the method with the object directly. This handles that case completely generically. And for simple non-option formatting, the "simple" variant could be used exclusively. You could also pass a null pointer for the <String as Display>::fmt function instead to bypass it and avoid needing to call a method at all.
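
A minimal sketch of that optional-template delegation, with hypothetical type and variant names (the full template-driven variant is elided; only the two "template missing" fast paths are shown):

```rust
use std::fmt::{self, Display, Write};

// Hypothetical compact representation: if there's no template, the value
// is either a plain literal or a single argument to delegate to.
enum CompactArguments<'a> {
    // Plain string literal: no template, no arguments.
    Literal(&'a str),
    // A bare `{}` with one argument and no options: skip the template and
    // invoke the value's Display impl directly.
    Delegate(&'a dyn Display),
}

fn write_compact(out: &mut impl Write, args: &CompactArguments<'_>) -> fmt::Result {
    match args {
        CompactArguments::Literal(s) => out.write_str(s),
        CompactArguments::Delegate(d) => write!(out, "{d}"),
    }
}
```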

And yes, this function could, in theory, be made polymorphic over UTF-8 strings and byte strings with few changes. Only real difference is character filling.


Data can be made all 32-bit and unconditionally present if the above variable-length encoding isn't okay.

  • Part segment:

    • Metadata byte
      • Bit 2-3: precision count type
      • Bit 4-5: width count type
      • Bit 6-7: alignment
    • Next length u32
    • 32-bit field for fill char (Unicode values are 21-bit) + flags
      • Bit 21: SignPlus
      • Bit 22: SignMinus
      • Bit 23: Alternate
      • Bit 24: SignAwareZeroPad
      • Bit 25: DebugLowerHex
      • Bit 26: DebugUpperHex
    • Precision u32
    • Width u32
    • Next data bytes
  • Arguments static data:

    • Part count u32
    • Start length u32
    • Start data bytes
    • Part segment sequence
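
A hypothetical decoder for that all-32-bit part segment, assuming little-endian fields; the struct and field names are illustrative only, following the layout above (metadata byte, next length, fill + flags word, precision, width, next data bytes):

```rust
use std::convert::TryInto;

struct PartSegment<'a> {
    precision_kind: u8, // metadata bits 2-3
    width_kind: u8,     // metadata bits 4-5
    alignment: u8,      // metadata bits 6-7
    fill: char,         // low 21 bits of the fill/flags word
    flags: u16,         // bits 21+ of the fill/flags word (SignPlus, ...)
    precision: u32,
    width: u32,
    next: &'a [u8],     // literal bytes following this placeholder
}

fn u32_at(t: &[u8], i: usize) -> u32 {
    u32::from_le_bytes(t[i..i + 4].try_into().unwrap())
}

// Decode one segment; returns it plus the remaining template bytes.
fn decode_part(t: &[u8]) -> (PartSegment<'_>, &[u8]) {
    let meta = t[0];
    let next_len = u32_at(t, 1) as usize;
    let fill_flags = u32_at(t, 5);
    let (next, rest) = t[17..].split_at(next_len);
    (
        PartSegment {
            precision_kind: (meta >> 2) & 0b11,
            width_kind: (meta >> 4) & 0b11,
            alignment: meta >> 6,
            fill: char::from_u32(fill_flags & 0x1f_ffff).unwrap_or(' '),
            flags: (fill_flags >> 21) as u16,
            precision: u32_at(t, 9),
            width: u32_at(t, 13),
            next,
        },
        rest,
    )
}
```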

This increases static size overhead to the following:

  • "a{}b{}c" would require 45 bytes static on all platforms, compared to 24 on 32-bit and 48 on 64-bit currently.
  • "a{}b{:.5}c" would require 45 bytes static on all platforms, compared to 88 on 32-bit and 160 on 64-bit currently.

And of course, there is a middle ground, where whole fields are only conditionally present, but lengths are always 32-bit:

  • Part segment:

    • Metadata byte
      • Bit 0: has non-default fill char/custom args
      • Bit 2-3: precision count type
      • Bit 4-5: width count type
      • Bit 6-7: alignment
    • Next length u32
    • 32-bit field for fill char (Unicode values are 21-bit) + flags, if non-default
      • Bit 21: SignPlus
      • Bit 22: SignMinus
      • Bit 23: Alternate
      • Bit 24: SignAwareZeroPad
      • Bit 25: DebugLowerHex
      • Bit 26: DebugUpperHex
    • Precision u32, if precision count type is not "implied"
    • Width u32, if width count type is not "implied"
    • Next data bytes
  • Arguments static data:

    • Part count u32
    • Start length u32
    • Start data bytes
    • Part segment sequence

This results in the following static sizes:

  • "a{}b{}c" would require 21 bytes static on all platforms, compared to 24 on 32-bit and 48 on 64-bit currently.
  • "a{}b{:.5}c" would require 25 bytes static on all platforms, compared to 88 on 32-bit and 160 on 64-bit currently.
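
These byte counts can be checked mechanically; this sketch just tallies the conditional layout above (8-byte header, then per part a metadata byte and a 4-byte next length, plus 4 bytes each for fill/flags, precision, and width only when present, plus the literal bytes):

```rust
#[derive(Clone, Copy)]
struct Part {
    has_fill_or_flags: bool,
    has_precision: bool,
    has_width: bool,
    next_literal_len: usize,
}

fn static_size(start_literal_len: usize, parts: &[Part]) -> usize {
    // Header: part count u32 + start length u32.
    let mut size = 4 + 4 + start_literal_len;
    for p in parts {
        // Always present: metadata byte + next length u32.
        size += 1 + 4;
        // Conditionally present 32-bit fields.
        size += 4 * usize::from(p.has_fill_or_flags);
        size += 4 * usize::from(p.has_precision);
        size += 4 * usize::from(p.has_width);
        size += p.next_literal_len;
    }
    size
}
```

Plugging in "a{}b{}c" (two default parts) and "a{}b{:.5}c" (second part gains a precision field) reproduces the 21- and 25-byte figures.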

@tgross35
Contributor

tgross35 commented Oct 31, 2023

@dead-claudia I think most of the brainstorming is at #99012, this PR is for a specific experiment. That being said, the ideas sound great!

One thing is that I don't think we want to use invalid UTF-8 characters to store padding information, see #99012 (comment). Also it seems like it may be a good idea to reserve some bits for future fill options.

I don't think there would be any opposition to storing indices as u8, the only downside being the unaligned reads that result.

Are you up for trying this? I'd be interested to see an experiment PR similar to this one that tests out your ideas to see how it stacks up for performance

Emit's considerably more complicated, but it not only trims the runtime size to 2 pointers + arguments stack array for arguments, but also shaves the runtime array sizes so much, it's about the size of a C format string (but much simpler to process).

For anyone's reference, here's an older C printf implementation (printf calls vfprintf), rather complex: http://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-common/vfprintf.c;h=fc370e8cbc4e9652a2ed377b1c6f2324f15b1bf9;hb=3321010338384ecdc6633a8b032bb0ed6aa9b19a#l196. I'm not entirely sure where the size comes from; tough code to read...

@dead-claudia

@tgross35

@dead-claudia I think most of the brainstorming is at #99012, this PR is for a specific experiment. That being said, the ideas sound great!

Yeah, didn't know about that issue. Thanks!

One thing is that I don't think we want to use invalid UTF-8 characters to store padding information, see #99012 (comment). [...]

To be clear, that character + flags field stores a literal Unicode scalar code point, not a UTF-8 character. It can easily accommodate a byte, and technically speaking, that field for bytes could be shrunk down to 8 bits for the fill character, 8 bits for the flags as it stands currently.

[...] Also it seems like it may be a good idea to reserve some bits for future fill options

There's a whole 5 more that could feasibly fit in that character + flags bit set (4 if you want a "continuation" bit). There's room to grow. 🙂

I don't think there would be any opposition to storing indices as u8, the only downside being the unaligned reads that result.

True, but how often is this really being called? Also, while unaligned reads are slower, this one's already very likely in cache, and so in practice there's likely at most 1 memory bus fetch anyways for most platforms. This is also why this person found little difference in practice - most of their accesses were already likely prefetched into cache.

With architectures like 32-bit ARM that at best only half-support unaligned access, the relative performance might be close enough not to matter, considering the frequent dynamic function calls and the like. Also, with the compact encoding, most small accesses (where unaligned access would have the most impact) would be 8-bit loads anyways, for which alignment is moot. And this can be bypassed if no fill is needed.

Also, LLVM only generates two loads for a misaligned load, so the impact should be at worst 2-3x even on those targets. I do need to inspect the code gen, though.
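
For reference, the portable way to express such a possibly-misaligned load in Rust is to go through a byte copy and let LLVM pick the best load sequence for the target (a single unaligned load on x86/AArch64, the multi-load sequence elsewhere):

```rust
use std::convert::TryInto;

// Read a little-endian u32 at an arbitrary (possibly misaligned) offset.
fn read_u32_le(bytes: &[u8], offset: usize) -> u32 {
    let chunk: [u8; 4] = bytes[offset..offset + 4].try_into().unwrap();
    u32::from_le_bytes(chunk)
}
```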

Worst case scenario (if it causes perf issues), this structure's easy to tweak to ensure proper padding - just pad the string lengths to compensate. For one, the existing bits should probably be moved even now so it all exists on the same byte, but I obviously didn't think that far when laying that out initially. 🙃

Are you up for trying this? I'd be interested to see an experiment PR similar to this one that tests out your idea

I'd love to, but it'll be a while (few months minimum) before I'd have any real time to play around and try to implement this.

@tgross35
Contributor

tgross35 commented Nov 1, 2023

To be clear, that character + flags field stores a literal Unicode scalar code point, not a UTF-8 character. It can easily accommodate a byte, and technically speaking, that field for bytes could be shrunk down to 8 bits for the fill character, 8 bits for the flags as it stands currently.
[...]
There's a whole 5 more that could feasibly fit in that character + flags bit set (4 if you want a "continuation" bit). There's room to grow. 🙂

Oh, right, good points. I just read Bit 31: width is 32-bit and assumed the space was full 😄

True, but how often is this really being called? Also, while unaligned reads are slower, this one's already very likely in cache, and so in practice there's likely at most 1 memory bus fetch anyways for most platforms.

Oh, I don't have any perf concerns, the APIs for unaligned things are just messier and can overall be a bit more eh (e.g. #[repr(packed)] is generally disliked for a few reasons). But that may not even materialize here.

Are you up for trying this? I'd be interested to see an experiment PR similar to this one that tests out your idea

I'd love to, but it'll be a while (few months minimum) before I'd have any real time to play around and try to implement this.

Too bad :) maybe somebody will be interested in picking it up as an experiment in the meantime

@tgross35
Contributor

tgross35 commented Dec 6, 2023

@dead-claudia you refer to "start data" a few times. What exactly is this? I assume it stores the argument information, but it's not clear to me how it gets processed (perhaps an array of indices that use dynamic data?)

@dead-claudia

@dead-claudia you refer to "start data" a few times. What exactly is this? I assume it stores the argument information, but it's not clear to me how it gets processed (perhaps an array of indices that use dynamic data?)

Sorry, should've explained it then. It's the byte contents of the first string segment. For instance, it's the bytes of "this was a triumph" in format_args!("this was a triumph{}i'm making a note here{}huge success", foo, bar).
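
As a toy illustration (handling only bare {} placeholders), the start data is whatever precedes the first placeholder, and each part then carries the literal that follows it:

```rust
// Split a format string with only bare `{}` placeholders into the
// "start data" and the per-part trailing literals.
fn split_template(s: &str) -> (&str, Vec<&str>) {
    let mut pieces = s.split("{}");
    let start = pieces.next().unwrap_or("");
    (start, pieces.collect())
}
```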

@Dylan-DPC
Member

Closing this as inactive. Feel free to reopen this PR or create a new PR if you get the time to work on this. Thanks!

@Dylan-DPC Dylan-DPC closed this Apr 2, 2024
@Dylan-DPC Dylan-DPC added S-inactive Status: Inactive and waiting on the author. This is often applied to closed PRs. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 2, 2024