Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement version-sort for imports in style_edition 2024 #6284

Merged
merged 4 commits into from
Sep 3, 2024

Conversation

ytmimi
Copy link
Contributor

@ytmimi ytmimi commented Aug 16, 2024

The version-sort algorithm is described in the sorting section of the style guide and the tracking issue for this in the r-l/rust repo is rust-lang/rust#123800.

This PR implements the algorithm and applies it to import sorting. One of the original PRs that implement this feature was #3764. In that PR the sort order for type aliases was also updated. I decided to keep this PR focused on import sorting, and I plan to open up follow up PRs to address any other sorting that rustfmt needs to apply to conform with the 2024 style_edition.

Given the nature of changing the sort order there were a lot of imports that needed to change in order to pass our self tests. Those changes have all been broken out into their own commit so you can review those changes in isolation from the implementation work. Overall, I've broken the PR into what I think are logical commits and I think it would be best to review the PR one commit at a time.

r? @calebcartwright

@ytmimi ytmimi force-pushed the version_sort branch 2 times, most recently from fac7545 to 5f773bb Compare August 16, 2024 20:15
@calebcartwright calebcartwright self-assigned this Aug 19, 2024
@calebcartwright
Copy link
Member

also cc @rust-lang/style for awareness as the diff of rustfmt's own imports should demonstrate the effects of the sorting changes

Comment on lines +152 to +153
| (VersionChunk::Str(ca), VersionChunk::Number { source: cb, .. })
| (VersionChunk::Number { source: ca, .. }, VersionChunk::Str(cb)) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's an opportunity to slightly optimize this. When comparing a VersionChunk::Str with a VersionChunk::Number I believe that the number will always get sorted first, but I implemented it this way because of this language in the style guide:

To compare two chunks if both are not numeric, compare them by Unicode character lexicographically

@calebcartwright
Copy link
Member

I'll add that the style team did review the diff of this PR in our last meeting to observe the changes in action, and the diff (the changes to the rustfmt codebase in 7f0d9a2) aligned with our expectations

fn parse_numeric_chunk(
&mut self,
mut chars: std::str::CharIndices<'a>,
) -> Option<VersionChunk<'a>> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No changes requested, but sharing an observation that we've got a lot of overlapping code between parse_numeric_chunk and parse_str_chunk, with the only difference (other than the final return value) being the couple checks in the while loop.

There's probably an opportunity to DRY that up in the future

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking a look. I'll address those merge conflicts and rebase the PR sometime later this week

src/sort.rs Outdated
Comment on lines 351 to 364
let mut input = vec![
"x87", "Z_YWX", "ZY_WX", "ZYW_X", "ZYWX", "ZYWX_", "v010", "v10", "w005s09t",
"w5s009t", "x64", "x86", "x86_32", "x86_64", "x86_128", "v000", "v00", "v0", "v0s",
"v00t", "v0u", "v001", "v01", "v1", "v009", "v09", "v9", "_ZYWX", "u_zzz", "u8", "u16",
"u32", "u64", "u128", "u256", "ua", "usize", "uz",
];
let expected = vec![
"_ZYWX", "Z_YWX", "ZY_WX", "ZYW_X", "ZYWX", "ZYWX_", "u_zzz", "u8", "u16", "u32",
"u64", "u128", "u256", "ua", "usize", "uz", "v000", "v00", "v0", "v0s", "v00t", "v0u",
"v001", "v01", "v1", "v009", "v09", "v9", "v010", "v10", "w005s09t", "w5s009t", "x64",
"x86", "x86_32", "x86_64", "x86_128", "x87",
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to add a test cases that covers the full set of items from the example in the Style Guide, just for thoroughness:


    _ZYXW
    _abcd
    A2
    ABCD
    Z_YXW
    ZY_XW
    ZY_XW
    ZYXW
    ZYXW_
    a1
    abcd
    u_zzz
    u8
    u16
    u32
    u64
    u128
    u256
    ua
    usize
    uz
    v000
    v00
    v0
    v0s
    v00t
    v0u
    v001
    v01
    v1
    v009
    v09
    v9
    v010
    v10
    w005s09t
    w5s009t
    x64
    x86
    x86_32
    x86_64
    x86_128
    x87
    zyxw

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like the example in the docs changed since I added the test. I've updated it to use the complete example from the guide.

src/sort.rs Outdated
/// Represents a chunk in the version-sort algorithm
#[derive(Debug, PartialEq, Eq)]
enum VersionChunk<'a> {
/// A singel `_` in an identifier. Underscores are sorted before all other characters.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit but typo

Suggested change
/// A singel `_` in an identifier. Underscores are sorted before all other characters.
/// A single `_` in an identifier. Underscores are sorted before all other characters.

@calebcartwright
Copy link
Member

This looks good to me, and I'd be good merging with the extra test case I mentioned inline (it's probably overkill but seems like a good idea to me).

Two other things I'll note while I'm thinking about them:

  • the algorithm does mention spaces relative to underscores (noted below), but i presume that's the type of thing you were alluding to a follow up PR in the description?
  • now that style_edition is in place, we could add a standard system/idempotence test, and perhaps that would be a good way of adding a test that covers the full example mentioned in the style guide (as opposed to another unit test as i suggested inline)?

_ (underscore) sorts immediately after (space) but before any other character. (This ...

@calebcartwright
Copy link
Member

oh and I was also wondering if it would be worthwhile to try running this against some codebase that would have idents that would more thoroughly exercise the algorithm (e.g. we don't have the types of numerical, leading zeroes, etc. use cases the algorithm prescribes within the rustfmt source)

@ytmimi
Copy link
Contributor Author

ytmimi commented Sep 3, 2024

the algorithm does mention spaces relative to underscores (noted below), but i presume that's the type of thing you were alluding to a follow up PR in the description?

As the algorithm is implemented in this PR it only really applies to identifiers, which shouldn't be allowed to contain spaces. For that reason I didn't worry too much about space handling, and I don't think there would be scenarios where we'd need to worry about sorting things with spaces, but correct me if I'm wrong.

When it comes to follow up PRs I was mostly talking about applying the version sort to other AST nodes. I know #3764 also applied sorting to associated items, but I wasn't sure if there were others that we needed to consider as well.


now that style_edition is in place, we could add a standard system/idempotence test, and perhaps that would be a good way of adding a test that covers the full example mentioned in the style guide (as opposed to another unit test as i suggested inline)?

For now I've updated the unit test with the example from the style guide, and I've also added an idempotence test in a0f80f3.


oh and I was also wondering if it would be worthwhile to try running this against some codebase that would have idents that would more thoroughly exercise the algorithm (e.g. we don't have the types of numerical, leading zeroes, etc. use cases the algorithm prescribes within the rustfmt source)

I think this could be worthwhile. I Tried running a Diff-Check when passing style_edition=2024 to see if any of the numerical ident sorting would show up, but the logs are pretty noisy, so I'm not 100% sure if that was the best idea. If you know of any crates that would be good to test against let me know and I'd be happy to test these changes on those projects.

@calebcartwright
Copy link
Member

I think this could be worthwhile. I Tried running a Diff-Check when passing style_edition=2024 to see if any of the numerical ident sorting would show up, but the logs are pretty noisy, so I'm not 100% sure if that was the best idea. If you know of any crates that would be good to test against let me know and I'd be happy to test these changes on those projects.

Yeah that was the concern we had in our style meeting as well, that blasting it against a large, general corpus of code would be highly noisy. No need to block on this, but maybe we can get a call to action out there in some way, as it would be nice to find a good codebase that would have minimal noise

The algorithm is described in the [style guide] and was introduced in
`r-l/rust 115046`.

[style guide]: https://doc.rust-lang.org/nightly/style-guide/#sorting
The version-sort algorithm makes changes to the sort order of imports,
and we're updating them here for the self tests.
This was one of the original issues filed against rustfmt that discussed
using a natural sort order instead of a lexicographical order. At the
time it was closed, but now that we're supporting version-sorting in
`style_edition=2024` I thought it would be good to add a test case.
@calebcartwright
Copy link
Member

For that reason I didn't worry too much about space handling, and I don't think there would be scenarios where we'd need to worry about sorting things with spaces, but correct me if I'm wrong.

Agreed, I think that was included to cover bases algorithmically


This looks good to me, i've done a rebase so we can proceed with a merge. If you see CI go green before me then feel free to merge, otherwise I'll do it later today

@ytmimi ytmimi merged commit 06e1644 into rust-lang:master Sep 3, 2024
26 checks passed
@ytmimi ytmimi deleted the version_sort branch September 3, 2024 19:21
@ytmimi ytmimi added the release-notes Needs an associated changelog entry label Sep 3, 2024
@ytmimi ytmimi removed the release-notes Needs an associated changelog entry label Sep 20, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Sep 23, 2024
Bump stage0 to beta-2024-09-22 and rustfmt to nightly-2024-09-22

I'm doing this to apply the changes to version sorting (rust-lang/rustfmt#6284) that have occurred since rustfmt last upgraded (and a few other miscellaneous changes, like changes to expression overflowing: rust-lang/rustfmt#6260). Eagerly updating rustfmt and formatting-the-world will ideally move some of the pressure off of the beta bump which will happen at the beginning of the next release cycle.

You can verify this is correct by checking out the changes, reverting the last commit, reapplying them, and diffing the changes:

```
git fetch [email protected]:compiler-errors/rust.git bump
git checkout -b bump FETCH_HEAD
git reset --hard HEAD~5
./x.py fmt --all
git diff FETCH_HEAD
# ignore the changes to stage0, and rustfmt.toml,
# and test file changes in rustdoc-js-std, run-make.
```

Or just take my word for it? Up to the reviewer.

r? release
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
a-2024-edition Style Edition 2024
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants