Add method to return the k most common items and speed up most_common_*() methods #25

Merged 3 commits on Jul 16, 2022

Commits on Jul 9, 2022

  1. Add method k_most_common_ordered()

    Functionally this method is equivalent to the following:
    
        let mut items = counter.most_common_ordered();
        items.truncate(k);
        items
    
    However, by using a binary heap to keep only the `k` most common items,
    the implementation can be more efficient, often much more so,
    than sorting all the items.
    clint-white committed Jul 9, 2022
    Commit e4f6ed6
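
The bounded-heap idea in the commit message above can be sketched as follows. This is a minimal illustration, not the crate's actual code: it uses a plain `HashMap<char, usize>` in place of `Counter`, and the free function `k_most_common_ordered` is a hypothetical stand-in for the method of the same name. It assumes the ordering is "highest count first, ties broken by ascending item".

```rust
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashMap};

/// Hypothetical stand-in for `Counter::k_most_common_ordered()`,
/// written against a plain `HashMap` so the sketch is self-contained.
fn k_most_common_ordered(counts: &HashMap<char, usize>, k: usize) -> Vec<(char, usize)> {
    if k == 0 {
        return Vec::new();
    }

    // `BinaryHeap` is a max-heap. Keying on `(Reverse(count), item)` makes the
    // root the *worst* retained entry: the lowest count, and among equal
    // counts the largest item.
    let mut heap: BinaryHeap<(Reverse<usize>, char)> = BinaryHeap::with_capacity(k + 1);

    for (&item, &count) in counts {
        heap.push((Reverse(count), item));
        if heap.len() > k {
            // Evict the worst entry so the heap never holds more than k items.
            heap.pop();
        }
    }

    // Ascending order on `(Reverse(count), item)` means highest count first,
    // with ties broken by ascending item.
    heap.into_sorted_vec()
        .into_iter()
        .map(|(Reverse(count), item)| (item, count))
        .collect()
}
```

Because the heap never grows beyond `k + 1` entries, each of the `n` items costs at most O(log k), giving O(n log k) overall versus O(n log n) for sorting everything and truncating.
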
  2. Use unstable sorting algorithm

    There is no reason to preserve the relative order of items that compare
    equal, since that order is already unspecified due to the hashing used
    by the `HashMap`.  The unstable sorting algorithm (quicksort) is
    typically faster than the stable sorting algorithm (timsort) and does
    not require the temporary storage used by the latter.
    
    Benchmarks comparing `Counter::most_common_ordered()` and
    `Counter::k_most_common_ordered(n)`, where `n` is the length of the
    counter, found that the heapsort used by `k_most_common_ordered()` was
    typically faster than the stable sort used by `most_common_ordered()`.
    However, this is no longer the case when switching to the unstable sort;
    the quicksort implementation is typically a little faster than the
    heapsort, so when `k >= n` it is now better for `k_most_common_ordered()`
    to simply call `most_common_ordered()`.
    clint-white committed Jul 9, 2022
    Commit 1dd8222
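
As a rough illustration of the switch described in the second commit, the sketch below sorts `(item, count)` pairs with `sort_unstable_by`. It is an assumption-laden stand-in, not the crate's code: the free function `most_common_ordered` and the use of a plain `HashMap<char, usize>` are hypothetical. Since every key in the map is distinct and the comparator falls back to the item itself, no two entries compare equal, so stable and unstable sorting give the same result here.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `Counter::most_common_ordered()`:
/// highest count first, ties broken by ascending item.
fn most_common_ordered(counts: &HashMap<char, usize>) -> Vec<(char, usize)> {
    let mut items: Vec<(char, usize)> = counts.iter().map(|(&c, &n)| (c, n)).collect();

    // `sort_unstable_by` sorts in place without the temporary buffer that the
    // stable `sort_by` allocates. It may reorder elements that compare equal,
    // but with this comparator no two distinct keys ever do.
    items.sort_unstable_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    items
}
```

For a comparator that looks only at the count, entries with equal counts could come out in any order under either algorithm, because the iteration order of the `HashMap` is itself unspecified.
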

Commits on Jul 16, 2022

  1. add a more thorough test of k_most_common_ordered

    I hadn't previously used BinaryHeap::peek_mut, and wasn't sure about
    its semantics when changing the value, so I wanted to ensure that
    we really weren't going to lose any data.
    
    With this test, I am pretty confident in this implementation.
    coriolinus committed Jul 16, 2022
    Commit 2eeb255
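
The `BinaryHeap::peek_mut` behaviour that the last commit message asks about can be exercised in isolation. The sketch below is not the test added in that commit; `replace_max_if_smaller` and `smallest_k` are hypothetical helpers over plain `i32` values. It relies on the documented behaviour that `peek_mut` returns a guard to the greatest element and that the heap is restored when a modified guard is dropped, so overwriting the root replaces exactly one element and the heap keeps its size.

```rust
use std::collections::BinaryHeap;

/// Overwrite the heap's current maximum with `value` when `value` is smaller;
/// otherwise leave the heap untouched.
fn replace_max_if_smaller(heap: &mut BinaryHeap<i32>, value: i32) {
    if let Some(mut root) = heap.peek_mut() {
        if value < *root {
            // Writing through the `PeekMut` guard replaces the root in place.
            // The heap property is restored when `root` is dropped.
            *root = value;
        }
    }
}

/// Return the `k` smallest values in ascending order using a bounded max-heap.
fn smallest_k(values: &[i32], k: usize) -> Vec<i32> {
    let mut iter = values.iter().copied();

    // Seed the heap with the first k values, then let the rest compete
    // against the current maximum.
    let mut heap: BinaryHeap<i32> = iter.by_ref().take(k).collect();
    for v in iter {
        replace_max_if_smaller(&mut heap, v);
    }
    heap.into_sorted_vec()
}
```

Comparing `smallest_k` against a full sort followed by `truncate(k)` for every `k` from 0 up to the input length is one simple way to check that no element is dropped or duplicated.
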