Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide a means of turning iterators into fixed-size arrays #81615

Open
bstrie opened this issue Feb 1, 2021 · 19 comments
Open

Provide a means of turning iterators into fixed-size arrays #81615

bstrie opened this issue Feb 1, 2021 · 19 comments
Labels
A-array Area: `[T; N]` A-iterators Area: Iterators C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@bstrie
Copy link
Contributor

bstrie commented Feb 1, 2021

Now that min_const_generics is approaching there is a great deal of new support for working with fixed-sized arrays, such as std::array::IntoIter. But while converting from an array to an iterator is now well-supported, the reverse is lacking:

let mut a = std::array::IntoIter::new([1,2,3]);
let b: [i32; 3] = a.collect(); // the trait `FromIterator<{integer}>` is not implemented for `[i32; 3]`

It is roundaboutly possible to do the conversion by going through Vec:

let mut a = std::array::IntoIter::new([1,2,3]);
let b: [i32; 3] = a.collect::<Vec<_>>().try_into().unwrap();

But that isn't no_std compatible, and even with std there should be no need to allocate here.

The non-allocating way of converting the array presupposes a fair bit of familiarity with the stdlib, unsafe code, and unstable features:

#![feature(maybe_uninit_uninit_array)]
#![feature(maybe_uninit_array_assume_init)]
let mut array: [MaybeUninit<T>; N] = MaybeUninit::uninit_array();

for i in 0..N {
    array[i] = MaybeUninit::new(iter.next().unwrap());
}

let array = unsafe {
    MaybeUninit::array_assume_init(array)
};

...which suggests this is a prime candidate for stdlib inclusion.

The first problem is what the signature should be. There's no way to statically guarantee that an iterator is any given length, so (assuming that you don't want to panic) what is the return type when the iterator might be too short?

  1. -> [T; N]: straightforward, but you'd have to have T: Default, which limits its usefulness. Furthermore this uses in-band signaling to mask what is probably an error (passing in a too-small iterator), which feels user-hostile. This seems little better than panicking.
  2. -> [Option<T>; N]: the obvious solution to indicate potentially-missing data, but likely annoying to work with in the success case. And sadly the existing blanket impl that ordinarily allows you .collect from from a collection of options into an option of a collection can't be leveraged here because you still don't have an impl of FromIterator<T> for [T; N]. So if you actually want a [T; N] you're left with manually iterating over what was returned, which is to say, you're no better off than having the iterator you started out with!
  3. -> Option<[T; N]>: the simplest solution that doesn't totally ignore errors. This would be consistent with std::slice::array_windows for producing None when a function cannot construct a fixed-size array due to a too-small iterator. However, it unfortunately seems less quite a bit less recoverable in the failure case than the previous approach.
  4. -> Result<[T; N], U>: same as the previous, though it's possible you could pass something useful back in the error slot. However, as long as the iterator is by-value and unless it's limited to ExactSizeIterators, it might be tricky to pass the data back.
  5. -> ArrayVec<T, N>: this would be a new type designed to have infallible conversion from FromIterator. Actually extracting the fixed-size array would be done through APIs on this type, which avoids some problems in the next section.

IMO approaches 1 and 2 are non-starters.

The second problem is how to perform the conversion.

  1. FromIterator: the obvious approach, however, it cannot be used with return type 3 from the prior section. This is because of the aforementioned blanket impl for collecting a collection-of-options into an option-of-collections, which conflicts with any attempt to impl FromIterator<T> for Option<[T; N]>. I think even specialization can't solve this?
  2. TryFrom: theoretically you could forgo an impl of FromIterator<T> and instead impl TryFrom<I: IntoIterator<Item=T>>; hacky, but at least you're still using some standard conversion type. Sadly, Invalid collision with TryFrom implementation? #50133 makes it impossible to actually write this impl; people claim that specialization could theoretically address that, but I don't know enough about the current implementation to know if it's sufficient for this case.
  3. Introduce a new TryFromIterator trait. This would work, but this also implies a new TryIntoIterator trait and Iterator::try_collect. Likely the most principled approach.
  4. A new std::array::TryCollect trait, impl'd for I: IntoIterator. Less principled than the prior approach but less machinery for if you happen to think TryFromIterator and TryIntoIterator wouldn't be broadly useful.
  5. A new std::array::from_iter function. The simplest and least general approach. Less consistent with .collect than the previous approach, but broadly consistent with std::array::IntoIter (although that itself is considered a temporary stopgap until .into_iter works on arrays). Similarly, could be an inherent associated function impl'd on [T; N].
  6. Come up with some wacky const-generics shenanigans that would allow FromIterator<T> for [T; N] to Just Work, possibly via introducing a new const-generics aware version of ExactSizeIterator. If this could be done it would unquestionably be the most promising way to proceed, but I don't have the faintest idea where to begin.
  7. A new method Iterator::collect_array as a hardcoded alternative to collect. Similarly, an Iterator-specific equivalent of slice::array_chunks could fill the same role while also being potentially more flexible.

Any other suggestions?

@bstrie bstrie added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Feb 1, 2021
@camelid camelid added A-iterators Area: Iterators C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Feb 1, 2021
@the8472
Copy link
Member

the8472 commented Feb 1, 2021

Existing PRs #69985, #75644 and #79659. But It's great to have a place to consolidate discussion since there are so many ways to skin this cat.

Additional options.

  1. Limit to T: Default and fill the array when the iterator falls short. Or let the user provide a lambda to fill them.
  2. a new method on iterators (basically Add Iterator::collect_array method #79659 or variations thereof)
  3. wait for ArrayVec

@bstrie
Copy link
Contributor Author

bstrie commented Feb 1, 2021

let the user provide a lambda to fill them.

This is an interesting alternative, and has precedence with Option::unwrap_or_default (which takes a value, not a lambda). However, it would preclude using any of the existing conversion traits.

Limit to T: Default and fill the array when the iterator falls short.

This alternative is mentioned above. The difficulty of handling the error case means that I wouldn't want this to be the only way to convert an iterator into an array, but I wouldn't be opposed to it existing alongside of another approach.

Thinking about it, I suppose this approach could technically be considered equivalent to the previously-mentioned "let the user provide a [value]" approach, assuming that the user does something like x.into_iter().chain(std::iter::repeat(FOO)).collect(); this would guarantee any would-be "empty" slots in the array would instead be filled in with FOO. A bit clunky, though.

wait for ArrayVec

Can you elaborate on what this is?

@the8472
Copy link
Member

the8472 commented Feb 1, 2021

Can you elaborate on what this is?

Something like

struct ArrayVec<T, N: const usize> {
  len: usize,
  data: [MaybeUninit<T>; N]
}

impl ArrayVec {

   /// may panic
   fn to_ary(self) -> [T; N]  {
      // ...
   }

}

This has been proposed in other conversations as it would help in several places in the standard library where an [T; N] is gradually initialized. And it would also help in this case since we could just return that instead of [T; N] directly. The user could then extract the array when len == N or handle the shorter than expected array in some other way.

It's similar to returning a tuple of a partially initialized array and a usize that tells you how many members have been initialized, except that it's safe to use.

With this we could do iter.collect::<ArrayVec<_, 16>>()

@the8472
Copy link
Member

the8472 commented Feb 1, 2021

#81382 (comment) suggests Iterator::array_windows() if that can be implemented (the storage requirements for the array were in question) then a similar Iterator::array_chunks() should be possible. With that one could call iter.array_chunks().next()

That is actually more general than using FromIterator or TryFrom because it does not consume the iterator and thus allows multiple arrays to be extracted if it has enough elements.

#79659 (comment) also suggests something similar, albeit as an iterator method instead of an iterator adapter.


Come up with some wacky const-generics shenanigans that would allow FromIterator for [T; N] to Just Work, possibly via introducing a new const-generics aware version of ExactSizeIterator. If this could be done it would unquestionably be the most promising way to proceed, but I don't have the faintest idea where to begin.

I'm not sure if that's possible. Iterators are stateful, they can be in a partially exhausted without changing their type. So the remaining number of elements can't be determined at compile time.

@bstrie
Copy link
Contributor Author

bstrie commented Feb 1, 2021

Hm, I'm very intrigued by the suggestion of Iterator::array_chunks as a solution to this. It's usable in the simple case, generalizes to advanced cases, has symmetry with existing APIs, and could sidestep the return type discussion by making it natural to return a None when the iterator no longer has sufficient elements. However, would it actually be possible for this to support a remainder method like array_chunks has in order to handle the case where the iterator length isn't a multiple of the array length? What would the return type be? The hypothetical Iterator::array_windows avoids the problem of having a remainder, but for that you'd need T: Copy. I've added this to the list of suggestions.

I've also added ArrayVec to the list of potential return types.

@amosonn
Copy link
Contributor

amosonn commented Feb 1, 2021

What about Iterator::next_n<N: const usize>? This allows to pull an array, and then keep pulling normally from the iterator. Iterator::array_chunks can be trivially implemented in terms of this.

As for the return type, it is worth noting: Whereas in an array, if some elements are "gone" (e.g. when a None is returned from array_windows), in an iterator this is more of a problem. Maybe something like Result<[T; N], (usize, [MaybeUninit<T>; N])>? (Or, of course, Result<[T; N], ArrayVec<T, N>>, though then it might be easier to just return the ArrayVec).

@the8472
Copy link
Member

the8472 commented Feb 1, 2021

@bstrie

What would the return type be?

Probably std::iter::ArrayChunks if it is an iterator adapter.

However, would it actually be possible for this to support a remainder method like array_chunks has in order to handle the case where the iterator length isn't a multiple of the array length?

Well, that's just the return type question again, but this time for the associated type, i.e. type Item = ???.

If Item = [T; N] then next() then there are two ways to implement this. Either next() just iterates until it can fill the array and
returns that and otherwise throws the partial result away... or it keeps internal state which can be retrieved through an additional method from the adapter. E.g. the number of elements it discarded. Or the elements it discarded inside a Vec<_>, which will never be allocated if your iterator is an exact multiple of N. Or an internal [MaybeUninit<T>; N] buffer to keep the remainder.

Or it could have any of the other discussed result types, e.g. type Item = ArrayVec

@amosonn

What about Iterator::next_n<N: const usize>? This allows to pull an array, and then keep pulling normally from the iterator. Iterator::array_chunks can be trivially implemented in terms of this.

They're actually a bit different, both having advantages and disadvantages.

If next_n() is implemented all the way down to the iterator source then it can be executed without discarding any elements. But that behavior would have to be optional, the default implementation would have to discard elements when there's not enough left to fill the array, at least for things that aren't TrustedLen/ExactSizeIterator/TrustedRandomAccess.

array_chunks() on the other hand could keep state in its adapter and thus make things recoverable. But it's potentially less efficient since it can't just pass entire arrays through the iterator pipeline for each step, and it would probably be larger.

So those should be considered as distinct options.

@bstrie
Copy link
Contributor Author

bstrie commented Feb 1, 2021

I was mistaken about something in the original post; while it is impossible to impl FromIterator<T> for Result<[T; N], Foo> due to the conflicting impl, it's not impossible to impl FromIterator<T> for Result<[T; N], Foo<T>>. I can't say I understand why the latter doesn't conflict, but that does much lessen any need for TryFromIterator from a technical perspective.

@amosonn
Copy link
Contributor

amosonn commented Feb 2, 2021

@the8472

array_chunks() on the other hand could keep state in its adapter and thus make things recoverable.

That's a good point! Getting the rest from the adapter definitely avoids the problem of the complicated signature. The downside is that you can forget to look at it, as opposed to a Result, and also that this pattern does not play well with a for loop.

@usbalbin
Copy link
Contributor

usbalbin commented Feb 2, 2021

There were some ideas in #80094, not sure how/if that would fit in with the current Iterators though...

@leonardo-m 's comment:

Can't we give arrays a different iterator that encodes the length too? :-)

@bstrie
Copy link
Contributor Author

bstrie commented Feb 2, 2021

Can't we give arrays a different iterator that encodes the length too?

I'm not entirely clear what that's suggesting, but it looks like they're asking for an iterator where the length is encoded in the type somehow, which means that the type of the iterator would need to change with every call to next. Rust types don't have the ability to mutate their type, so you'd need a totally separate interface where each call to next returns both a value and a new, smaller iterator. I don't think that helps with this issue.

@the8472
Copy link
Member

the8472 commented Feb 2, 2021

A somewhat tangentional option would be to have a separate iterator source method on the collections. This has the benefit of guaranteed access to the size, so the iterator will know in advance how large the array can be, and the remainder of the elements will remain in the collection.

Similiar to vec.drain() which does not consume the vec itself but does consume elements within the vec. I.e. the iterated elements are not borrows.

Vec::arrays::<N>(&mut self) -> impl Iterator<Item=[T; N]>. You could directly pop off an array via next(). Or you could process it further via a chain vec.arrays::<5>().map(|ary| ary.map(|e| e + 1)).next(). The latter map is the inherent array method, not an iterator one.

This doesn't solve the problem of turning arbitrary iterators into an array, but it directly lets you get arrays out of collections and process them. And it avoids the impedance mismatch of iterator methods such as filter which produce an unknown amount of elements.

@bstrie
Copy link
Contributor Author

bstrie commented Feb 15, 2021

#82098 seeks to add a private helper to the stdlib and might serve as the building block for whatever API is decided upon. Some notes on its design:

  1. It accepts an iterator by-reference. This means it does not need to tackle the question of what to do if the iterator is larger than the destination array. It resolves the problem of a too-small iterator via dropping the yielded elements; there's still theoretically room for an API to do better here.
  2. It returns Option<[T; N]>, but has an unsafe version that returns [T; N] and requires the caller to guarantee the size.

@leonardo-m
Copy link

leonardo-m commented Feb 16, 2021

In my codebase I'm using similar things since many months. So far I've settled to two functions:

  1. A Iterator::collect_array() that returns an array and asserts the input iterator length to be exactly the same length of the result.
  2. And a Iterator::collect_slice_into(&mut dest) function that collects inside a given slice and returns the filled slice, this just asserts that the input iterable isn't longer than the slice (to avoid losing items).

The design space is wide enough, probably there are other reasonable designs (like returning Options instead of asserting, or truncating the iterable ignoring extra items, etc). But those two cover nearly all my use cases.

(I am also using two more functions, a Iterator::unzip_slices(&mut dest1, &mut dest2) similar to Iterator::unzip that returns (without heap allocations) a tuple containing two slices, and a Iterator::collect_arraymap that returns an ArrayMap that's a simple map data structure based on an unordered array of (key, value) pairs. The Rust stdlib could contain some fixed-size data structures).

Dylan-DPC-zz pushed a commit to Dylan-DPC-zz/rust that referenced this issue Feb 20, 2021
…ray, r=dtolnay

Add internal `collect_into_array[_unchecked]` to remove duplicate code

Unlike the similar PRs  rust-lang#69985, rust-lang#75644 and rust-lang#79659, this PR only adds private functions and does not propose any new public API. The change is just for the purpose of avoiding duplicate code.

Many array methods already contained the same kind of code and there are still many array related methods to come (e.g. `Iterator::{chunks, map_windows, next_n, ...}`, `[T; N]::{cloned, copied, ...}`, ...) which all basically need this functionality. Writing custom `unsafe` code for each of those doesn't seem like a good idea. I added two functions in this PR (and not just the `unsafe` version) because I already know that I need the `Option`-returning version for `Iterator::map_windows`.

This is closely related to rust-lang#81615. I think that all options listed in that issue can be implemented using the function added in this PR. The only instance where `collect_array_into` might not be general enough is when the caller want to handle incomplete arrays manually. Currently, if `iter` yields fewer than `N` items, `None` is returned and the already yielded items are dropped. But as this is just a private function, it can be made more general in future PRs.

And while this was not the goal, this seems to lead to better assembly for `array::map`: https://rust.godbolt.org/z/75qKTa (CC `@JulianKnodt)`

Let me know what you think :)

CC `@matklad` `@bstrie`
JohnTitor added a commit to JohnTitor/rust that referenced this issue Feb 22, 2021
…ray, r=dtolnay

Add internal `collect_into_array[_unchecked]` to remove duplicate code

Unlike the similar PRs  rust-lang#69985, rust-lang#75644 and rust-lang#79659, this PR only adds private functions and does not propose any new public API. The change is just for the purpose of avoiding duplicate code.

Many array methods already contained the same kind of code and there are still many array related methods to come (e.g. `Iterator::{chunks, map_windows, next_n, ...}`, `[T; N]::{cloned, copied, ...}`, ...) which all basically need this functionality. Writing custom `unsafe` code for each of those doesn't seem like a good idea. I added two functions in this PR (and not just the `unsafe` version) because I already know that I need the `Option`-returning version for `Iterator::map_windows`.

This is closely related to rust-lang#81615. I think that all options listed in that issue can be implemented using the function added in this PR. The only instance where `collect_array_into` might not be general enough is when the caller want to handle incomplete arrays manually. Currently, if `iter` yields fewer than `N` items, `None` is returned and the already yielded items are dropped. But as this is just a private function, it can be made more general in future PRs.

And while this was not the goal, this seems to lead to better assembly for `array::map`: https://rust.godbolt.org/z/75qKTa (CC ``@JulianKnodt)``

Let me know what you think :)

CC ``@matklad`` ``@bstrie``
@usbalbin
Copy link
Contributor

@bstrie

Can't we give arrays a different iterator that encodes the length too?

I'm not entirely clear what that's suggesting, but it looks like they're asking for an iterator where the length is encoded in the type somehow, which means that the type of the iterator would need to change with every call to next. Rust types don't have the ability to mutate their type, so you'd need a totally separate interface where each call to next returns both a value and a new, smaller iterator. I don't think that helps with this issue.

What about some kind of wrapper type, say IteratorFixed<I, N> that holds an iterator of known length. This wrapper type would provide methods like map, zip and skip<N> etc. as well as collect(self) -> FromFixedIterator and prevent anyone from calling next on the underlying iterator.

My very rough attempt: playground

@usbalbin
Copy link
Contributor

...which would allow things like

let res: [u32; 2] = [1u32, 2, 3, 4]
    .into_iter_fixed()
    .zip([4, 3, 2, 1])
    .map(|(a, b)| a + b)
    .skip::<1>()
    .take::<2>()
    .collect();

keeping the length in the type system removing the need for asserting the length at runtime. Of course this can not support things like filter and other methods that may affect the length during runtime.

Are there many real use cases where those limitations would be problematic when collecting into collections of a fixed size? :)

@bstrie
Copy link
Contributor Author

bstrie commented Feb 28, 2021

Hm, that's an intriguing possibility. On the other hand it doesn't actually solve the issue for users who didn't start out with a fixed-size array in the first place. Which is to say, even if we had such a size-aware array-only iterator, we'd probably still want one of the other solutions in this thread to fill in the gaps. I'd say your proposal is orthogonal to this issue and is worth being considered separately.

Are there many real use cases where those limitations would be problematic when collecting into collections of a fixed size? :)

Agreed that having real-world use cases would be a big help here.

@the8472
Copy link
Member

the8472 commented Mar 15, 2021

There already is a backlink, but just to make it explicit: There's a RFC PR for ArrayVec. If that gets blessed this would at least settle a good chunk of the return type questions.

bors added a commit to rust-lang-ci/rust that referenced this issue Jul 24, 2021
Stabilize `impl From<[(K, V); N]> for HashMap` (and friends)

In addition to allowing HashMap to participate in Into/From conversion, this adds the long-requested ability to use constructor-like syntax for initializing a HashMap:
```rust
let map = HashMap::from([
    (1, 2),
    (3, 4),
    (5, 6)
]);
```
This addition is highly motivated by existing precedence, e.g. it is already possible to similarly construct a Vec from a fixed-size array:
```rust
let vec = Vec::from([1, 2, 3]);
```
...and it is already possible to collect a Vec of tuples into a HashMap (and vice-versa):
```rust
let vec = Vec::from([(1, 2)]);
let map: HashMap<_, _> = vec.into_iter().collect();
let vec: Vec<(_, _)> = map.into_iter().collect();
```
...and of course it is likewise possible to collect a fixed-size array of tuples into a HashMap ([but not vice-versa just yet](rust-lang#81615)):
```rust
let arr = [(1, 2)];
let map: HashMap<_, _> = std::array::IntoIter::new(arr).collect();
```
Therefore this addition seems like a no-brainer.

As for any impl, this would be insta-stable.
@Aplet123
Copy link

Aplet123 commented Feb 14, 2022

I made a potential implementation of approach 4 that passes data back with any type that implements FromIterator<A>. Note that people might not actually want any data back on failure, so there might have to be some accompanying zero-sized type that implements FromIterator to avoid allocation.

Of course, this is probably completely unnecessary once ArrayVec becomes a thing.

eggyal pushed a commit to eggyal/copse that referenced this issue Jan 9, 2023
Stabilize `impl From<[(K, V); N]> for HashMap` (and friends)

In addition to allowing HashMap to participate in Into/From conversion, this adds the long-requested ability to use constructor-like syntax for initializing a HashMap:
```rust
let map = HashMap::from([
    (1, 2),
    (3, 4),
    (5, 6)
]);
```
This addition is highly motivated by existing precedence, e.g. it is already possible to similarly construct a Vec from a fixed-size array:
```rust
let vec = Vec::from([1, 2, 3]);
```
...and it is already possible to collect a Vec of tuples into a HashMap (and vice-versa):
```rust
let vec = Vec::from([(1, 2)]);
let map: HashMap<_, _> = vec.into_iter().collect();
let vec: Vec<(_, _)> = map.into_iter().collect();
```
...and of course it is likewise possible to collect a fixed-size array of tuples into a HashMap ([but not vice-versa just yet](rust-lang/rust#81615)):
```rust
let arr = [(1, 2)];
let map: HashMap<_, _> = std::array::IntoIter::new(arr).collect();
```
Therefore this addition seems like a no-brainer.

As for any impl, this would be insta-stable.
@workingjubilee workingjubilee added the A-array Area: `[T; N]` label Mar 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-array Area: `[T; N]` A-iterators Area: Iterators C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants