Spec list-pattern on enumerable collections #4575

jcouv · 2021-03-23T04:47:45Z

Tagging @dotnet/roslyn-compiler @alrz @MadsTorgersen for review.

Relates to proposal #3435

jcouv · 2021-03-23T08:05:08Z

~~I need to tweak this a bit more, to be able to share the data structure between two branches where the starting patterns in one may overlap with the ending patterns in another:~~

Fixed to handle scenarios where an element needs to be saved both in the start buffer and the end buffer.

collection switch
{
   { 1, 2, 3, .., 5 } => ... // causes start buffer to hold 3 elements
   { 1, .., 3, 4, 5 } => ... // causes end buffer to hold 3 elements
}
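To make the overlap concrete, here is a rough sketch of a single enumeration that fills both buffers. It is not the helper from the proposal: `TwoBufferSketch` and `Capture` are made-up names, and a `Queue<int>` stands in for whatever fixed-size buffer the real lowering would use.

```csharp
using System;
using System.Collections.Generic;

static class TwoBufferSketch
{
    // Enumerates the source once, saving the first startCount elements
    // and the last endCount elements. An element can land in both.
    public static (List<int> Start, List<int> End) Capture(
        IEnumerable<int> source, int startCount, int endCount)
    {
        var start = new List<int>(startCount);
        var end = new Queue<int>(endCount + 1);
        foreach (var item in source)
        {
            if (start.Count < startCount)
            {
                start.Add(item);
            }
            if (endCount > 0)
            {
                end.Enqueue(item);
                if (end.Count > endCount)
                {
                    end.Dequeue(); // drop the oldest so only the tail remains
                }
            }
        }
        return (start, new List<int>(end));
    }
}
```

For the sequence 1, 2, 3, 4, 5 with three start patterns and three end patterns, the element 3 is saved in both buffers, which is the scenario the fix addresses.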

alrz · 2021-03-23T16:13:06Z

We could possibly ditch the separate start buffer if we disallow element match after a slice pattern.

case { .., 1, 2 }:
case { 1, 2, .. }: // subsumption error

case { 1, 2, .. }:
case { .., 1, 2 }: // ok

Note that we need to start buffering from the beginning anyways to facilitate forthcoming trailing matches but we only need to maintain a single buffer for that.

This might be too restrictive, but I'm not sure the scenario is common enough to be worth the added complexity.

alrz · 2021-03-23T16:36:01Z

That said, I'd prefer we have a separate helper for skipping elements,

(these may not be the precise signatures we need, it's just a guess based on the prototype)

@{
  // bufferSize: computed from the max number of trailing patterns in all arms
  var helper = new ListPatternHelper(collection, bufferSize); 

  // leading patterns
  helper.TryGetNext(0, out var element0) && ...

  // slice pattern; either:
  helper.SkipToEnd() // always succeeds
  helper.TrySkipToEnd(minLength)
  helper.TrySkipToEnd(maxLength)
  helper.TrySkipToEnd(minLength, maxLength)

  // trailing patterns
  helper.TryGetLast(2, out var hatElement2) && ..

  // length pattern; either:
  helper.TryGetCount(out var count)
  helper.TryGetCount(minLength, out var count)
  helper.TryGetCount(maxLength, out var count)
  helper.TryGetCount(minLength, maxLength, out var count)
}

For TryGetCount, we could possibly optimize a length pattern without binding e.g. [<N] to not iterate past the required length.
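As a rough illustration of how such a helper could hang together, here is a minimal sketch. The method names follow the guessed signatures above, but every body is an assumption: the circular buffer, the in-order restriction on `TryGetNext`, and the 1-based `fromEnd` argument are choices made for the example, not part of the proposal.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: names come from the comment above, bodies are guesses.
sealed class ListPatternHelper<T> : IDisposable
{
    private readonly IEnumerator<T> _enumerator;
    private readonly T[] _tail; // circular buffer of the most recent elements
    private int _count;         // number of elements enumerated so far

    public ListPatternHelper(IEnumerable<T> collection, int bufferSize)
    {
        _enumerator = collection.GetEnumerator();
        _tail = new T[bufferSize];
    }

    private bool Advance()
    {
        if (!_enumerator.MoveNext()) return false;
        if (_tail.Length > 0) _tail[_count % _tail.Length] = _enumerator.Current;
        _count++;
        return true;
    }

    // Leading pattern: elements are read in order, one MoveNext per call.
    public bool TryGetNext(int index, out T element)
    {
        if (index != _count) throw new InvalidOperationException("Leading elements must be read in order.");
        if (Advance()) { element = _enumerator.Current; return true; }
        element = default!;
        return false;
    }

    // Slice pattern: drain the enumerator, then check the minimum length.
    public bool TrySkipToEnd(int minLength)
    {
        while (Advance()) { }
        return _count >= minLength;
    }

    // Trailing pattern: fromEnd is 1-based, so 1 means the last element.
    public bool TryGetLast(int fromEnd, out T element)
    {
        if (fromEnd < 1 || fromEnd > _tail.Length || fromEnd > _count)
        {
            element = default!;
            return false;
        }
        element = _tail[(_count - fromEnd) % _tail.Length];
        return true;
    }

    public void Dispose() => _enumerator.Dispose();
}

static class HelperDemo
{
    // Hand-written equivalent of `collection is { 1, .., 5 }`.
    public static bool MatchesOneThenFive(IEnumerable<int> collection)
    {
        using var helper = new ListPatternHelper<int>(collection, bufferSize: 1);
        return helper.TryGetNext(0, out var first) && first == 1
            && helper.TrySkipToEnd(minLength: 2)
            && helper.TryGetLast(1, out var last) && last == 5;
    }
}
```

Only the tail needs a buffer here. Leading elements are handed straight to the caller, which matches the idea of keeping a single buffer for trailing matches.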

jcouv · 2021-03-23T16:53:37Z

> For TryGetCount, we could possibly optimize a length pattern without binding e.g. [<N] to not iterate past the required length.

I think that's going to be difficult. Patterns currently rely on having a ready value as input.

> // slice pattern; ...
> helper.SkipToEnd() // always succeeds

I've made a small change along those lines. A slice pattern with some following ending patterns can call and check Count() once. That way we don't need to do redundant count checks on each ending pattern.

alrz · 2021-03-23T17:15:06Z

proposals/list-patterns.md

+class ListPatternHelper
+{
+ // Notes: 
+ // We could inline this logic to avoid creating a new type and to handle the pattern-based enumeration scenarios.


I think this is a pretty serious issue. So far, I tried to not depend on the buffer type for iteration exactly because of it.
This would be useful to reduce code size, but we should design around the pattern-based case if we want to support it.

proposals/list-patterns.md


-A *slice_pattern* is compatible with any type that is *countable* as well as *sliceable* - it has an accessible indexer that takes a `Range` argument or otherwise an accessible `Slice` method that takes two `int` arguments. If both are present, the former is preferred.
+A *slice_pattern* is compatible with any type that is *countable* as well as *sliceable* — it has an accessible indexer that takes a `Range` argument or otherwise an accessible `Slice` method that takes two `int` arguments. If both are present, the former is preferred. 
+A *slice_pattern* without a sub_pattern is also compatible with any type that is *enumerable*.


proposals/list-patterns.md

+If the collection does not produce enough elements to get a value corresponding to a starting pattern, the match fails. So the *constant_pattern* `3` in `{ 1, 2, 3, .. }` doesn't match when the collection has fewer than 3 elements. 
+Patterns at the end of the *list_pattern* (that are following the `..` *slice_pattern* if one is present) are matched against the elements produced at the end of the enumeration. 
+If the collection does not produce enough elements to get values corresponding to the ending patterns, the *splice_pattern* does not match. So the *splice_pattern* in `{ 1, .., 3 }` doesn't match when the collection has fewer than 2 elements. 
+A *list_pattern* without a *splice_pattern* only matches if the number of elements produced by complete enumeration and the number of patterns are equals. So `{ _, _, _ }` only matches when the collection produces exactly 3 elements.


alrz · 2021-03-24T22:07:01Z

proposals/list-patterns.md

+> **Open question**: Confirm that async enumerables are out-of-scope. 
+> **Open question**: Confirm that slice patterns with a sub_pattern (such as `..var x`) are out-of-scope. 
+
+Although a helper type is not necessary, it helps simplify and illustrate the logic.


I gather the following is not the rigorous codegen spec and we may make changes as we go through the impl?

Right.

This rough codegen was mostly to convince myself that it would fit in the DAG binding design (evaluation steps, test steps, ...). Overall I expect it will fit okay.

The parts I'm not sure about yet:

how do we account for evaluation steps returning a value or not? (in contrast, property accesses always succeed)

who is responsible for the end-of-enumeration count check (when exiting the list-pattern, like { 1, 2 })?

can we offer enough guarantees in the DAG binding that we don't need to cache starting values at all? (those values would get cached into temps if they are enumerated in the right order)

is there enough helper logic that we could extract to a BCL type?

how do we keep track of enumerators to dispose?

> how do we account for evaluation steps returning a value or not?

We either use an evaluation followed by a standard test, or a dedicated test node. As long as these won't affect subsumption checking, we can do away with simple eval nodes. In the prototype, I tried the first approach initially, but with trailing patterns, had to introduce a test node to disallow MoveNext after a slice which can happen in multi-arm matches. (This is the key to not requiring a buffer for starting elements, see the example above)

> who is responsible for the end-of-enumeration count check

If the enumerator has a length (e.g. via TryGetNonEnumeratedCount), we check that first for an early failure, otherwise we only count if there's a length pattern. This happens right after we enumerate the sequence to the end. While we're doing it, we don't want to exceed the maxLength (if specified) and after that, we want to ensure we've reached the minLength (either inferred or specified). This covers the "early failure" that was discussed in the last LDM.

Note that a simple helper.Count() would always enumerate the sequence to the end, which is not what we want, so we need to compute the actual value set that we're testing for. Since we operate on a range of values, something like [1 or 3] causes incorrect results and should be gated.
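A small sketch of the bounded count described here, assuming the length test has already been reduced to a contiguous range. `BoundedCount` and `CountIsWithin` are hypothetical names used only for illustration.

```csharp
using System;
using System.Collections.Generic;

static class BoundedCount
{
    // Returns true when the number of elements lies in [minLength, maxLength].
    // Stops enumerating as soon as maxLength is exceeded.
    public static bool CountIsWithin<T>(
        IEnumerable<T> source, int minLength, int maxLength, out int enumerated)
    {
        enumerated = 0;
        using var e = source.GetEnumerator();
        while (e.MoveNext())
        {
            enumerated++;
            if (enumerated > maxLength)
            {
                return false; // early failure: no need to reach the end
            }
        }
        return enumerated >= minLength;
    }
}
```

This also shows why a non-contiguous set does not fit the scheme: for [1 or 3] the bounds would be 1 and 3, and a two-element sequence would pass the range test even though it should not match.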

> can we offer enough guarantees in the DAG binding that we don't need to cache starting values at all? (those values would get cached into temps if they are enumerated in the right order)

That is correct. Starting patterns won't need any kind of buffer since we have a temp per each.

> is there enough helper logic that we could extract to a BCL type?

I think if we use a standard buffer type we can indeed propose it to the BCL. Currently I used a generic fixed-size stack with no additional logic.

> how do we keep track of enumerators to dispose?

Since we can't "join" leaf nodes in a DAG, we just wrap the "rest" of the lowering in a try/finally, starting at each GetEnumerator. As a consequence enumerators might stack up. For instance, in { enum1: {0}, enum2: {0} }, both enumerators won't get disposed until after the whole pattern is executed.
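The nesting can be pictured with a hand-written equivalent of `{ enum1: {0}, enum2: {0} }`. This is only a sketch of the shape, not the compiler's output, and `DisposalSketch` is a hypothetical name.

```csharp
using System;
using System.Collections.Generic;

static class DisposalSketch
{
    // Each GetEnumerator opens a try/finally that wraps the rest of the match,
    // so the first enumerator stays alive while the second one is tested.
    public static bool BothAreSingleZero(IEnumerable<int> enum1, IEnumerable<int> enum2)
    {
        var e1 = enum1.GetEnumerator();
        try
        {
            // { 0 }: exactly one element, equal to 0
            if (!(e1.MoveNext() && e1.Current == 0 && !e1.MoveNext()))
            {
                return false;
            }

            var e2 = enum2.GetEnumerator();
            try
            {
                return e2.MoveNext() && e2.Current == 0 && !e2.MoveNext();
            }
            finally
            {
                e2.Dispose();
            }
        }
        finally
        {
            e1.Dispose(); // runs only after the whole pattern has been evaluated
        }
    }
}
```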

alrz · 2021-03-24T22:14:45Z

proposals/list-patterns.md

+If the collection does not produce enough elements to get values corresponding to the ending patterns, the *splice_pattern* does not match. So the *splice_pattern* in `{ 1, .., 3 }` doesn't match when the collection has fewer than 2 elements. 
+A *list_pattern* without a *splice_pattern* only matches if the number of elements produced by complete enumeration and the number of patterns are equals. So `{ _, _, _ }` only matches when the collection produces exactly 3 elements.
+
+Note that those implicit checks for number of elements in the collection are unaffected by the collection type being *countable*. So `{ _, _, _ }` will not make use of `Length` or `Count` even if one is available.


I think if the type is countable we can just use it e.g. if we have Length but not an indexer, we match Length while enumerating for elements.

My thinking was that if we're enumerating to the end anyways, we might as well use the enumerated count that we've accumulated for the check at the end of the list-pattern.
But it's true that we could omit trailing discard patterns. So { 1, _, _ } could be "check first element, check Count == 3".

On the other hand, imagine that the Count property needs to enumerate again from the start.

I'll add an open issue.

"check first element, check Count == 3".

We always check Count first if it's available without enumerating, subsequently MoveNext() && Current is 1 && MoveNext() && MoveNext() && !MoveNext() will be emitted to match elements which also tests for the length in itself.

That pattern is equivalent to [3] { 1, .. }, but the codegen would differ: there we generate a loop to test the length, in which case we won't enumerate past the 4th element.

If we're going to emit 4 MoveNext evaluations and enumerate completely, then there is no need to also call Count/Length. We don't need to check the count twice.

We don't need to do it, but if we can get the count without enumerating we'll check it first to fail as early as possible if it doesn't match. This is from the last LDM.

For example, the runtime just approved a new API for TryGetNonEnumeratedCount, and in order to make the pattern fast we could attempt to use it, then fall back to a state-machine-based approach if the collection must be iterated. This would give us the best of both worlds: If the enumerable is actually backed by a concrete list type, we don't need to do any enumeration of the enumerable to check the length pattern. If it's not, we can fall back to the state machine, which can do a more efficient enumeration while checking subpatterns than we could expose as an API from the BCL.

For the state machine fallback, we want to be as efficient as possible. This means not enumerating twice, and bailing out as soon as possible. So, the pattern enumerable [< 6] { 1, 2, 3, .., 10 } can immediately return false if it gets to more than 6 elements, or if any of the first 3 elements don't match the supplied patterns.

https://github.com/dotnet/csharplang/blob/master/meetings/2021/LDM-2021-02-03.md#list-patterns-on-ienumerable
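A hand-written sketch of what the quoted notes describe for `[< 6] { 1, 2, 3, .., 10 }`: try the non-enumerated count first, then fall back to a single pass that bails out early. It assumes .NET 6 or later for `Enumerable.TryGetNonEnumeratedCount`. `EarlyFailure` and `Matches` are hypothetical names, and the real lowering may look quite different.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class EarlyFailure
{
    // Hand-written equivalent of `source is [< 6] { 1, 2, 3, .., 10 }`.
    // The pattern needs at least 4 elements (3 leading + 1 trailing) and fewer than 6.
    public static bool Matches(IEnumerable<int> source)
    {
        // Fast path: reject by length without enumerating when the count is cheap.
        if (source.TryGetNonEnumeratedCount(out int known) && (known < 4 || known >= 6))
        {
            return false;
        }

        using var e = source.GetEnumerator();
        int count = 0;
        int last = 0;
        while (e.MoveNext())
        {
            count++;
            if (count >= 6)
            {
                return false; // length pattern [< 6] can no longer match
            }
            last = e.Current;
            // Leading patterns 1, 2, 3 happen to equal their 1-based position.
            if (count <= 3 && last != count)
            {
                return false;
            }
        }
        return count >= 4 && last == 10; // trailing pattern 10
    }
}
```

The loop never reads more than six elements, so an unbounded source still fails quickly.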

What I'm saying is that when there is a count check for the closing brace } of the list-pattern on an enumerable, that should use the enumerated count, since we'll have enumerated anyways. Similarly, the check that occurs to confirm that we had at least 3 elements before applying the 3 would use the enumeration.

I think that in your framing, all I'm saying is that before we check whether pattern 3 matches, we must have had 3 successful MoveNext() calls and that at the closing brace } we check that !MoveNext(). Those checks just rely on the enumerator, there was no need to call Count on a countable type.

> If the enumerable is actually backed by a concrete list type, we don't need to do any enumeration of the enumerable to check the length pattern.

Yes, for the length-pattern (such as [<6]) it is fine to introduce a third concept of count, a "try-count", or extend the concept of the Count API, but that's orthogonal to what I'm saying about the list-pattern.
Note that the { 1, 2, 3, .., 10 } does not have an implicit end-of-list-pattern count, since it has a `..`. But it does have implicit checks that we could get at least 4 elements before we apply the pattern 10.

I hope that makes sense.
By the way, if you're already down the implementation path and feel there is a more natural way to spec this, feel free to update this doc. I'll merge as soon as I get a sign-off so you can stack a PR.

Sorry, one more point of clarification.
This (Note that those implicit checks for number of elements in the collection are unaffected by the collection type being *countable*. So `{ _, _, _ }` will not make use of `Length` or `Count` even if one is available.) is merely calling out a consequence of what is stated in the preceding paragraph:

> If the collection does not produce enough elements to get a value corresponding to a starting pattern, the match fails. [...]

> If the collection does not produce enough elements to get values corresponding to the ending patterns, the slice_pattern does not match. [...]

> A list_pattern without a slice_pattern only matches if the number of elements produced by complete enumeration and the number of patterns are equals. [...]

alrz · 2021-03-24T22:20:06Z

proposals/list-patterns.md

+ { 1 } => /* here */ ...,
+ _ => /* here */ ...,
+};
+/* here too, with a spilled try/finally around the switch expression */


This isn't quite clear to me. DAG lowering happens in its own block, so we only need one try/finally per enumerator, (provided we have a single node per each after simplification), from there we emit jumps to the target code section.

You're right. I'm not sure how to best represent this. What I'm trying to illustrate is that we dispose the enumerators as early as possible and in every case (even if an exception is thrown somewhere in a pattern evaluation).

> What I'm trying to illustrate is that we dispose the enumerators as early as possible and in every case

As I explained above, the earliest we could do that is at the end of the DAG lowering. At least that's what I could think of.

alrz · 2021-03-24T22:42:05Z

To facilitate early failures, we calculate max/min length based on the pattern itself and the length pattern so that we can check as we iterate. Therefore, we require this set to be contiguous (e.g. [1 or 3] is an error on enumerables). I think we should mention that in the spec.

jcouv · 2021-03-24T23:55:03Z

> we require this set to be contiguous (e.g. [1 or 3] is an error on enumerables)

Two concerns:

The evaluation model for patterns is "get a value" followed by "check whether the value matches the pattern". But what you're proposing is that we have special understanding of some patterns inside of length-patterns so that we can check before we have the (final) value.
From a language perspective, why place such a restriction? The [1 or 3] pattern seems fine to me.

Maybe we could treat early termination as a compiler optimization (when possible) rather than something the language defines.

I'll add that as an open question.

jcouv · 2021-03-25T16:44:26Z

@dotnet/roslyn-compiler for review.

333fred · 2021-03-25T18:23:43Z

proposals/list-patterns.md

+ {
+ count = 0;
+ enumerator = enumerable.GetEnumerator();
+ startBuffer = startPatternsCount == 0 ? null : new ElementType[startPatternsCount];


We might want to consider renting arrays for this. #Resolved

And having an IDisposable implementation for returning them.

Right. Since we'll need to track disposal of enumerator we have some options here.
Stackalloc might be an option too if we inline this logic (then the state is just locals to the expression).

I don't think we'll want to use stackalloc (unconditionally, anyway). This might be a place where we'd like a runtime helper like dotnet/runtime#25423.
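For the renting idea, a minimal sketch using `ArrayPool<T>.Shared` with an `IDisposable` wrapper that returns the array. `RentedBuffer<T>` is a hypothetical type invented for this example. Nothing in the proposal commits to it.

```csharp
#nullable enable
using System;
using System.Buffers;

// Hypothetical wrapper: rents on construction, returns on Dispose.
sealed class RentedBuffer<T> : IDisposable
{
    private T[]? _array;

    public RentedBuffer(int minimumLength)
    {
        // Rent may hand back an array longer than requested.
        _array = ArrayPool<T>.Shared.Rent(minimumLength);
    }

    public T[] Array => _array ?? throw new ObjectDisposedException(nameof(RentedBuffer<T>));

    public void Dispose()
    {
        var array = _array;
        if (array is null)
        {
            return; // already returned
        }
        _array = null;
        // Clear so buffered elements do not leak to the next renter.
        ArrayPool<T>.Shared.Return(array, clearArray: true);
    }
}
```

Since the enumerator already needs disposal tracking, the buffer could be returned in the same finally block.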

333fred · 2021-03-25T21:37:43Z

proposals/list-patterns.md

+@{
+ var helper = new ListPatternHelper(collection, 0, 0);
+
+ helper.Count() == 3


This does not feel good. We probably want to take a parameter of some kind and avoid enumerating the whole enumerable if we pass an upper limit, 3 in this case. #Resolved

In the current design of patterns in general, we get a value then we check the value. I've added an open question on length-pattern cutting enumerations short (checking non-final value in some way), following Ali's suggestion. #Resolved

333fred · 2021-03-25T21:37:55Z

proposals/list-patterns.md

+
+ helper.TryGetStartElement(index: 0, out var element0) && element0 is 0 &&
+ helper.TryGetStartElement(1, out var element1) && element1 is 1 &&
+ helper.Count() == 2


Same comment about enumerable length. #Pending

Thanks. We only need to check we're at the end of the enumeration.

333fred · 2021-03-25T21:39:04Z

proposals/list-patterns.md

+@{
+ var helper = new ListPatternHelper(collection, 0, 2);
+
+ helper.Count() >= 2 && // `..` with 2 ending patterns


This one isn't so bad, since we need to enumerate the whole thing anyway. #Resolved

333fred

LGTM (commit 11). We'll keep whacking on that enumeration question in the implementation.

Spec list-pattern on enumerable collections

0907851

jcouv self-assigned this Mar 23, 2021

jcouv requested a review from a team as a code owner March 23, 2021 04:47

tweaks

a6386a3

jcouv marked this pull request as draft March 23, 2021 07:59

jcouv marked this pull request as ready for review March 23, 2021 08:22

jcouv added 2 commits March 23, 2021 01:23

Fix scenario where element needs to be in both buffers

f9f5ece

factor more

70904fd

Factor count checks after splice_pattern

1de09f3

alrz reviewed Mar 23, 2021

typo

fff6093

alrz reviewed Mar 24, 2021

jcouv added 4 commits March 24, 2021 17:00

Address feedback

a5561b1

typo

bbe54a1

spelling

5779005

Add open question on TryGetNonEnumeratedCount API

c00b4a2

333fred reviewed Mar 25, 2021

Don't force enumeration to check enumeration is complete

df51537

333fred approved these changes Mar 25, 2021

jcouv merged commit 47143d1 into dotnet:main Mar 25, 2021

jcouv deleted the enumerable-pattern branch March 25, 2021 23:02
