-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec list-pattern on enumerable collections #4575
Changes from 5 commits
0907851
a6386a3
f9f5ece
70904fd
1de09f3
fff6093
a5561b1
bbe54a1
5779005
c00b4a2
df51537
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -73,16 +73,67 @@ Notes: | |
|
||
#### Pattern compatibility | ||
|
||
A *length_pattern* is compatible with any type that is *countable* - it has an accessible property getter that returns an `int` and has the name `Length` or `Count`. If both properties are present, the former is preferred. | ||
A *length_pattern* is compatible with any type that is *countable* — it has an accessible property getter that returns an `int` and has the name `Length` or `Count`. If both properties are present, the former is preferred. | ||
A *length_pattern* is also compatible with any type that is *enumerable* — it can be used in `foreach`. | ||
|
||
A *list_pattern* is compatible with any type that is *countable* as well as *indexable* - it has an accessible indexer that takes an `Index` or `int` argument. If both indexers are present, the former is preferred. | ||
A *list_pattern* is compatible with any type that is *countable* as well as *indexable* — it has an accessible indexer that takes an `Index` or `int` argument. If both indexers are present, the former is preferred. | ||
A *list_pattern* is also compatible with any type that is *enumerable*. | ||
|
||
A *slice_pattern* is compatible with any type that is *countable* as well as *sliceable* - it has an accessible indexer that takes a `Range` argument or otherwise an accessible `Slice` method that takes two `int` arguments. If both are present, the former is preferred. | ||
A *slice_pattern* is compatible with any type that is *countable* as well as *sliceable* — it has an accessible indexer that takes a `Range` argument or otherwise an accessible `Slice` method that takes two `int` arguments. If both are present, the former is preferred. | ||
A *slice_pattern* without a sub_pattern is also compatible with any type that is *enumerable*. | ||
|
||
``` | ||
enumerable is { 1, 2, .. } // okay | ||
enumerable is { 1, 2, ..var x } // error | ||
``` | ||
|
||
This set of rules is derived from the [***range indexer pattern***](https:/dotnet/csharplang/blob/master/proposals/csharp-8.0/ranges.md#implicit-index-support) but relaxed to ignore optional or `params` parameters, if any. | ||
|
||
> **Open question**: We should define the exact binding rules for any of these members and decide if we want to diverge from the range spec. | ||
|
||
#### Semantics on enumerable type | ||
|
||
If the input type is *enumerable* but not *countable*, then the *length_pattern* is checked on the number of elements obtained from enumerating the collection. | ||
|
||
If the input type is *enumerable* but not *indexable*, then the *list_pattern* enumerates elements from the collection and checks them against the listed patterns: | ||
Patterns at the start of the *list_pattern* — that are before the `..` *slice_pattern* if one is present, or all otherwise — are matched against the elements produced at the start of the enumeration. | ||
If the collection does not produce enough elements to get a value corresponding to a starting pattern, the match fails. So the *constant_pattern* `3` in `{ 1, 2, 3, .. }` doesn't match when the collection has fewer than 3 elements. | ||
Patterns at the end of the *list_pattern* (that are following the `..` *slice_pattern* if one is present) are matched against the elements produced at the end of the enumeration. | ||
If the collection does not produce enough elements to get values corresponding to the ending patterns, the *splice_pattern* does not match. So the *splice_pattern* in `{ 1, .., 3 }` doesn't match when the collection has fewer than 2 elements. | ||
A *list_pattern* without a *splice_pattern* only matches if the number of elements produced by complete enumeration and the number of patterns are equals. So `{ _, _, _ }` only matches when the collection produces exactly 3 elements. | ||
This comment was marked as resolved.
Sorry, something went wrong. |
||
|
||
Note that those implicit checks for number of elements in the collection are unaffected by the collection type being *countable*. So `{ _, _, _ }` will not make use of `Length` or `Count` even if one is available. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think if the type is countable we can just use it e.g. if we have Length but not an indexer, we match Length while enumerating for elements. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. My thinking was that if we're enumerating to the end anyways, we might as well use the enumerated count that we've accumulated for the check at the end of the list-pattern. On the other hand, imagine that the I'll add an open issue. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We always check Count first if it's available without enumerating, subsequently That pattern is equivalent to There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If we're going to emit 4 There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We dont need to do it but if we can get count without enumerating we'll check it first to fail as early as possible if it doesn't match. This is from last LDM.
https:/dotnet/csharplang/blob/master/meetings/2021/LDM-2021-02-03.md#list-patterns-on-ienumerable There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. What I'm saying is that when there is a count check for the closing brace I think that in your framing, all I'm saying is that before we check whether pattern
Yes, for the length-pattern (such as I hope that makes sense. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry, one more point of clarification.
|
||
|
||
When multiple *list_patterns* are applied to one input value the collection will be enumerated once at most: | ||
``` | ||
_ = collection switch | ||
{ | ||
{ 1 } => ..., | ||
{ 2 } => ..., | ||
{ .., 3 } => ..., | ||
}; | ||
|
||
_ = collectionContainer switch | ||
{ | ||
{ Collection: { 1 } } => ..., | ||
{ Collection: { 2 } } => ..., | ||
{ Collection: { .., 3 } } => ..., | ||
}; | ||
``` | ||
|
||
It is possible that the collection will not be completely enumerated. For example, if one of the patterns in the *list_pattern* doesn't match or when there are no ending patterns in a *list_pattern* (e.g. `collection is { 1, 2, .. }`). | ||
|
||
If an enumerator is produced when a *list_pattern* is applied to an enumerable type and that enumerator is disposable it will be disposed when a top-level pattern containing the *list_pattern* successfully matches, or when none of the patterns match (in the case of a `switch` statement or expression). It is possible for an enumerator to be disposed more than once and the enumerator must ignore all calls to `Dispose` after the first one. | ||
``` | ||
// any enumerator used to evaluate this switch statement is disposed at the indicated locations | ||
_ = collection switch | ||
{ | ||
{ 1 } => /* here */ ..., | ||
_ => /* here */ ..., | ||
}; | ||
/* here too, with a spilled try/finally around the switch expression */ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This isn't quite clear to me. DAG lowering happens in its own block, so we only need one try/finally per enumerator, (provided we have a single node per each after simplification), from there we emit jumps to the target code section. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. You're right. I'm not sure how to best represent this. What I'm trying to illustrate is that we dispose the enumerators as early as possible and in every case (even if an exception is thrown somewhere in a pattern evaluation). There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
As I explained above, the earliest we could do that is at the end of the DAG lowering. At least that's what I could think of. |
||
``` | ||
|
||
#### Subsumption checking | ||
|
||
Subsumption checking works just like [positional patterns with `ITuple`](https:/dotnet/csharplang/blob/main/proposals/csharp-8.0/patterns.md#positional-pattern) - corresponding subpatterns are matched by position plus an additional node for testing length. | ||
|
@@ -103,7 +154,7 @@ The order in which subpatterns are matched at runtime is unspecified, and a fail | |
|
||
> **Open question**: The pattern `{..}` tests for `expr.Length >= 0`. Should we omit such test (assuming `Length` is always non-negative)? | ||
|
||
#### Lowering | ||
#### Lowering on countable/indexeable/sliceable type | ||
|
||
A pattern of the form `expr is {1, 2, 3}` is equivalent to the following code (if compatible via implicit `Index` support): | ||
```cs | ||
|
@@ -121,6 +172,141 @@ expr.Length is >= 2 | |
``` | ||
The *input type* for the *slice_pattern* is the return type of the underlying `this[Range]` or `Slice` method with two exceptions: For `string` and arrays, `string.Substring` and `RuntimeHelpers.GetSubArray` will be used, respectively. | ||
|
||
#### Lowering on enumerable type | ||
|
||
> **Open question**: Confirm that async enumerables are out-of-scope. | ||
> **Open question**: Confirm that slice patterns with a sub_pattern (such as `..var x`) are out-of-scope. | ||
|
||
Although a helper type is not necessary, it helps simplify and illustrate the logic. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I gather the following is not the rigorous codegen spec and we may make changes as we go through the impl? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right. This rough codegen was mostly to convince myself that it would fit in the DAG binding design (evaluation steps, test steps, ...). Overall I expect it will fit okay. The parts I'm not sure yet:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
We either use an evaluation followed by a standard test, or a dedicated test node. As long as these won't affect subsumption checking, we can do away with simple eval nodes. In the prototype, I tried the first approach initially, but with trailing patterns, had to introduce a test node to disallow MoveNext after a slice which can happen in multi-arm matches. (This is the key to not requiring a buffer for starting elements, see the example above)
If the enumerator has a length (e.g. via TryGetNonEnumeratedCount), we check that first for an early failure, otherwise we only count if there's a length pattern. This happens right after we enumerate the sequence to the end. While we're doing it, we don't want to exceed the maxLength (if specified) and after that, we want to ensure we've reached the minLength (either inferred or specified). This covers the "early failure" that was discussed in the last LDM. Note that a simple
That is correct. Starting patterns won't need any kind of buffer since we have a temp per each.
I think if we use a standard buffer type we can indeed propose to it BCL. Currently I used a generic fixed-size stack with no additional logic.
Since we can't "join" leaf nodes in a DAG, we just wrap the "rest" of the lowering in a try/finally, starting at each GetEnumerator. As a consequence enumerators might stack up. For instance, in |
||
|
||
``` | ||
class ListPatternHelper | ||
{ | ||
// Notes: | ||
// We could inline this logic to avoid creating a new type and to handle the pattern-based enumeration scenarios. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this is a pretty serious issue. So far, I tried to not depend on the buffer type for iteration exactly because of it. |
||
// We may only need one element in start buffer, or maybe none at all, if we can control the order of checks in the patterns DAG. | ||
// We could emit a count check for a non-terminal `..` and economize on count checks a bit. | ||
private EnumeratorType enumerator; | ||
private int count; | ||
private ElementType[] startBuffer; | ||
private ElementType[] endCircularBuffer; | ||
|
||
public ListPatternHelper(EnumerableType enumerable, int startPatternsCount, int endPatternsCount) | ||
{ | ||
count = 0; | ||
enumerator = enumerable.GetEnumerator(); | ||
startBuffer = startPatternsCount == 0 ? null : new ElementType[startPatternsCount]; | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We might want to consider renting arrays for this. #Resolved There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And having an There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Right. Since we'll need to track disposal of enumerator we have some options here. In reply to: 601740384 [](ancestors = 601740384) There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think we'll want to use stackalloc (unconditionally, anyway). This might be a place where we'd like a runtime helper like dotnet/runtime#25423. |
||
endCircularBuffer = endPatternsCount == 0 ? null : new ElementType[endPatternsCount]; | ||
} | ||
|
||
// targetIndex = -1 means we want to enumerate completely | ||
private int MoveNextIfNeeded(int targetIndex) | ||
{ | ||
int startSize = startBuffer?.Length ?? 0; | ||
int endSize = endCircularBuffer?.Length ?? 0; | ||
Debug.Assert(targetIndex == -1 || (targetIndex >= 0 && targetIndex < startSize)); | ||
|
||
while ((targetIndex == -1 || count <= targetIndex) && enumerator.MoveNext()) | ||
{ | ||
if (count < startSize) | ||
startBuffer[count] = enumerator.Current; | ||
|
||
if (endSize > 0) | ||
endCircularBuffer[count % endSize] = enumerator.Current; | ||
|
||
count++; | ||
} | ||
|
||
return count; | ||
} | ||
|
||
public int Count() | ||
{ | ||
return MoveNextIfNeeded(-1); | ||
} | ||
|
||
// fulfills the role of `[index]` for start elements when enough elements are available | ||
public bool TryGetStartElement(int index, out ElementType value) | ||
{ | ||
Debug.Assert(startBuffer is not null && index >= 0 && index < startBuffer.Length); | ||
MoveNextIfNeeded(index); | ||
if (count > index) | ||
{ | ||
value = startBuffer[index]; | ||
return true; | ||
} | ||
value = default; | ||
return false; | ||
} | ||
|
||
// fulfills the role of `[^hatIndex]` for end elements when enough elements are available | ||
public ElementType GetEndElement(int hatIndex) | ||
{ | ||
Debug.Assert(endCircularBuffer is not null && hatIndex > 0 && hatIndex <= endCircularBuffer.Length); | ||
int endSize = endCircularBuffer.Length; | ||
Debug.Assert(endSize > 0); | ||
return endCircularBuffer[(count - hatIndex) % endSize]; | ||
} | ||
} | ||
``` | ||
|
||
`collection is [3]` is lowered to | ||
``` | ||
@{ | ||
var helper = new ListPatternHelper(collection, 0, 0); | ||
|
||
helper.Count() == 3 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This does not feel good. We probably want to take a parameter of some kind an avoid enumerating the whole enumerable if we pass an upper limit, 3 in this case. #Resolved There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. In the current design of patterns in general, we get a value then we check the value. I've added an open question on length-pattern cutting enumerations short (checking non-final value in some way), following Ali's suggestion. #Resolved |
||
} | ||
``` | ||
|
||
`collection is { 0, 1 }` is lowered to | ||
``` | ||
@{ | ||
var helper = new ListPatternHelper(collection, 2, 0); | ||
|
||
helper.TryGetStartElement(index: 0, out var element0) && element0 is 0 && | ||
helper.TryGetStartElement(1, out var element1) && element1 is 1 && | ||
helper.Count() == 2 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Same comment about enumerable length. #Pending There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Thanks. We only need to check we're at the end of the enumeration. |
||
} | ||
``` | ||
|
||
`collection is { 0, 1, .. }` is lowered to | ||
``` | ||
@{ | ||
var helper = new ListPatternHelper(collection, 2, 0); | ||
|
||
helper.TryGetStartElement(index: 0, out var element0) && element0 is 0 && | ||
helper.TryGetStartElement(1, out var element1) && element1 is 1 | ||
} | ||
``` | ||
|
||
`collection is { .., 3, 4 }` is lowered to | ||
``` | ||
@{ | ||
var helper = new ListPatternHelper(collection, 0, 2); | ||
|
||
helper.Count() > 2 && // `..` with 2 ending patterns | ||
jcouv marked this conversation as resolved.
Show resolved
Hide resolved
|
||
helper.GetEndElement(hatIndex: 2) is 3 && // [^2] is 3 | ||
helper.GetEndElement(1) is 4 // [^1] is 4 | ||
} | ||
``` | ||
|
||
`collection is { 1, 2, .., 3, 4 }` is lowered to | ||
``` | ||
@{ | ||
var helper = new ListPatternHelper(collection, 2, 2); | ||
|
||
helper.TryGetStartElement(index: 0, out var element0) && element0 is 1 && | ||
helper.TryGetStartElement(1, out var element1) && element1 is 2 && | ||
helper.Count() > 4 && // `..` with 2 starting patterns and 2 ending patterns | ||
jcouv marked this conversation as resolved.
Show resolved
Hide resolved
|
||
helper.TryGetEndElement(hatIndex: 2) is 3 && | ||
helper.TryGetEndElement(1) is 4 | ||
} | ||
``` | ||
|
||
The same way that a `Type { name: pattern }` *property_pattern* checks that the input has the expected type and isn't null before using that as receiver for the property checks, so can we have the `{ ..., ... }` *list_pattern* initialize a helper and use that as the pseudo-receiver for element accesses. | ||
This should allow merging branches of the patterns DAG, thus avoiding creating multiple enumerators. | ||
|
||
### Additional types | ||
|
||
Beyond the pattern-based mechanism outlined above, there are an additional two set of types that can be covered as a special case. | ||
|
This comment was marked as resolved.
Sorry, something went wrong.
This comment was marked as resolved.
Sorry, something went wrong.