Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

String should implement IReadOnlyList<char> #19609

Closed
danmoseley opened this issue Dec 8, 2016 · 25 comments
Closed

String should implement IReadOnlyList<char> #19609

danmoseley opened this issue Dec 8, 2016 · 25 comments
Labels
area-System.Runtime help wanted [up-for-grabs] Good issue for external contributors
Milestone

Comments

@danmoseley
Copy link
Member

public sealed partial class String : System.Collections.Generic.IEnumerable<char>, System.Collections.IEnumerable, System.IComparable, System.IComparable<string>, System.IConvertible, System.IEquatable<string>, System.ICloneable, System.Collections.Generic.IReadOnlyList<char>
{
   //...
   //IReadOnlyList<char> members:
   int Count { get; }
   char this[int index] { get; }

   public IEnumerator<char> GetEnumerator() { }
  // ...
}


The original proposal was to use IReadOnlyList instead of IEnumerable, but that is not possible since it would break compatibility, so the proposal is instead to just add the new interface implementation.

@stas-sultanov commented on Mon Mar 23 2015

I would like to suggest to make class String to implement IReadOnlyList{char} instead of IEnumerable{char}.

Possibly interface should be implemented explicitly because of the Length property, which is extern and hardcoded in JIT. On the other hand Length could be deprecated and Count should be used instead.


@pharring commented on Thu Mar 26 2015

@stas-sultanov Thanks for the suggestion. Can you give an example of where this would be useful?


@stas-sultanov commented on Fri Mar 27 2015

Sure. Here is an example from one of my projects:
A ConvertEx class which provides a set of methods to convert data from base-n string to byte array and vice-versa. This class has a lot of methods like:

public static TryResult<String> TryToBase16String(byte[] data, int offset, int length)
public static TryResult<Byte[]> TryFromBase16String(string data, int offset, int length)

for this methods i'd like to use single method that checks arguments

    private static void CheckArguments<T>(IReadOnlyCollection<T> data, int offset, int length)
    {
        if (data == null)
            throw new ArgumentNullException(@"data");
        if (length < 0)
            throw new ArgumentOutOfRangeException(@"length", length, @"Is less than 0.");
        if (offset < 0)
            throw new ArgumentOutOfRangeException(@"offset", offset, @"Is less than 0.");
        if (offset > (data.Count - length))
            throw new ArgumentOutOfRangeException(@"data", data.Count, @"Offset plus length is greater than the length of data.");
    }

But unfortunately it is not possible because byte[] and string has only one common interface IEnumerable<T>. Use IEnumerable<T> and get length via Count() method is not an option due to performance consideration.


@pharring commented on Fri Mar 27 2015

A non-generic overload of CheckArguments would solve this, of course.


@stas-sultanov commented on Fri Mar 27 2015

of course it would, and that it is the way it is made now. :) but it dose not looks like a good design.

maybe example is not good enough, however i strongly believe that this small change would made coreclr more versatile.


@SamuelEnglard commented on Sun Mar 29 2015

Just a quick point but unless I'm wrong array's don't implement IReadOnlyList<T> so you'd still have to use the Enumerable.Count(). Do access the length/count of both an array and string in an efficient manner we'd have to Enumerable.Count() smarter. It already looks for types that implement ICollection or ICollection<T> and uses the Count property instead of looping. If we add a shortcut for IReadOnlyList<T> to that too then you could do what you want.


@stas-sultanov commented on Mon Mar 30 2015

Arrays do implement IReadOnlyList<T>, please take a look at comments of Array class.
Or try to compile followin code:

IReadOnlyList<int> test = new [] {1, 2, 3};

the Enumerable class that implements Count method is not part of the coreclr.


@JohanLarsson commented on Sun Mar 29 2015

Not much perf diff between .Count() and .Length for an array
IReadOnlyList is a nice interface though.


@SamuelEnglard commented on Sun Mar 29 2015

Ah yes, I had looked before posting but missed that comment. My bad. So then yes I'm all for making string implement IReadOnlyList<T>. Sorry about that


@stas-sultanov commented on Mon Mar 30 2015

Not much perf diff between .Count() and .Length for an array

indeed, but once again:

the Enumerable class that implements Count method is not part of the coreclr.


@VladimirGnenniy commented on Mon Mar 30 2015

I believe that this would be nice improvement of the architecture of the .net classes.


@terrajobst commented on Tue Apr 07 2015

@KrzysztofCwalina How do you think about this? It seems that String would eventually support Span<char>, would this additional interface make sense in general?


@Lukazoid commented on Fri Jun 19 2015

+1 for this, would be really useful to be able to pass strings directly as IReadOnlyList/IReadOnlyCollection parameters instead of going via .ToCharArray() or using a wrapper class.


@KrzysztofCwalina commented on Fri Jun 19 2015

To me string is a primitive type. I think it's already a mistake that it implements IEnumerable.


@sharwell commented on Fri Jun 19 2015

@KrzysztofCwalina I totally get that view. It leads to special cases in certain code that isn't required in, say, Java. But for better or worse, that ship has sailed. Given that string already implements IEnumerable<char> (and that isn't going to change), the question is whether or not it should also implement IReadOnlyList<char>.

💭 I think it makes sense specifically for some cases where code currently needs to call ToCharArray(). I couldn't find any clear-cut cases where the new implementation would simplify my own existing code, but especially given it already implements IEnumerable<char> I think it makes sense to implement IReadOnlyList<char> as well.


@KrzysztofCwalina commented on Fri Jun 19 2015

If we implemented it explicitly and made sure the Linq extensions don't show on String, then I guess it would be fine.


@sharwell commented on Fri Jun 19 2015

@KrzysztofCwalina It already implements IEnumerable<char>, so the LINQ extensions already appear.


@KrzysztofCwalina commented on Fri Jun 19 2015

Hmmm, I was under the impression that we special cased it in C#.


@KrzysztofCwalina commented on Fri Jun 19 2015

@jaredpar, @MadsTorgersen, I thought C# was special cased to not show Linq methods on String?


@sharwell commented on Fri Jun 19 2015

In the PCL, string only implements IEnumerable (not IEnumerable<char>), so only OfType and Cast appear. When I target .NET 4.5+ desktop, I see the LINQ methods appearing in VS 2015 RC.


@MadsTorgersen commented on Fri Jun 19 2015

@KrzysztofCwalina I think you just spent too much time with PCL. :-) In the full framework, LINQ always worked over string - for better or worse.


@KrzysztofCwalina commented on Fri Jun 19 2015

Yeah, it's all slowly coming to me. I think we initially wanted to special case it, but then we just decided to implement IEnumerable. Then in 8.1 we added IEnumerable (why?), and we are back to the original problem that string intellisense shows all these linq methods that probably should not be used on strings.


@danmosemsft commented on Thu Dec 08 2016

API request -- moving to CoreFX

@joperezr
Copy link
Member

joperezr commented Dec 9, 2016

@KrzysztofCwalina @terrajobst This is kind of an old issue that was ported from coreclr, but do we still want to consider implementing IReadOnlyList<char> on String? I'm marking this as ready for review so that we take a further look.

@joperezr
Copy link
Member

joperezr commented Dec 9, 2016

cc: @stas-sultanov (logged the original issue in coreclr)

@karelz
Copy link
Member

karelz commented Dec 13, 2016

@joperezr is there any downside to implementing IReadOnlyList<char> on string?
The issue title says "instead of IEnumerable<char>" - that would be a breaking change we can't take right?

@shmuelie
Copy link
Contributor

@karelz IReadOnlyList<T> implements IEnumerable<T> so not a breaking change, no?

@JonHanna
Copy link
Contributor

The issue title says "instead of IEnumerable" - that would be a breaking change we can't take right?

It would be an impossible change too, since IReadOnlyList<char> derives from IEnumerable<char>, though maybe they were thinking about how at the code level you don't need IEnumerable<char> to appear in the source any more once you've added IReadOnlyList<char>.

@karelz karelz changed the title String to implement IReadOnlyList<char> instead of IEnumerable<char> String should implement IReadOnlyList<char> Dec 13, 2016
@karelz
Copy link
Member

karelz commented Dec 13, 2016

Updating the title.
@joperezr can you please add the formal API review (API shape, motivation, usage). It will be easier to review.

@pharring
Copy link
Contributor

Since this is up for review, I'd also like to suggest adding a marker interface to System.String called System.IString that may be used to identify "string-like" things. To tie in with this proposal, I believe System.IString should derive from System.Collections.Generic.IReadOnlyList<char> and System.String would implement System.IString rather than IReadOnlyList<char> directly.

@joperezr, @KrzysztofCwalina What do you think?

If we had this when we built Roslyn, we would have used IString pretty much everywhere where we currently use string, both internally and in the public APIs. It would have allowed us to eliminate many string allocations that were forced upon us because we had to eagerly evaluate substring, concat and encoding conversions. I imagine this would be the case for any text processing library. Indeed, once you have this marker interface, you'll probably want to provide new overloads on many existing FX classes (e.g. serialization, encoding, formatting).

@karelz
Copy link
Member

karelz commented Dec 16, 2016

@pharring is there benefit to tie it to this proposal? Shouldn't it be separate proposal?

@karelz
Copy link
Member

karelz commented Dec 16, 2016

@joperezr can you please update the API proposal?

@pharring
Copy link
Contributor

@karelz Yes, they could be separate proposals. I was being a bit lazy by piggy-backing on this one.

Really, I want System.IString to act like an easier-to-type-and-remember alias for System.Collections.Generic.IReadOnlyCollection<char>. If System.IString was introduced on its own, then it would need a few methods from IReadOnlyCollection<char>.

I'll open a new issue to get some feedback before making a formal proposal.

@rvhuang
Copy link

rvhuang commented Dec 23, 2016

I support this proposal (to implement IReadOnlyList(T)).

Also it is also possible to simply implement ICollection(T) and mark property IsReadOnly as true because it could bring benefit when users use Enumerable(T) extensions such as Count() on string.

@joperezr
Copy link
Member

joperezr commented Jan 4, 2017

Here is the formal proposal from this thread:

public sealed partial class String : System.Collections.Generic.IEnumerable<char>, System.Collections.IEnumerable, System.IComparable, System.IComparable<string>, System.IConvertible, System.IEquatable<string>, System.ICloneable, System.Collections.Generic.IReadOnlyList<char>
{
   //...
   //IReadOnlyList<char> members:
   int Count { get; }
   char this[int index] { get; }

   public IEnumerator<char> GetEnumerator() { }
  // ...
}

The original proposal was to use IReadOnlyList instead of IEnumerable, but that is not possible since it would break compatibility, so the proposal is instead to just add the new interface implementation.

@joperezr joperezr removed their assignment Jan 4, 2017
@joperezr
Copy link
Member

joperezr commented Jan 4, 2017

The above proposal was just derived from the discussion of this thread, so if folks are happy with that we can go ahead and mark this as ready-for-review.

@pharring
Copy link
Contributor

pharring commented Jan 4, 2017

@joperezr Don't you also need to implement int Count { get; } ?

@joperezr
Copy link
Member

joperezr commented Jan 4, 2017

@pharring you are right, I updated the proposal above.

@jnm2
Copy link
Contributor

jnm2 commented Jan 5, 2017

The original proposal was to use IReadOnlyList instead of IEnumerable, but that is not possible since it would break compatibility, so the proposal is instead to just add the new interface implementation.

Like @JonHanna said: It's not impossible due to compat reasons, it's impossible because IReadOnlyList<char> includes IEnumerable<char>. Specifying IEnumerable<char> in the list is redundant.

@AmadeusW
Copy link

I'm following this issue and just wanted to share a blog post on Ropes (wiki) that I just saw. Ropes are a data structure that might be used to implement strings in disjoint memory locations, which we could use in dotnet/corefx#16786

@jamesqo
Copy link
Contributor

jamesqo commented Jul 28, 2017

What is the status of this proposal? @karelz, it looks like @joperezr has added a formal proposal here, maybe this is ready-for-review?

@karelz
Copy link
Member

karelz commented Jul 28, 2017

If @joperezr thinks it is ready, please update top post with final proposal and mark it api-ready-for-review.

@danmoseley
Copy link
Member Author

@joperezr ?

@terrajobst
Copy link
Member

terrajobst commented Dec 5, 2017

Video

String isn't (logically) a collection of chars, so giving a read-only view seems off. Considering how far we're with the new span based APIs, I'm not convinced this interface is adding a ton of value for code that cares about performance (and code that doesn't has alternatives today).

@karelz
Copy link
Member

karelz commented Dec 5, 2017

@pharring are you ok with this resolution? Is Span & friends sufficient "workaround" for your scenarios?

@karelz
Copy link
Member

karelz commented Dec 5, 2017

FYI: The API review discussion was recorded - see https://youtu.be/BI3iXFT8H7E?t=4973 (15 min duration)

@pharring
Copy link
Contributor

pharring commented Dec 6, 2017

@terrajobst If it's not a collection of chars, then how would you describe it? (I used your words. My own description in dotnet/corefx#14572 is more precise.)

@karelz I've been playing with ReadOnlySpan<char> recently and, while it has its own limitations (requires latest C# compiler support; different behavior in .NET Core versus Desktop framework) it certainly solves the 'substring' problem.

However, it ReadOnlySpan<char> doesn't help avoid eager construction of concat or decoding.

In any case, dotnet/corefx#14572 is still open and we can move the discussion there.

@pharring
Copy link
Contributor

pharring commented Dec 6, 2017

@terrajobst Just watched the video. In it you say that Roslyn wanted this to work over TextBuffer and we solved that with SourceText. That's roughly correct, but the use cases for an abstraction over string are a lot broader than that. Consider all the APIs that construct readable names over syntax nodes or symbols. To construct a fully-qualified name, you have to pull together fragments of strings from various sources: metadata - which is probably encoded as UTF8 - other symbols (namespaces, types) - and source (SourceText - also may be encoded). Then you have to apply the rules for forming type names which are concatenations of these pieces with separator chars (VB and C# are different). And often you do all that so that the caller can compare the value with some other string (where knowing that the lengths are different is enough to short-circuit the comparison) and then throw the value away. It's tons of gen-0 garbage. ReadOnlySpan<char> isn't appropriate for those cases.

@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 27, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime help wanted [up-for-grabs] Good issue for external contributors
Projects
None yet
Development

No branches or pull requests