Regarding EscapeRegExpPattern.md #26
Comments
cc @EladRK
Thanks @allenwb , I'll keep this short since you're a busy guy: Correct me if I'm wrong, EscapeRegExpPattern (as the name implies) takes a pattern and escapes it so that it can be represented as a string. What While the two are related operations - I don't see how I did not write
@benjamingr So, let's drill a bit more into the possible use cases for
These are both valid use cases and a programmer with either problem might reasonably expect that If To me, this makes it clear that the more broadly useful and less error-prone semantics is to always escape
Agreed.
Definitely, I think EscapeRegExpString was named in a way that was fine when it was named but is confusing in light of
Agreed.
This is an interesting case and I've spent some time looking into it in the past few days. I have scraped hundreds of thousands of code bases (the 10k most popular and 10k most depended-on npm packages, GitHub search results for various matches, top websites in the Alexa ranking, and code and symbol search). Virtually no one is doing I've consulted a computer engineering professor about it and they seemed content with the methodology.
These are use cases programmers don't actually do in RegExp. Other languages (even those with regexp delimiters like PHP, that don't need eval to need them) do not escape So practically, unless I'm missing something in the data, the second use case does not actually exist in the wild. If you have data or a code base that indicates otherwise I'm all up for escaping
It makes the resulting string less readable and longer. Assuming the longer thing (more memory) isn't actually a problem - the readability is something that got brought up at least twice before in programming languages. Perl switched its syntax because of it at one point, Python first changed their processing of To be fair, I'm very amenable to adding
And, if necessary we can rename EscapeRegExpString
In other words you have proven that some people indeed do exactly what I've described. Concatenate
New features need to be thought about from the perspective of the future as well as the past. People do write source-to-source translators in JS and are likely to increasingly do so. I mentioned dynamic module generation. Another possibility that seems increasingly likely is such translations taking place within template string tag handlers.
Presumably most strings that are going to be escaped are intended to be mechanically processed. What are use cases where readability is of primary importance? In particular, the escaping under discussion would only occur if I think we can agree that the longer string is not a problem; so mentioning it doesn't contribute to discussion.
So, Python's 3.0 re.escape() does escape
I don't think you need data. Not all design issues can be settled using data. It seems pretty clear that Personally, I was unsure about whether this additional escaping was necessary or desirable when I raised the question. The discussion has convinced me that it is, at least, desirable.
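To make the two use cases in this thread concrete, here is a minimal sketch. The helper `escapeForRegExp` is hypothetical and is not the proposal's algorithm; it only assumes that `/` is included in the escaped set alongside the usual syntax characters.

```javascript
// Hypothetical helper for illustration only (not the proposal's algorithm).
// It escapes the usual RegExp syntax characters and, additionally, "/".
function escapeForRegExp(text) {
  return String(text).replace(/[\\^$.*+?()[\]{}|\/]/g, "\\$&");
}

const input = "a/b.c";
const escaped = escapeForRegExp(input);

// Use case 1: hand the escaped text to the RegExp constructor.
const viaConstructor = new RegExp("^" + escaped + "$");

// Use case 2: splice the escaped text between "/" delimiters in generated
// source text, then evaluate that source (here via the Function constructor).
const literalSource = "/^" + escaped + "$/";
const viaLiteral = new Function("return " + literalSource + ";")();

console.log(viaConstructor.test(input), viaLiteral.test(input));
```

With `/` escaped, the same string works in both contexts. Whether the second context is common enough to justify the extra escaping is exactly what is being debated here.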
What about GetPatternStringRepresentation or GetEscapedStringForPattern or something like that?
I don't understand, there are fewer than 10 usage examples of the pattern, and all of them are wrong in some other way too like doing Just to make it clear, by not supporting it we're not breaking anyone's code, we're simply not supporting a use case that doesn't actually exist in code bases on the web. I'm not sure why we would want to support passing it to eval any more than we want to support passing Again, if you can show people actually do this - this changes the picture completely.
I have not found a source-to-source translator that does this (and I looked), if you find a counter example I'm all ears.
Well, in Python people complained about readability, but even if they are mechanically processed they are still manually debugged. An escaped like
Python's old re.escape escapes pretty much everything. People have complained about the escape set being too large and they made explicit opt outs (
Not all design issues can be settled using data but in this case the data tells us:
I don't think it's as clear. I still need convincing of an actual use case to justify the readability impact. Debugging generated regular expressions is something I've had to do several times and I did not enjoy it one bit so anything that helps with that is a win IMO. I'm also considering allowing escaping more things - see #27
If we want to drop the readability guarantees I'd also like to:
This would make the output less readable but safe in a context sensitive environment. As a side note, no one is using the escape polyfill (or
Superseding this with #29
In ES6 EscapeRegExpPattern is defined in 21.2.3.2.4. In ES5.1 it wasn't a named abstract operation but its semantics for escaping and setting the source property were specified in 15.10.4.1. In both ES5.1 15.10.6.4 and ES6 21.2.5.14 RegExp.prototype.toString is specified to use the value of the source property. Both the ES5.1 and ES6 specs include a note stating that the value returned should be in the form of a RegularExpressionLiteral that would evaluate to a RegExp object that would have the same matching behavior as the original object.