Add rewrite rules for text conversion #213
01a9d45 to 011c842
I tried to figure out why
Your rule seems equivalent to
After they get inlined, the rule seems to be
If we also inline function composition, we get
So we are left with
Now, what's this
Aha, so it's mapping some codepoints to
This logic is lost with your rewrite rule. Not a huge loss, but it does mean that your rewrite rule isn't meaning-preserving. I bet you could write this rewrite rule instead:
011c842 to 345c6d8
Magnificent! I tried to delve into the sources, but stopped at not finding rules for
The nice news is that with your rule I get the same results as with my incorrect approach, at least when applied to string literals. Pushed the fixed version.
My rule does not preserve semantics either, but it should play nicer with other rewrite rules from
src/Universum/String/Conversion.hs
@@ -159,6 +162,87 @@ instance ToString T.Text where
instance ToString LT.Text where
    toString = LT.unpack

{- [Note toString-toText-rewritting]
Hell yeah GHC-style notes.
29e2acd to 919db77
This logic is lost with the mentioned rewrite rule.
Not a huge loss, but it does mean that this rewrite rule isn't meaning preserving.
I'd like to fully understand the consequences of this loss and be more explicit about it. AFAIU, it means that if we have a `String` which contains a UTF-16 surrogate code point and apply `toString . toText` to it, we'll get different results with and without this rule. Since we ignore `safe`, does it mean that `error` will be called somewhere?
I propose to:
- Add a test where we generate a `String` with such a character and demonstrate that the behavior is different. I don't know what exactly that test should do and which behavior is expected, but without such a test it's hard to understand in which cases and how this rule changes the semantics.
- Mention it in `Gotchas` in the README.
- It's not clear how I can illustrate this behavior. As the comment on the `safe` function suggests, `Text` cannot contain surrogate code points, so this `safe` will not change anything if my `String` is constructed from `Text`. Unless my text is constructed incorrectly, the rule does not change the semantics.
- Added.
- But what if you apply `toString . toText` to `[c]` where `ord c .&. 0x1ff800 == 0xd800`? AFAIU, the result will be different with and without this rule, am I wrong?
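To make the case in question concrete, here is a minimal sketch that round-trips a `String` containing a surrogate code point through strict `Text` by hand, with no rewrite rule involved. The helper names `isSurrogate` and `roundTrip` are hypothetical and only serve the illustration. The string is built with `toEnum` rather than a literal, so that only the ordinary `T.pack` path is exercised.

```haskell
module Main (main) where

import Data.Bits ((.&.))
import Data.Char (ord)
import qualified Data.Text as T

-- | True for code points in the UTF-16 surrogate range U+D800..U+DFFF,
-- using the same mask as quoted in the discussion (hypothetical helper).
isSurrogate :: Char -> Bool
isSurrogate c = ord c .&. 0x1ff800 == 0xd800

-- | Round trip through strict Text, spelled out with T.pack and T.unpack
-- (hypothetical helper; it does not use the library's toText/toString).
roundTrip :: String -> String
roundTrip = T.unpack . T.pack

main :: IO ()
main = do
  -- Build the input at run time instead of using a string literal.
  let s = ['a', toEnum 0xD800, 'z']
  print (any isSurrogate s)          -- True
  print (roundTrip s == s)           -- False: T.pack replaced the surrogate
  print (roundTrip s == "a\xFFFDz")  -- True: it became U+FFFD
```

So without the rule the surrogate is replaced by U+FFFD rather than triggering `error`, whereas a rule that rewrites the round trip to the identity would return the original string unchanged.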
Right, I'm being slow.
https://i.imgur.com/D1ZLGiG.png Btw,
I don't see these functions, did you delete them?
Sad story.
Yes, forgot to update the commit/issue description.
I'm trying to add a test on a character which is usually replaced by that
My guess is that, since
@int-index do you have any thoughts on this?
3e06688 to 6608357
I think my rule has a chance to fire after
Nice, it works. Although I have a feeling that our tests are pretty unstable. 🙄
@Martoon-00 btw, did you consider
Okay, why not add it.
Ok, cool. Just in case: did you run benchmarks and verify that
I assigned it
Yes, I got something like 0.2 ms with the rule against ~4 ms without it.
1e1a164 to 955a799
It seems we should do #214 first.
Probably #216 should let us merge this PR.
Rebase on
955a799 to b0e3b26
Added rewrite rule to eliminate toString . toText conversions. Closes #212.

Benchmarks without rewrite rule:
```
benchmarked toText/toString
time                 1.013 ms   (875.1 μs .. 1.148 ms)
                     0.819 R²   (0.691 R² .. 0.904 R²)
mean                 1.621 ms   (1.268 ms .. 2.619 ms)
std dev              2.134 ms   (331.2 μs .. 4.071 ms)
variance introduced by outliers: 97% (severely inflated)
```
And with rewrite rule:
```
benchmarked toText/toString
time                 185.6 μs   (182.7 μs .. 188.2 μs)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 187.0 μs   (185.2 μs .. 189.4 μs)
std dev              7.211 μs   (5.623 μs .. 10.07 μs)
variance introduced by outliers: 20% (moderately inflated)
```
Also add just a plain `T.unpack . T.pack = id` rewrite rule.
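For readers unfamiliar with GHC rewrite rules, the following is a minimal sketch of what a round-trip elimination rule can look like. It is not the rule from this PR: `toText'`, `toString'`, and `roundTrip` are hypothetical stand-ins, and `NOINLINE` is used here only as the simplest way to keep the function names visible for the rule to match. Rules are applied only when compiling with optimization (`-O`).

```haskell
module Main (main) where

import qualified Data.Text as T

-- Hypothetical stand-ins for the library's conversion functions.
-- NOINLINE keeps the names around so the rule below has something to match.
toText' :: String -> T.Text
toText' = T.pack
{-# NOINLINE toText' #-}

toString' :: T.Text -> String
toString' = T.unpack
{-# NOINLINE toString' #-}

-- Assumed rule shape: drop the intermediate Text entirely.
-- As discussed above, this is not meaning-preserving for strings that
-- contain surrogate code points, which T.pack would replace with U+FFFD.
{-# RULES
"toString'/toText'" forall s. toString' (toText' s) = s
  #-}

-- With -O the rule may rewrite this body to plain 's'.
roundTrip :: String -> String
roundTrip s = toString' (toText' s)

main :: IO ()
main = putStrLn (roundTrip "hello")
```

For ordinary input such as `"hello"` the result is the same whether or not the rule fires; only the cost of building and tearing down the intermediate `Text` differs.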
b0e3b26 to 4ac5a16
🎉
Something happened that I completely cannot explain at the moment. @int-index's observation shows that
And now when I open
And like,
That all feels really weird. The only explanation I see is that all sufficiently old
@Martoon-00 You seem to be looking at
Meeh, right 🤦. Thanks for saving me from this madness.
Description
Added rewrite rule to eliminate `toString . toText` conversions.
Related issue(s)
Fixed #212.