Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make most emojis full-width #5795

Merged
9 commits merged into from
May 8, 2020
Merged

Make most emojis full-width #5795

9 commits merged into from
May 8, 2020

Conversation

leonMSFT
Copy link
Contributor

@leonMSFT leonMSFT commented May 7, 2020

The table that we refer to in CodepointWidthDetector.cpp to determine
whether or not a codepoint should be rendered as Wide vs Narrow was
based off EastAsianWidth[1]. If a codepoint wasn't included in this
table, they're considered Narrow. Many emojis aren't specified in the
EAW list, so this PR supplements our table with emoji codepoints from
emoji-data[2] in order to render most, if not all, emojis as full-width.

There are certain codepoints I've added to the comments (in case we want
to add them officially to the table in the future) that Microsoft
decided to give an emoji presentation even if it's specified as
Narrow/Ambiguous in the EAW list and are not specified in the Unicode
emoji list. These include all of the Mahjong Tiles block, different
direction pencils (✎✐), different pointing index fingers (☜, ☞) among
others. I have no idea if I've captured all of them, as I don't know of
an easy way to detect which are Microsoft specific emojis.

Validation Steps Performed

I have looked at so many emojis that I dream emoji.

These screenshots aren't encompassing all emoji but I've tried to grab
a couple from all across the codepoint ranges:

Before:
before

After:
after

[1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
[2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

Closes #900

@ghost ghost added Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal. labels May 7, 2020
Copy link
Member

@zadjii-msft zadjii-msft left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I'm fine with this, but let's get the expert in on this.

Also: Add some screenshots 😄

@zadjii-msft
Copy link
Member

@msftbot make sure @miniksa signs off on this

@ghost ghost added the AutoMerge Marked for automatic merge by the bot when requirements are met label May 7, 2020
@ghost
Copy link

ghost commented May 7, 2020

Hello @zadjii-msft!

Because you've given me some instructions on how to help merge this pull request, I'll be modifying my merge approach. Here's how I understand your requirements for merging this pull request:

  • I'll only merge this pull request if it's approved by @miniksa

If this doesn't seem right to you, you can tell me to cancel these instructions and use the auto-merge policy that has been configured for this repository. Try telling me "forget everything I just told you".

UnicodeRange{ 0x23f0, 0x23f0, CodepointWidth::Wide },
UnicodeRange{ 0x23f3, 0x23f3, CodepointWidth::Wide },
UnicodeRange{ 0x2460, 0x24e9, CodepointWidth::Ambiguous },
UnicodeRange{ 0x2328, 0x232a, CodepointWidth::Wide },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible -- and I know this is a lot of work -- i'd like a comment on each line that refers to a footnote about us overriding these. Just so we can come find it later.

Copy link
Contributor

@DHowett-MSFT DHowett-MSFT left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Changes requested in comment)

@ghost ghost added the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label May 8, 2020
@j4james
Copy link
Collaborator

j4james commented May 8, 2020

I'm not an expert on this, but I've followed a few other discussions on this subject, and my understanding was that it was vital that we exactly follow whatever standard everyone else uses for determining character width. Applications like editors need to be able to predicate exactly how much of a line of text they can fit on the screen before it wraps. They can't do that if the standard says a character should be narrow, but we're rendering it as wide.

I don't know exactly what the standards are that we should be following, so maybe everything in this PR is good, but the mention of "Microsoft specific emojis" makes it sound like we're making up own rules for some characters, and I suspect that's going to cause problems.

@miniksa
Copy link
Member

miniksa commented May 8, 2020

I'm not an expert on this, but I've followed a few other discussions on this subject, and my understanding was that it was vital that we exactly follow whatever standard everyone else uses for determining character width. Applications like editors need to be able to predicate exactly how much of a line of text they can fit on the screen before it wraps. They can't do that if the standard says a character should be narrow, but we're rendering it as wide.

I don't know exactly what the standards are that we should be following, so maybe everything in this PR is good, but the mention of "Microsoft specific emojis" makes it sound like we're making up own rules for some characters, and I suspect that's going to cause problems.

I agree with James here. If Microsoft are the only one emojifying a particular Unicode codepoint, we should just center that glyph inside the 1 cell that it's supposed to be taking up.

EDIT: Or we should put just those ones behind a setting called experimental.wideMicrosoftEmoji = true

@leonMSFT
Copy link
Contributor Author

leonMSFT commented May 8, 2020

That's fair, I was unsure about putting the Microsoft specific emojis in, because if I didn't, the lower_right_pencil u270e emoji that was noticed to be tiny in #900 would stay tiny since it isn't in the standard.

@ghost ghost removed the Needs-Author-Feedback The original author of the issue/PR needs to come back and respond to something label May 8, 2020
@miniksa
Copy link
Member

miniksa commented May 8, 2020

That's fair, I was unsure about putting the Microsoft specific emojis in, because if I didn't, the lower_right_pencil u270e emoji that was noticed to be tiny in #900 would stay tiny since it isn't in the standard.

Yeah that's why I said maybe we make it an experimental toggle that is off by default. So purists can have what Unicode dictates and Microsoft fanboys can have their full width pencils.

@DHowett-MSFT
Copy link
Contributor

Yeah that's why I said maybe we make it an experimental toggle that is off by default. So purists can have what Unicode dictates and Microsoft fanboys can have their full width pencils.

I am deeply concerned with us plugging a configuration option into this global singleton function (since right now we don't have a good place to store state fo codepoint width detection, and we need conpty to agree on the widths with terminal, and there's no good way to communicate that state down to conpty)

@leonMSFT
Copy link
Contributor Author

leonMSFT commented May 8, 2020

I'll leave the MS codepoints I've found in the comments so we can come back to implement them as we wish in the future then?

@miniksa
Copy link
Member

miniksa commented May 8, 2020

Yeah that's why I said maybe we make it an experimental toggle that is off by default. So purists can have what Unicode dictates and Microsoft fanboys can have their full width pencils.

I am deeply concerned with us plugging a configuration option into this global singleton function (since right now we don't have a good place to store state fo codepoint width detection, and we need conpty to agree on the widths with terminal, and there's no good way to communicate that state down to conpty)

Oof. OK. Yeah you're right.

I'll leave the MS codepoints I've found in the comments so we can come back to implement them as we wish in the future then?

Yeah that's probably best.

Copy link
Member

@miniksa miniksa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉🎉

@DHowett-MSFT
Copy link
Contributor

James is right, but right now the best thing we have is "vague consensus of various wcwidth implementations" and we're only making things slightly more inconsistent by adding another one. I'm not happy that we have to do this, but I think that applications are going to have a rough time regardless since the landscape here is so bad. wcwidth operates on a wchar_t, and there's no practical way to handle things like the emoji variation selector or other composing characters.

Unfortunately, emoji are just one place where column-counting breaks down for applications. ☹️

@ghost ghost merged commit 7ae3433 into master May 8, 2020
@ghost ghost deleted the dev/lelian/emojibig branch May 8, 2020 22:31
DHowett-MSFT pushed a commit that referenced this pull request May 8, 2020
The table that we refer to in `CodepointWidthDetector.cpp` to determine
whether or not a codepoint should be rendered as Wide vs Narrow was
based off EastAsianWidth[1].  If a codepoint wasn't included in this
table, they're considered Narrow. Many emojis aren't specified in the
EAW list, so this PR supplements our table with emoji codepoints from
emoji-data[2] in order to render most, if not all, emojis as full-width.

There are certain codepoints I've added to the comments (in case we want
to add them officially to the table in the future) that Microsoft
decided to give an emoji presentation even if it's specified as
Narrow/Ambiguous in the EAW list and are _not_ specified in the Unicode
emoji list. These include all of the Mahjong Tiles block, different
direction pencils (✎✐), different pointing index fingers (☜, ☞) among
others. I have no idea if I've captured all of them, as I don't know of
an easy way to detect which are Microsoft specific emojis.

## Validation Steps Performed
I have looked at so many emojis that I dream emoji.

These screenshots aren't encompassing _all_ emoji but I've tried to grab
a couple from all across the codepoint ranges:

Before:
![before](https://user-images.githubusercontent.com/57155886/81445092-2051a980-912d-11ea-9739-c9f588da407d.png)

After:
![after](https://user-images.githubusercontent.com/57155886/81445107-2778b780-912d-11ea-9615-676c2150e798.png)

[1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
[2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

Closes #900

(cherry picked from commit 7ae3433)
ghost pushed a commit that referenced this pull request May 12, 2020
A couple of codepoints, namely the card suites, male and female signs,
and white and black smiling faces were changed to have a two-column
width as part of #5795 since they were specified as emoji in Unicode's
emoji list v13.0[1]. 

These particular glyphs also show up in some of the most fundamental
code pages, such as CP437[2] and WGL4[3]. We should
not be touching the width of the glyphs in these codepages, as suddenly
changing a long-time-running narrow glyph to use two-columns all of a
sudden will surely break (and has already broken) things.

[1] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt
[2] https://en.wikipedia.org/wiki/Code_page_437
[3] https://en.wikipedia.org/wiki/Windows_Glyph_List_4

Closes #5822
DHowett-MSFT pushed a commit that referenced this pull request May 12, 2020
A couple of codepoints, namely the card suites, male and female signs,
and white and black smiling faces were changed to have a two-column
width as part of #5795 since they were specified as emoji in Unicode's
emoji list v13.0[1].

These particular glyphs also show up in some of the most fundamental
code pages, such as CP437[2] and WGL4[3]. We should
not be touching the width of the glyphs in these codepages, as suddenly
changing a long-time-running narrow glyph to use two-columns all of a
sudden will surely break (and has already broken) things.

[1] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt
[2] https://en.wikipedia.org/wiki/Code_page_437
[3] https://en.wikipedia.org/wiki/Windows_Glyph_List_4

Closes #5822

(cherry picked from commit cf62922)
@ghost
Copy link

ghost commented May 13, 2020

🎉Windows Terminal Release Candidate v0.11.1333.0 (1.0rc2) has been released which incorporates this pull request.:tada:

Handy links:

@ghost ghost mentioned this pull request May 13, 2020
DHowett-MSFT pushed a commit that referenced this pull request May 16, 2020
@TapeWerm
Copy link

When I type a ☢ emoji and backspace it in Ubuntu 18.04 I need to Ctrl-L to make it disappear. This does not happen on cmd. Should I open a separate issue for this?

@DHowett-MSFT
Copy link
Contributor

Nope. We fixed this with #5934 😄 we got some of the widths wrong

jelster pushed a commit to jelster/terminal that referenced this pull request May 28, 2020
The table that we refer to in `CodepointWidthDetector.cpp` to determine
whether or not a codepoint should be rendered as Wide vs Narrow was
based off EastAsianWidth[1].  If a codepoint wasn't included in this
table, they're considered Narrow. Many emojis aren't specified in the
EAW list, so this PR supplements our table with emoji codepoints from
emoji-data[2] in order to render most, if not all, emojis as full-width. 

There are certain codepoints I've added to the comments (in case we want
to add them officially to the table in the future) that Microsoft
decided to give an emoji presentation even if it's specified as
Narrow/Ambiguous in the EAW list and are _not_ specified in the Unicode
emoji list. These include all of the Mahjong Tiles block, different
direction pencils (✎✐), different pointing index fingers (☜, ☞) among
others. I have no idea if I've captured all of them, as I don't know of
an easy way to detect which are Microsoft specific emojis.

## Validation Steps Performed
I have looked at so many emojis that I dream emoji.

These screenshots aren't encompassing _all_ emoji but I've tried to grab
a couple from all across the codepoint ranges:

Before:
![before](https://user-images.githubusercontent.com/57155886/81445092-2051a980-912d-11ea-9739-c9f588da407d.png)

After:
![after](https://user-images.githubusercontent.com/57155886/81445107-2778b780-912d-11ea-9615-676c2150e798.png)

[1] http://www.unicode.org/Public/UCD/latest/ucd/EastAsianWidth.txt
[2] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt

Closes microsoft#900
jelster pushed a commit to jelster/terminal that referenced this pull request May 28, 2020
A couple of codepoints, namely the card suites, male and female signs,
and white and black smiling faces were changed to have a two-column
width as part of microsoft#5795 since they were specified as emoji in Unicode's
emoji list v13.0[1]. 

These particular glyphs also show up in some of the most fundamental
code pages, such as CP437[2] and WGL4[3]. We should
not be touching the width of the glyphs in these codepages, as suddenly
changing a long-time-running narrow glyph to use two-columns all of a
sudden will surely break (and has already broken) things.

[1] https://www.unicode.org/Public/13.0.0/ucd/emoji/emoji-data.txt
[2] https://en.wikipedia.org/wiki/Code_page_437
[3] https://en.wikipedia.org/wiki/Windows_Glyph_List_4

Closes microsoft#5822
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Rendering Text rendering, emoji, complex glyph & font-fallback issues AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Certain "emoji" are still half-sized
6 participants