Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix the hyperlink detection when there are leading wide glyph in the row #16775

Conversation

comzyh
Copy link
Contributor

@comzyh comzyh commented Feb 27, 2024

Summary of the Pull Request

URL detection was broken again in #15858. When the regex matched, we calculate the column(cell) by its offset, we use forward or backward iteration of the column to find the correct column that displays the glyphs of _chars[offset].

til::CoordType CharToColumnMapper::GetLeadingColumnAt(ptrdiff_t offset) noexcept
{
offset = clamp(offset, 0, _lastCharOffset);
auto col = _currentColumn;
const auto currentOffset = _charOffsets[col];
// Goal: Move the _currentColumn cursor to a cell which contains the given target offset.
// Depending on where the target offset is we have to either search forward or backward.
if (offset < currentOffset)

However, when calculating the currentOffset we forget that MSB of _charOffsets[col] could be 1, or col is pointing to another glyph in preceding column.

// for _charOffsets. Instead we use the most significant bit to indicate whether any column is the
// trailing half of a wide glyph. This simplifies many implementation details via _uncheckedIsTrailer.
static constexpr uint16_t CharOffsetsTrailer = 0x8000;
static constexpr uint16_t CharOffsetsMask = 0x7fff;

@comzyh
Copy link
Contributor Author

comzyh commented Feb 27, 2024

Before:
image

After:
image

@lhecker
also the test case for reproduce

安装最新的 PowerShell,了解新功能和改进!https://aka.ms/PSWindows

@comzyh comzyh changed the title Fix the direction of column iteration in CharToColumnMapper::GetLeadingColumnAt Fix the hyperlink detection when there are leading wide glyph in the row Feb 27, 2024
@lhecker
Copy link
Member

lhecker commented Feb 28, 2024

First of all, I deeply apologize for breaking this.

I'm astonished you figured out how to fix this! This is rather deep in our code base after all. If you ever have any questions, please feel free to ask us. 🙂 I've edited your PR a bit to remove the images, since the commit may be archived for a long time and GitHub may not be around anymore at that point.

@lhecker lhecker added Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) Issue-Bug It either shouldn't be doing this or needs an investigation. Product-Terminal The new Windows Terminal. Priority-2 A description (P2) labels Feb 28, 2024
@comzyh
Copy link
Contributor Author

comzyh commented Feb 28, 2024

Thanks.
I just feel the following part still has some bugs. When looking backward, it seems we will return 1 more than we expected.

for (; offset < _charOffsets[col - 1]; --col)
{
}

you can try:

𝓐𝓐𝓐!https://aka.ms/PSWindows

image

@lhecker lhecker added the AutoMerge Marked for automatic merge by the bot when requirements are met label Feb 28, 2024
@microsoft-github-policy-service microsoft-github-policy-service bot merged commit e7796e7 into microsoft:main Feb 28, 2024
15 checks passed
@lhecker
Copy link
Member

lhecker commented Feb 28, 2024

I've merged your PR as I'm planning to add some explanatory comments on top of your changes. Thanks for letting me know about the other issue you've found. I'll try to fix that with my next PR. I'm hoping to get to that later today or tomorrow. 🙂

@lhecker
Copy link
Member

lhecker commented Feb 29, 2024

I've figured it out. It's really that I forgot to use the CharOffsetsMask but rather that this loop should not subtract 1 from col:

for (; offset < _charOffsets[col - 1]; --col)
{
}

In other words, this is correct:

if (offset < _charOffsets[col])
{
    for (; offset < _charOffsets[col]; --col)
    {
    }
}
// ...

It immediately makes way more sense this way, since the if condition and for loop now match perfectly.

But looking at it again, I now realize that the entire code is way too complex. I've simplified it massively to just this:

til::CoordType CharToColumnMapper::GetLeadingColumnAt(ptrdiff_t targetOffset) noexcept
{
    targetOffset = clamp(targetOffset, 0, _lastCharOffset);

    auto col = _currentColumn;
    auto currentOffset = _charOffsets[col];

    while (targetOffset > (currentOffset & CharOffsetsMask))
    {
        currentOffset = _charOffsets[++col];
    }
    while (targetOffset < currentOffset)
    {
        currentOffset = _charOffsets[--col];
    }

    _currentColumn = col;
    return col;
}

That's it. Just 2 loops. Makes it way more robust and faster too. I'll submit a PR soon.

Thank you so much again for finding these bugs!

DHowett pushed a commit that referenced this pull request Feb 29, 2024
Aside from overall simplifying `CharToColumnMapper` this fixes 2 bugs:
* The backward search loop may have iterated 1 column too far,
  because it didn't stop at `*current <= *target`, but rather at
  `*(current - 1) <= *target`. This issue was only apparent when
  surrogate pairs were being used in a row.
* When the target offset is that of a trailing surrogate pair
  the forward search loop may have iterated 1 column too far.
  It's somewhat unlikely for this to happen since this code is
  only used through ICU, but you never know.

This is a continuation of PR #16775.
DHowett pushed a commit that referenced this pull request Feb 29, 2024
Aside from overall simplifying `CharToColumnMapper` this fixes 2 bugs:
* The backward search loop may have iterated 1 column too far,
  because it didn't stop at `*current <= *target`, but rather at
  `*(current - 1) <= *target`. This issue was only apparent when
  surrogate pairs were being used in a row.
* When the target offset is that of a trailing surrogate pair
  the forward search loop may have iterated 1 column too far.
  It's somewhat unlikely for this to happen since this code is
  only used through ICU, but you never know.

This is a continuation of PR #16775.

(cherry picked from commit 043d5cd)
Service-Card-Id: 91955569
Service-Version: 1.20
DHowett pushed a commit that referenced this pull request Feb 29, 2024
Aside from overall simplifying `CharToColumnMapper` this fixes 2 bugs:
* The backward search loop may have iterated 1 column too far,
  because it didn't stop at `*current <= *target`, but rather at
  `*(current - 1) <= *target`. This issue was only apparent when
  surrogate pairs were being used in a row.
* When the target offset is that of a trailing surrogate pair
  the forward search loop may have iterated 1 column too far.
  It's somewhat unlikely for this to happen since this code is
  only used through ICU, but you never know.

This is a continuation of PR #16775.

(cherry picked from commit 043d5cd)
Service-Card-Id: 91955617
Service-Version: 1.19
DHowett pushed a commit that referenced this pull request Feb 29, 2024
Aside from overall simplifying `CharToColumnMapper` this fixes 2 bugs:
* The backward search loop may have iterated 1 column too far,
  because it didn't stop at `*current <= *target`, but rather at
  `*(current - 1) <= *target`. This issue was only apparent when
  surrogate pairs were being used in a row.
* When the target offset is that of a trailing surrogate pair
  the forward search loop may have iterated 1 column too far.
  It's somewhat unlikely for this to happen since this code is
  only used through ICU, but you never know.

This is a continuation of PR #16775.

(cherry picked from commit 043d5cd)
Service-Card-Id: 91955617
Service-Version: 1.19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Output Related to output processing (inserting text into buffer, retrieving buffer text, etc.) AutoMerge Marked for automatic merge by the bot when requirements are met Issue-Bug It either shouldn't be doing this or needs an investigation. Priority-2 A description (P2) Product-Terminal The new Windows Terminal.
Projects
Status: Rejected
Development

Successfully merging this pull request may close these issues.

3 participants