Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Email autolink shouldn't be started by HTML tags #566

Merged
merged 4 commits into from
Mar 10, 2018

Conversation

aidantwoods
Copy link
Collaborator

Fixes #565

@aidantwoods
Copy link
Collaborator Author

The regex looks a bit huge, but it isn't very complex (the majority is defining character classes), so I don't think it'll be any (much?) worse than the existing regex complexity (open to performance suggestions though, cc: @PhrozenByte)

I think filter_var's email filter doesn't match the definition that CommonMark references (CommonMark uses the HTML5 definition), but I'm not sure of the exact discrepancies.

Copy link
Contributor

@PhrozenByte PhrozenByte left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The regex was taken from CommonMark, right? It is stricter than I'd write it myself, but since we're trying to be compatible with CommonMark we should use it anyway.

Looks great! 👍

Parsedown.php Outdated
@@ -1142,8 +1142,14 @@ protected function inlineCode($Excerpt)

protected function inlineEmailTag($Excerpt)
{
if (strpos($Excerpt['text'], '>') !== false and preg_match('/^<((mailto:)?\S+?@\S+?)>/i', $Excerpt['text'], $matches))
{
$commonMarkEmail = '[a-zA-Z0-9.!#$%&\'*+\/=?^_`{|}~-]++@[a-zA-Z0-9]'
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just making the regex a little easier to read:

$hostnameLabel = '[a-zA-Z0-9](?:[a-zA-Z0-9-]{0,61}[a-zA-Z0-9])?';
$commonMarkEmail = '[a-zA-Z0-9.!#$%&\'*+\/=?^_`{|}~-]++@' . $hostnameLabel . '(?:\.' . $hostnameLabel . ')*';

@aidantwoods
Copy link
Collaborator Author

aidantwoods commented Mar 9, 2018

The regex was taken from CommonMark, right?

Yup, (though they ripped it straight out the HTML5 spec ;p). The only changes I made were escaping a backslash forwardslash and a made the "username" part of the email possessive (since that character class is mutually exclusive with @ I don't think we need backtracking?).

@PhrozenByte
Copy link
Contributor

Alright. Yeah, no backtracking necessary there.

Thanks @PhrozenByte for the suggestion :)
@aidantwoods aidantwoods merged commit aac00ac into erusev:master Mar 10, 2018
@aidantwoods aidantwoods deleted the fix/email-autolink branch March 10, 2018 00:07
@aidantwoods aidantwoods added this to the 1.8.0 milestone Mar 28, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants