Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve markdown parsing #112

Merged
merged 2 commits into from
Jan 16, 2018
Merged

Improve markdown parsing #112

merged 2 commits into from
Jan 16, 2018

Conversation

SleepWalker
Copy link
Collaborator

This PR fixes #38

But before accepting it, probably we should a little bit discuss it.

The parser is partially based on this spec. It takes 100+ lines of code and adds 0.33kb to dist size.

The reason why it takes so much is because of nesting, escaping support and also because of some edge cases, which can not be achieved using only js regexp. We can a little bit reduce parser size if we drop nesting support (which is actually currently broken in browser due to bad styles order e.g.: console.log('%c%cfoo%c%c', 'font-weight: bold;', '', 'font-style:italic;', '') for logger.log('*_foo_*')). What do you think? (@caiogondim , @bennyn )

There is one more feature, that looks interesting to me, but I've skipped it to keep code a little bit simpler. But if you think, that we need it, I can implement it too. Please see this test case:

xit('should not format snake_case words', () => { // do we really need this?
  expect(markdown.parse('foo_bar_')).toBe('foo_bar_')
  expect(markdown.parse('_foo_bar')).toBe('_foo_bar')

  expect(markdown.parse('_*te_st*')).toBe('_<b>te_st</b>')
  expect(markdown.parse('_*te_st*_')).toBe('<i><b>te_st</b></i>')
})

BTW. In my console on linux chalk.italic looks almost the same as the normal text.

@codecov-io
Copy link

codecov-io commented Jan 7, 2018

Codecov Report

Merging #112 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@          Coverage Diff          @@
##           master   #112   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          14     12    -2     
  Lines         228    255   +27     
  Branches       41     49    +8     
=====================================
+ Hits          228    255   +27
Impacted Files Coverage Δ
src/markdown/node.js 100% <100%> (ø) ⬆️
src/markdown/Markdown.js 100% <100%> (ø)
src/markdown/browser.js 100% <100%> (ø) ⬆️

@caiogondim
Copy link
Owner

That's just great.
Rewriting the Markdown parser is something that was on my backlog since a long time.

Regarding the skipped test case, it makes sense for me, especially for printing code:
markdown.parse('`SOME_REDUX_ACTION`') // want it to be expected as code only, not as italic inside code

Or maybe the user should explicitly escape characters it doesn't want to be interpreted as markdown tokens. And right escaping don't work. I like the latter better. Explicit > implicit.

What are your thoughts?

And great PR again 🎉

@SleepWalker
Copy link
Collaborator Author

SleepWalker commented Jan 10, 2018 via email

@caiogondim
Copy link
Owner

caiogondim commented Jan 10, 2018

Current parser implementation already supports escaping and it completely ignores markdown (and escape chars too) inside code spans.

I created an example to better illustrate this https://runkit.com/caiogondim/5a3aca00b147010012535bfb

@SleepWalker
Copy link
Collaborator Author

The parser from this PR will produce:

`SOME\_REDUX\_ACTION` => <code>SOME\_REDUX\_ACTION</code>
`OTHER_REDUX_ACTION` => <code>OTHER_REDUX_ACTION</code>

but

*SOME\_REDUX\_ACTION* => <b>SOME_REDUX_ACTION</b>
*OTHER_REDUX_ACTION* => <b>OTHER<i>REDUX</i>ACTION</b>
(if I'll make the skipped test to pass it should produce: <b>OTHER_REDUX_ACTION</b>)

_SOME\_REDUX\_ACTION_ => <i>SOME_REDUX_ACTION</i>
_OTHER_REDUX_ACTION_ => <i>OTHER</i>REDUX<i>ACTION</i>
(if I'll make the skipped test to pass it should produce: <i>OTHER_REDUX_ACTION</i>)

So the code spans are work as expected, but the _ may be improved (but I'm not sure if this improvement critical for purposes of logging lib. You should simply properly use md and wrap all the code into the code span instead of italics etc.)

@SleepWalker SleepWalker reopened this Jan 12, 2018
@SleepWalker
Copy link
Collaborator Author

Sorry, that was a miss click :D

@caiogondim caiogondim merged commit 008b221 into master Jan 16, 2018
@caiogondim caiogondim deleted the improve-markdown-parsing branch January 16, 2018 22:48
@caiogondim
Copy link
Owner

@SleepWalker Again, that was a beautiful PR 👏 👏 👏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Escaping special characters
3 participants