Improve markdown parsing #112

SleepWalker · 2018-01-07T20:39:37Z

This PR fixes #38

But before accepting it, probably we should a little bit discuss it.

The parser is partially based on this spec. It takes 100+ lines of code and adds 0.33kb to dist size.

The reason why it takes so much is because of nesting, escaping support and also because of some edge cases, which can not be achieved using only js regexp. We can a little bit reduce parser size if we drop nesting support (which is actually currently broken in browser due to bad styles order e.g.: console.log('%c%cfoo%c%c', 'font-weight: bold;', '', 'font-style:italic;', '') for logger.log('*_foo_*')). What do you think? (@caiogondim , @bennyn )

There is one more feature, that looks interesting to me, but I've skipped it to keep code a little bit simpler. But if you think, that we need it, I can implement it too. Please see this test case:

xit('should not format snake_case words', () => { // do we really need this?
  expect(markdown.parse('foo_bar_')).toBe('foo_bar_')
  expect(markdown.parse('_foo_bar')).toBe('_foo_bar')

  expect(markdown.parse('_*te_st*')).toBe('_<b>te_st</b>')
  expect(markdown.parse('_*te_st*_')).toBe('<i><b>te_st</b></i>')
})

BTW. In my console on linux chalk.italic looks almost the same as the normal text.

codecov-io · 2018-01-07T20:42:07Z

Codecov Report

Merging #112 into master will not change coverage.
The diff coverage is 100%.

@@          Coverage Diff          @@
##           master   #112   +/-   ##
=====================================
  Coverage     100%   100%           
=====================================
  Files          14     12    -2     
  Lines         228    255   +27     
  Branches       41     49    +8     
=====================================
+ Hits          228    255   +27

Impacted Files	Coverage Δ
src/markdown/node.js	`100% <100%> (ø)`	⬆️
src/markdown/Markdown.js	`100% <100%> (ø)`
src/markdown/browser.js	`100% <100%> (ø)`	⬆️

caiogondim · 2018-01-09T18:56:46Z

That's just great.
Rewriting the Markdown parser is something that was on my backlog since a long time.

Regarding the skipped test case, it makes sense for me, especially for printing code:
markdown.parse('`SOME_REDUX_ACTION`') // want it to be expected as code only, not as italic inside code

Or maybe the user should explicitly escape characters it doesn't want to be interpreted as markdown tokens. And right escaping don't work. I like the latter better. Explicit > implicit.

What are your thoughts?

And great PR again 🎉

SleepWalker · 2018-01-10T07:14:36Z

Current parser implementation already supports escaping and it completely ignores markdown (and escape chars too) inside `code` spans. The skipped test is only about the case, when user uses italic and expects markdown to be ignored inside variable names in snaked case naming style. In order to support such case we need to implement additional rules only for italic (detect left and right flanking delimiters). I think, that this rule is not so important to spend the bytes on it's implementation, because I expect developers to use code spans for displaying code, instead of italic :)

caiogondim · 2018-01-10T15:17:20Z

Current parser implementation already supports escaping and it completely ignores markdown (and escape chars too) inside code spans.

I created an example to better illustrate this https://runkit.com/caiogondim/5a3aca00b147010012535bfb

SleepWalker · 2018-01-12T07:48:11Z

The parser from this PR will produce:

`SOME\_REDUX\_ACTION` => <code>SOME\_REDUX\_ACTION</code>
`OTHER_REDUX_ACTION` => <code>OTHER_REDUX_ACTION</code>

but

*SOME\_REDUX\_ACTION* => <b>SOME_REDUX_ACTION</b>
*OTHER_REDUX_ACTION* => <b>OTHER<i>REDUX</i>ACTION</b>
(if I'll make the skipped test to pass it should produce: <b>OTHER_REDUX_ACTION</b>)

_SOME\_REDUX\_ACTION_ => <i>SOME_REDUX_ACTION</i>
_OTHER_REDUX_ACTION_ => <i>OTHER</i>REDUX<i>ACTION</i>
(if I'll make the skipped test to pass it should produce: <i>OTHER_REDUX_ACTION</i>)

So the code spans are work as expected, but the _ may be improved (but I'm not sure if this improvement critical for purposes of logging lib. You should simply properly use md and wrap all the code into the code span instead of italics etc.)

SleepWalker · 2018-01-12T07:48:48Z

Sorry, that was a miss click :D

caiogondim · 2018-01-16T22:49:17Z

@SleepWalker Again, that was a beautiful PR 👏 👏 👏

SleepWalker added 2 commits January 7, 2018 20:58

Implement md parser with better supporting for nesting and escaping

51d34e2

Added tests for md styles depending from env

9323d31

SleepWalker requested a review from caiogondim January 7, 2018 20:39

SleepWalker closed this Jan 12, 2018

SleepWalker reopened this Jan 12, 2018

caiogondim merged commit 008b221 into master Jan 16, 2018

caiogondim deleted the improve-markdown-parsing branch January 16, 2018 22:48

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Improve markdown parsing #112

Improve markdown parsing #112

SleepWalker commented Jan 7, 2018

codecov-io commented Jan 7, 2018 •

edited

Loading

caiogondim commented Jan 9, 2018

SleepWalker commented Jan 10, 2018 via email

caiogondim commented Jan 10, 2018 •

edited

Loading

SleepWalker commented Jan 12, 2018

SleepWalker commented Jan 12, 2018

caiogondim commented Jan 16, 2018

Improve markdown parsing #112

Improve markdown parsing #112

Conversation

SleepWalker commented Jan 7, 2018

codecov-io commented Jan 7, 2018 • edited Loading

Codecov Report

caiogondim commented Jan 9, 2018

SleepWalker commented Jan 10, 2018 via email

caiogondim commented Jan 10, 2018 • edited Loading

SleepWalker commented Jan 12, 2018

SleepWalker commented Jan 12, 2018

caiogondim commented Jan 16, 2018

codecov-io commented Jan 7, 2018 •

edited

Loading

caiogondim commented Jan 10, 2018 •

edited

Loading