raw html (div) in markdown does not render in PDF #513
Comments
PDF support is also important for me. Currently, Here's the PDF version: https://nbsphinx.readthedocs.io/_/downloads/en/0.8.0/pdf/#subsection.3.8 So if you use exactly The border and colors of the frames in the PDF can be specified like this: Lines 135 to 139 in 98005a9
Is that sufficient for your needs? If you want to change the default colors or border widths, please make a PR.
Matthias, Thank you for your reply. I generally use three levels (green, yellow, red: a cliché, I know, and also not very colorblind-friendly). For now, I could probably just use some sort of neutral color and embed an appropriate image (would that work?). There would still be some desire for formatting, though (I'm not a purist...a table?). Is there a fundamental issue with supporting embedded HTML? I would have thought that was delegated "out of the box" to whatever translator was used. I just don't know enough about the inner workings to weigh in on a solution, but I'd be willing to help if I could. Cheers,
I did some digging and it's definitely them, not you.

Sphinx Sphinx uses Pandoc

1. markdown to pdf directly. This ignores raw HTML formatting in general and the content of the div becomes escaped, pre-formatted text.
2. markdown to pdf using markdown_strict option. According to the documentation, the default for markdown is to ignore raw HTML tags in Markdown (contra the standard). However, you can specify -f markdown_strict to process raw HTML tags.
   a. without lines after the
3. markdown to html using markdown_strict option (no weird line breaks). As you might imagine, this is perfect, so I thought: what if we go to LaTeX through HTML instead of directly there?
4. markdown to html to pdf using markdown_strict option (no weird line breaks). This works really well...about 80% of the way there. Raw HTML formatting outside divs is applied. The content of the div is formatted correctly. It even outputs the enclosed table (with somewhat odd formatting). However, the background color is removed.

If this pipeline were possible (I'm not sure), the question would be: is there a way to define styles for divs (in LaTeX? CSS?) and indicate where they should be applied, to get the remaining 20%?
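For anyone who wants to repeat the comparison, here is a rough Python sketch of the four pipelines listed above. It only uses pandoc options that appear in this thread or that are standard (`-f markdown_strict`, `-o`); the file name `example.md` and the function names are made up for illustration, and the exact commands the commenter ran are not shown in the thread, so treat these as an approximation.

```python
from __future__ import annotations

import shutil
import subprocess
from pathlib import Path


def pandoc_commands(source: str = "example.md") -> dict[str, list[list[str]]]:
    """Build the pandoc command lines for each pipeline without running them."""
    stem = Path(source).stem
    html, pdf = f"{stem}.html", f"{stem}.pdf"
    return {
        # 1. Markdown straight to PDF (pandoc goes through LaTeX by default).
        "direct": [["pandoc", source, "-o", pdf]],
        # 2. Same target, but the input is read as markdown_strict.
        "strict": [["pandoc", "-f", "markdown_strict", source, "-o", pdf]],
        # 3. markdown_strict to HTML only.
        "strict_html": [["pandoc", "-f", "markdown_strict", source, "-o", html]],
        # 4. markdown_strict to HTML, then that HTML file to PDF.
        "via_html": [
            ["pandoc", "-f", "markdown_strict", source, "-o", html],
            ["pandoc", html, "-o", pdf],
        ],
    }


def run_pipeline(name: str, source: str = "example.md") -> None:
    """Run one named pipeline; requires pandoc (and a LaTeX engine for PDF)."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    for command in pandoc_commands(source)[name]:
        subprocess.run(command, check=True)
```

Calling `run_pipeline("via_html")` would run the two steps of pipeline 4 in order. Note that the second step still goes through LaTeX, which may be why the div's background color does not survive.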
Have you been talking about It has nothing to do with If you need help with I was talking about using Markdown cells in Jupyter notebooks. There the boxes work as I've described (or rather linked to the docs) above. Are you not using Jupyter notebooks? If you want to use Markdown files but still get the boxes I'm talking about, you can use Jupytext with one of the supported Markdown based formats (see https://nbsphinx.readthedocs.io/en/0.8.0/custom-formats.html#Example:-Jupytext).
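As a rough illustration of the Jupytext route, a `conf.py` fragment along the lines of the linked documentation could look like this. The `.md` suffix and the `fmt` value are assumptions chosen for this example; check the linked page and the Jupytext documentation for the formats that fit your files.

```python
# conf.py (fragment): a sketch, not a complete Sphinx configuration.

extensions = [
    "nbsphinx",
]

# Ask nbsphinx to read matching source files through Jupytext.
# The ".md" suffix and the "md" format name are assumptions for this example.
nbsphinx_custom_formats = {
    ".md": ["jupytext.reads", {"fmt": "md"}],
}
```

This needs the `jupytext` package installed in the build environment. If another extension is already registered for `.md` files, the two may conflict, so a more specific suffix might be the safer choice.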
I guess it's not fundamental, but implementing a full HTML parser is not a small endeavor.
Well, if you use HTML as target format, the HTML snippets can easily be passed through, but what is LaTeX supposed to do with them? For this to work with LaTeX output, the HTML actually has to be parsed.
Yes, that's what the CommonMark standard demands.
In this case you should open an issue in the appropriate issue tracker. See also jupyter/nbconvert#1125 and jupyter/notebook#1292 (comment). It would be really great if we could get proper support for those |
As I think I've mentioned in comments above, I don't know how nbsphinx works or what it uses directly or indirectly. But I'm willing to learn. Because I had seen issues posted in the past that referenced both Sphinx and Pandoc, I looked to see what the possible underlying tools might do in a similar situation. This is what I have documented above. I thought it might be a helpful place to start a discussion about workarounds (see below). It's not just The Markdown standard is that all raw HTML elements should be processed. This includes
I don't know enough about either. I'm just reporting what I see.
which is not very aesthetically pleasing...if I remove the extra lines (so that the insides are not treated as Markdown: which is what I expect. Ultimately, all I'm saying is that my expectation is that embedded HTML in a Notebook's Markdown Cell will be faithfully rendered in the PDF, within the limits of translation, and that I should not necessarily have to muck up one rendering to make another successful. It doesn't and I don't know why and I don't know if it can be fixed or if there's a workaround. And, again, this isn't just Thank you for your efforts. |
OK, good to know. So are the special
Well there isn't really a Markdown standard (yet). And CommonMark, the closest thing to a standard, does explicitly say that it's not a full HTML parser. It can just detect a few HTML-like structures and pass them along to whatever is supposed to display the result. That works well if HTML is the end result, but it decidedly doesn't work for some other formats like LaTeX. Any HTML support for LaTeX output would have to be individually implemented, which I've done for very few cases, as mentioned above. If you have suggestions for further special cases that should be supported, please let me know!
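To make the "individually implemented" point concrete, here is a toy Python sketch of what special-casing one HTML construct for LaTeX output involves. It is not how nbsphinx does it. The LaTeX environment names (`infobox`, `warningbox`) and the class mapping are hypothetical, and the parser is the standard library's `html.parser`.

```python
from html.parser import HTMLParser

# Hypothetical mapping from CSS classes to LaTeX environment names.
CLASS_TO_ENV = {"alert-info": "infobox", "alert-warning": "warningbox"}

# A few characters that need escaping in LaTeX (deliberately incomplete).
_SPECIALS = {"&": r"\&", "%": r"\%", "$": r"\$", "#": r"\#", "_": r"\_"}


def escape_latex(text: str) -> str:
    return "".join(_SPECIALS.get(ch, ch) for ch in text)


class DivToLatex(HTMLParser):
    """Translate known <div class=...> wrappers; drop all other markup."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self.stack: list[str | None] = []  # one entry per open <div>

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return  # any other tag is silently ignored
        classes = (dict(attrs).get("class") or "").split()
        env = next((CLASS_TO_ENV[c] for c in classes if c in CLASS_TO_ENV), None)
        self.stack.append(env)
        if env:
            self.parts.append(f"\\begin{{{env}}}\n")

    def handle_endtag(self, tag):
        if tag == "div" and self.stack:
            env = self.stack.pop()
            if env:
                self.parts.append(f"\n\\end{{{env}}}")

    def handle_data(self, data):
        self.parts.append(escape_latex(data))


def html_to_latex(snippet: str) -> str:
    parser = DivToLatex()
    parser.feed(snippet)
    parser.close()
    return "".join(parser.parts)


print(html_to_latex('<div class="alert alert-info">50% done</div>'))
print(html_to_latex('<div style="background: red">plain</div>'))
```

The first call wraps the text in the (hypothetical) `infobox` environment; the second keeps only the text, because an inline `style` attribute is not one of the handled cases. Every further case (tables, inline styles, nested markup) would need its own handling, which is why this does not scale to arbitrary HTML.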
There are certainly tools that can do that.
But you can of course take the HTML output of Sphinx (including the stuff from
Please report that to the appropriate issue tracker!
As you say, there are limits in every toolchain. I've tried to work around the mentioned limitations by supporting the
Yes, that should be the goal, but sometimes this is not plausibly achievable with a given set of tools.
Yes, again, that's known behavior. If you want to have a PDF that's closer to the HTML appearance, you should use a tool that directly creates PDF from HTML pages, without the LaTeX middleman. You could try https://github.com/betatim/notebook-as-pdf or you could try the The result looks somewhat like a browser screenshot. This may or may not be what you want. If you find other alternatives, please let me know so I can add them to my collection of links: https://nbsphinx.readthedocs.io/en/0.8.0/links.html.
Thanks for your help. The alerts are not quite what I'm looking for but, knowing the limitations, I was able to make a table work well enough. Cheers!
First, I love nbsphinx. Unlike many (most?), however, my target is PDFs. I find my students prefer PDF files over Notebooks or HTML for a variety of reasons but mostly because they can read them on their devices when they don't have internet access and they do, sometimes, kill a tree printing them out (as do I).
Second, I'm having a bit of trouble with raw HTML working in the translation from HTML to PDF (and I'm not sure exactly which part isn't working). When I have a div with a colored background like so:
the result in the PDF is plain:
I'm not completely clear on the entire infrastructure so I'm not sure where the problem may be. Based on what I could find, Sphinx (CommonMark) should support this (although to what extent, I'm not sure). Pandoc does as well (but only with markdown_strict, not the defaults). Perhaps there's an easy fix/setting I'm missing?