Combined pdf? #2

folofjc · 2020-11-20T17:29:56Z

Thanks for this!

I notice that when it runs, it simply makes one pdf of each annotated page. Does it provide a total pdf with the annotations?

EDIT: I just saw that it is commented out in the source. Is the problem the page size and getting the ToC to work? Is it not possible to simply "apply" the annotations on top of the original pdf page?

lucasrla · 2020-11-23T12:43:40Z

Hey @folofjc,

I know there is at least one convenient alternative for exporting entire PDFs with annotations: using reMarkable's official desktop app. It has served me well on the Mac (there is a Windows version as well): https://support.remarkable.com/hc/en-us/articles/360002665378-Desktop-app

As you already noticed (by reading the comments), I ran into issues while trying to implement that feature with PyMuPDF. The deal breaker for me at the time was the difference in page size (but ToCs did not work either).

The ToC issue likely requires help from PyMuPDF upstream.

On the other hand, it should be possible to fix the page size within remarks. It is simply a matter of time to investigate the resizing/cropping process more carefully. If you are willing to help, pull requests are very welcome!

folofjc · 2020-11-23T13:36:14Z

Hi @lucasrla,

Thanks for the info. I have been using reMarkable's app, but my issue is that it "flattens" the pdf so that annotations are no longer stored as annotations. So when I open it in Adobe Acrobat, etc., they do not appear as annotations. Using remarks, they do show up as annotations (I tested on the individual page pdfs).

I read a lot of academic journal papers and mark them up. Zotero has the ability to parse an annotated pdf and pull out all your annotations so that you can quickly look at them. But I cannot do this with reMarkable-exported pdfs, which is why I am still using my Android tablet to read these papers, since there I can make annotations as true annotations in a pdf reader.

I agree that I would like the ToC to still work (but that is personally less of a priority for me since I still keep the original pdf).

I do not know PyMuPDF at all so I am not sure how much help I could be, but perhaps I will look into it since it is the only thing lacking!

lucasrla · 2020-11-29T18:49:15Z

Hey @folofjc,

I have just pushed a commit that adds a "combined_pdf" feature. Could you please pull to origin master/HEAD and test it out?

Also, I am now mentioning your use of remarks together with Zotero in the README file. I hope you don't mind.

Thanks

folofjc · 2020-11-30T12:24:31Z

Hey @lucasrla,

I tried it out on a couple and it looks good, thanks! A few issues:

1. I think that the filename has an error; it puts a space between the original name and the "_remarks". I haven't looked through the code enough to try to find it.
2. The combined pdf is at the top level. Would it make more sense to put it in the original directory structure with the individual pages?
3. I get a lot of errors like "Found highlighted text but couldn't create markdown from page #7" from remarks.py, as well as from mupdf: "mupdf: expected object number" and "mupdf: kid not found in parent's kids array". Sometimes just one of them, sometimes the other. Is this a problem with my original pdf? The ones about highlighted text appear only when I want markdown output and go away if I only care about pdf, so it looks like getting the highlights is okay but forming markdown is not. However, even in pdf mode, I still get the mupdf errors.
4. Links are gone on the annotated pdf. I think this is from making the pdf page again from scratch. However, on non-annotated pages the links within the pdf still work. I guess it is not possible to apply the annotations on top of the original pdf because of the page size difference? This isn't a huge problem, since I still have the original pdf. But for workflows where the original pdf is overwritten, this would be problematic.

lucasrla · 2020-11-30T16:41:59Z

Glad to hear that it looks good!

Answering your points:

1. The space between the original file name and " _remarks" was intentional. You can trim it in your local copy at this line: https://github.com/lucasrla/remarks/blob/master/remarks/remarks.py#L166
2. The combined pdf at top level was intentional as well. You can tweak that same line (L166) to save the file anywhere else.
3. I have been using remarks for a few months now and have never experienced any mupdf error. Your issues seem to be due to either a malformed PDF or a bug in PyMuPDF/MuPDF. Try googling them, searching their repo, etc. Regarding the text extraction for Markdown, many things could go wrong there... If you don't have OCRmyPDF yet, I recommend installing it. But please also keep in mind that it does not address all the potential edge cases that can happen with text/OCR in PDFs.
4. Yes, the links are gone because I am recreating each annotated PDF page with adjusted dimensions via Page.showPDFpage(). There is the following note in its documentation:

In contrast to method Document.insertPDF(), this method does not copy annotations or links, so they are not shown. But all its other resources (text, images, fonts, etc.) will be imported into the current PDF.

I haven't tested it extensively, but it seems from the documentation that Document.insertPDF() does not resize pages.

If Document.insertPDF() does not do resizing, then an alternative would be to manually recreate the links (similarly to what I am doing with annotations). See, for instance, Notes on Supporting Links. If you have the appetite, contributions are welcome!

Given that the combined PDF issue is now solved, I will go ahead and close this issue for now.

folofjc · 2020-11-30T16:46:37Z

Okay, thanks. I wonder whether, if the page size is the same, you could just use Document.insertPDF(), and only recreate the page if the dimensions are different. Like I said, it's not a huge deal for me because Zotero keeps both the original and the annotated one. Thanks again for looking into this, it really makes the rM actually useful for me!
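To make the idea concrete, here is a rough sketch of that decision in plain Python. It does not touch PyMuPDF at all; the names `same_size` and `choose_strategy`, the `"copy"`/`"recreate"` labels, and the 0.5 pt tolerance are all made up for illustration and are not part of remarks.

```python
from typing import Tuple

Size = Tuple[float, float]  # (width, height) in PDF points


def same_size(original: Size, annotated: Size, tol: float = 0.5) -> bool:
    """True if both dimensions match within `tol` points (hypothetical tolerance)."""
    return (
        abs(original[0] - annotated[0]) <= tol
        and abs(original[1] - annotated[1]) <= tol
    )


def choose_strategy(original: Size, annotated: Size) -> str:
    """Pick how to build the output page.

    "copy"     -> reuse the original page as-is (the path that could keep links)
    "recreate" -> rebuild the page with adjusted dimensions (links are lost)
    """
    return "copy" if same_size(original, annotated) else "recreate"


if __name__ == "__main__":
    a4 = (595.0, 842.0)
    print(choose_strategy(a4, (595.0, 842.0)))   # unchanged page size
    print(choose_strategy(a4, (631.5, 842.0)))   # page widened to fit the device
```

The "copy" branch is where something like Document.insertPDF() would come in, assuming it really does leave the page size alone.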

lucasrla · 2020-11-30T19:01:48Z

Nice. Welcome aboard!

I see your point about doing the resize only if necessary. If a PDF has only "well-behaved" highlights on it, then that seems like a viable path for using Document.insertPDF() and keeping the links. If there are scribbles on the margins, unfortunately resizing is almost surely necessary.

The need for resizing when there are annotations on the margins is due to the difference between the aspect ratio of the reMarkable (0.75) and those of common paper sizes (e.g. ~0.70 for A4). That is, the device itself already resizes PDFs while displaying most documents. If we then annotate on the margins of a page, we make the resizing "definitive" for that page.
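To put rough numbers on this, here is a minimal sketch in plain Python. It assumes a 1404 x 1872 pixel screen for the reMarkable, A4 at 595 x 842 points, US Letter at 612 x 792 points, and that the whole page is scaled uniformly to fit on the screen. The function names are illustrative only, and how the device actually positions the page is not modelled.

```python
# Sizes are (width, height). Screen in pixels, pages in PDF points (1 pt = 1/72 in).
REMARKABLE_SCREEN = (1404, 1872)  # assumed display resolution
A4 = (595, 842)
US_LETTER = (612, 792)


def aspect_ratio(size):
    """Width divided by height."""
    width, height = size
    return width / height


def fit_scale(page, screen):
    """Largest uniform scale at which the whole page still fits on the screen."""
    return min(screen[0] / page[0], screen[1] / page[1])


def leftover(page, screen):
    """Screen space (width, height) left uncovered once the page is fitted."""
    scale = fit_scale(page, screen)
    return (screen[0] - page[0] * scale, screen[1] - page[1] * scale)


if __name__ == "__main__":
    for name, page in (("A4", A4), ("US Letter", US_LETTER)):
        extra_w, extra_h = leftover(page, REMARKABLE_SCREEN)
        print(f"{name}: ratio={aspect_ratio(page):.3f}, "
              f"extra width={extra_w:.1f}px, extra height={extra_h:.1f}px")
```

Under these assumptions an A4 page leaves a strip of unused screen width, while a US Letter page leaves a strip of unused height. Scribbles that land in that strip fall outside the original page, which is why the page has to be resized to keep them.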

If/when I find some time in the upcoming days, I might take a shot at implementing this. I will let you know.

lucasrla · 2020-12-09T00:24:47Z

Hey @folofjc, I have just pushed through a new commit that should preserve the links in the original PDF.

Can you please pull to the most recent commit (99ec38) and report whether it is working well for you in the new discussion thread that I have started just for that?

Thanks

folofjc · 2021-01-20T11:20:37Z

Based upon that commit and the discussion thread, I think this issue can be closed. Thanks again!

lucasrla closed this as completed Nov 30, 2020

lucasrla added the enhancement label Nov 30, 2020
