Combined pdf? #2

Closed
folofjc opened this issue Nov 20, 2020 · 9 comments
Labels
enhancement New feature or request

Comments

@folofjc

folofjc commented Nov 20, 2020

Thanks for this!

I notice that when it runs, it simply makes one pdf of each annotated page. Does it provide a total pdf with the annotations?

EDIT: I just saw that it is commented out in the source. Is the problem the page size and getting the ToC to work? Is it not possible to simply "apply" the annotations on top of the original pdf page?

@lucasrla
Owner

Hey @folofjc,

I know there is at least one convenient alternative for exporting entire PDFs with annotations: using reMarkable's official desktop app. It has served me well on the Mac (there is a Windows version as well): https://support.remarkable.com/hc/en-us/articles/360002665378-Desktop-app

As you already noticed (by reading the comments), I ran into issues while trying to implement that feature with PyMuPDF. The deal breaker for me at the time was the difference in page sizes (though ToCs did not work either).

The ToC issue likely requires help from PyMuPDF upstream.

On the other hand, it should be possible to fix the page size within remarks. It is simply a matter of time to investigate the resizing/cropping process more carefully. If you are willing to help, pull requests are very welcome!

@folofjc
Author

folofjc commented Nov 23, 2020

Hi @lucasrla,

Thanks for the info. I have been using reMarkable's app, but my issue is that it "flattens" the pdf so that the annotations are no longer stored as annotations. When I open the exported file in Adobe Acrobat, etc., they do not appear as annotations. Using remarks, they do show up as annotations (I tested on the individual page pdfs).

I read a lot of academic journal papers and mark them up. Zotero has the ability to parse an annotated pdf and pull out all your annotations so that you can quickly look at them. But I cannot do this with remarkable exported pdfs. Which is why I am still using my android tablet to read these papers, since I can make annotations as true annotations in a pdf reader.

I agree that I would like the ToC to still work (but that is personally less of a priority for me since I still keep the original pdf).

I do not know PyMuPDF at all so I am not sure how much help I could be, but perhaps I will look into it since it is the only thing lacking!

@lucasrla
Owner

Hey @folofjc,

I have just pushed a commit that adds a "combined_pdf" feature. Could you please pull to origin master/HEAD and test it out?

Also, I now mention your use of remarks together with Zotero in the README file. I hope you don't mind.

Thanks

@folofjc
Author

folofjc commented Nov 30, 2020

Hey @lucasrla,

I tried it out on a couple and it looks good, thanks! A few issues:

  • I think that the filename has an error; it puts a space between the original name and the "_remarks". I haven't looked through the code enough to try to find it.
  • The combined pdf is at the top level. Would it make more sense to put it in the original directory structure with the individual pages?
  • I get a lot of errors like "Found highlighted text but couldn't create markdown from page #7" from remarks.py, as well as from mupdf: "mupdf: expected object number" and "mupdf: kid not found in parent's kids array". Sometimes I get just one of them, sometimes the other. Is this a problem with my original pdf? The highlighted-text errors appear only when I request markdown output and go away if I only ask for pdf, so it looks like extracting the highlights works but generating the markdown does not. However, even in pdf mode, I still get the mupdf errors.
  • Links are gone on the annotated pdf. I think this is from making the pdf page again from scratch. However, on non-annotated pages the links within the pdf still work. I guess it is not possible to apply the annotations on top of the original pdf because of the page size difference? This isn't a huge problem, since I still have the original pdf. But for workflows where the original pdf is overwritten, this would be problematic.

@lucasrla
Owner

lucasrla commented Nov 30, 2020

Glad to hear that it looks good!

Answering your points:

  1. The space between the original file name and " _remarks" was intentional. You can trim it in your local copy at this line: https:/lucasrla/remarks/blob/master/remarks/remarks.py#L166

  2. The combined pdf at top level was intentional as well. You can tweak that same line (L166) to save the file anywhere else.

  3. I have been using remarks for a few months now and have never experienced any mupdf error. Your issues seem either due to a malformed PDF or a bug in PyMuPDF/MuPDF. Try googling about them, searching their repo, etc. Regarding the text extraction for Markdown, many things could go wrong there... If you don't have OCRmyPDF yet, I recommend installing it. But please also keep in mind that it does not address all the potential edge cases that can happen with text/OCR in PDFs.

  4. Yes, the links are gone because I am recreating each annotated PDF page with adjusted dimensions via Page.showPDFpage(). There is the following note in its documentation:

In contrast to method Document.insertPDF(), this method does not copy annotations or links, so they are not shown. But all its other resources (text, images, fonts, etc.) will be imported into the current PDF.

I haven't tested it extensively, but it seems from the documentation that Document.insertPDF() does not resize pages.

If Document.insertPDF() does not do resizing, then an alternative would be to manually recreate the links (similarly to what I am doing with annotations). See, for instance, Notes on Supporting Links. If you have the appetite, contributions are welcome!


Given that the combined PDF issue is now solved, I will go ahead and close this issue for now.

@folofjc
Author

folofjc commented Nov 30, 2020

Okay, thanks. I wonder whether, when the page size is the same, remarks could just use Document.insertPDF() and only recreate the page if the dimensions are different. Like I said, it is not a huge deal for me because Zotero keeps both the original and the annotated one. Thanks again for looking into this, it really makes the rM actually useful for me!

@lucasrla
Owner

lucasrla commented Nov 30, 2020

Nice. Welcome aboard!

I see your point about doing the resize only if necessary. If a PDF has only "well-behaved" highlights on it, then that seems like a viable path for using Document.insertPDF() and keeping the links. If there are scribbles on the margins, unfortunately resizing is almost surely necessary.
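The "resize only if necessary" idea discussed here can be sketched as a simple containment check. This is a hypothetical illustration, not code from remarks: `Rect`, `needs_resize`, the tolerance value, and the assumption that annotation boxes have already been mapped into the original page's coordinate space are all made up for the example.

```python
from typing import Iterable, NamedTuple


class Rect(NamedTuple):
    """Axis-aligned rectangle in PDF points (hypothetical helper type)."""
    x0: float
    y0: float
    x1: float
    y1: float


def needs_resize(page: Rect, annotations: Iterable[Rect], tol: float = 0.5) -> bool:
    """Return True if any annotation extends beyond the original page area.

    `tol` (an assumed value) absorbs small rounding errors at the page edge.
    """
    for a in annotations:
        inside = (
            a.x0 >= page.x0 - tol
            and a.y0 >= page.y0 - tol
            and a.x1 <= page.x1 + tol
            and a.y1 <= page.y1 + tol
        )
        if not inside:
            return True
    return False


# Illustrative data: an A4 page, a highlight within the text block,
# and a scribble that lands to the right of the page edge.
a4 = Rect(0, 0, 595, 842)
highlight = Rect(72, 100, 300, 114)
margin_scribble = Rect(600, 200, 640, 260)

print(needs_resize(a4, [highlight]))                   # False
print(needs_resize(a4, [highlight, margin_scribble]))  # True
```

When the check returns False, the original page could in principle be copied unchanged (for example with Document.insertPDF(), which according to the documentation quoted earlier carries links along) and the annotations added on top. When it returns True, the page would still have to be recreated with adjusted dimensions.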

The need for resizing when there are annotations on the margins is due to differences between the aspect ratio of the reMarkable (0.75) and the ones of common paper sizes (e.g. ~0.70 for A4). That is, the device itself already resizes PDFs while displaying most documents. If we then annotate on the margins of a page, we make the resizing "definitive" for that page.
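To make the aspect-ratio arithmetic concrete, here is a small sketch. It assumes a reMarkable screen of 1404 × 1872 pixels (which gives the 0.75 ratio mentioned above) and an A4 page of 595 × 842 points. The function name `fit_to_screen` is illustrative and is not part of remarks.

```python
def fit_to_screen(page_w, page_h, screen_w=1404, screen_h=1872):
    """Scale a page to fit the screen while preserving its aspect ratio.

    Returns (scale, blank_w, blank_h), where blank_w and blank_h are the
    screen pixels left uncovered horizontally and vertically.
    """
    scale = min(screen_w / page_w, screen_h / page_h)
    blank_w = screen_w - page_w * scale
    blank_h = screen_h - page_h * scale
    return scale, blank_w, blank_h


a4_w, a4_h = 595, 842  # A4 in PDF points (assumed page size)

print(round(a4_w / a4_h, 3))  # 0.707
print(1404 / 1872)            # 0.75

scale, blank_w, blank_h = fit_to_screen(a4_w, a4_h)
# A4 is proportionally narrower than the screen, so the fit is limited by
# height and roughly 81 pixels of screen width are not covered by the page.
print(round(blank_w))         # 81
```

Under these assumptions, anything written in that leftover strip lies outside the original A4 page, which is why margin scribbles force the exported page to take on the device's proportions.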

If/when I find some time in the upcoming days, I might take a shot at implementing this. I will let you know.

lucasrla added the enhancement label Nov 30, 2020
@lucasrla
Owner

lucasrla commented Dec 9, 2020

Hey @folofjc, I have just pushed through a new commit that should preserve the links in the original PDF.

Can you please pull to the most recent commit (99ec38) and report whether it is working well for you in the new discussion thread that I have started just for that?

Thanks

@folofjc
Author

folofjc commented Jan 20, 2021

Based upon that commit and the discussion thread, I think this issue can be closed. Thanks again!
