Use JPype instead of subprocess #352
Thanks for the suggestion. Initially I thought of using py4j, but at that time tabula-java development was active, so I put the idea on hold for maintainability reasons. I'd think about tweaking lines 58 to 102 in b24e3bd.
Alternatively, we may be able to do something similar to tabula-rs (tabulapdf/tabula-java#444). Anyway, I'm looking forward to a contribution for this change.
Thanks for the considerations!
I think py4j operates the JVM through a remote (socket) connection, which incurs a data transfer penalty, whereas JPype uses a native interface with a shared-memory approach.
If you call into the CLI rather than the actual API, you get the same interface as with subprocess (i.e. no maintainability impact), just with better performance.
Disclaimer: Unfortunately, I'm fairly busy with other projects (pypdfium2 & co.), so I don't intend to work on this myself.
I tried JPype in #355. However, when I run pytest with multiple test files, it always fails. This is a huge blocker for introducing JPype, and I'm about to give up.
I'm still a bit tired, but all the test suite failures seem to be caused by
I guess you're looking at the old PR. Can you check the new one (#356)?
I don't currently have an environment to test this locally; I can only try to help with concrete questions.
Ah, sorry for the confusion. I solved the issue by running the test files in separate processes (see #356 (comment)). It's okay to merge into master for now.
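The thread doesn't say exactly why the multi-file runs failed. One plausible reason, assumed here, is that JPype can start the JVM only once per process and cannot restart it after a shutdown, so state from one test file can affect the next. A minimal sketch of per-file process isolation follows; `build_commands`, `run_isolated`, and the `tests/test_*.py` layout are hypothetical and not taken from #356.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(test_files):
    """Return one pytest command per test file, each for a fresh interpreter."""
    return [[sys.executable, "-m", "pytest", str(path)] for path in test_files]


def run_isolated(test_dir="tests"):
    """Run every test_*.py under test_dir in its own process.

    Returns the highest pytest exit code seen (0 if every file passed,
    or if no files were found).
    """
    files = sorted(Path(test_dir).glob("test_*.py"))
    worst = 0
    for cmd in build_commands(files):
        # A new process means a new JPype JVM, so no JVM state leaks between files.
        result = subprocess.run(cmd)
        worst = max(worst, result.returncode)
    return worst
```

A small CI script could call `sys.exit(run_isolated())` so the job fails if any file fails. The trade-off is speed: each test file pays JVM startup once, instead of once for the whole suite.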
I found a weird error that blames JPype. I can't reproduce it, so I need some help.
It is presumably an environment-specific issue with JPype. I will implement a workaround that makes subprocess available as an option.
Sorry for the problems; I'm afraid I don't know the cause either.
@mara004 Thanks for your suggestion. However, I can't reproduce the issue on my side, and I'm hesitant to report it on their behalf since I don't know what the trigger is. I released v2.8.2, which automatically falls back to subprocess if there's any import error with JPype.
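A minimal sketch of the fallback described for v2.8.2, assuming the backend is chosen at call time. `jpype_available`, `choose_backend`, and the `force_subprocess` switch (standing in for the subprocess option mentioned above) are hypothetical names, not tabula-py's actual API.

```python
def jpype_available():
    """Return True if JPype can be imported in this environment."""
    try:
        import jpype  # noqa: F401  (imported only to probe availability)
    except Exception:
        # Treat any failure during import as "unavailable", not only ImportError.
        return False
    return True


def choose_backend(force_subprocess=False):
    """Pick "jpype" when possible, otherwise fall back to "subprocess"."""
    if force_subprocess or not jpype_available():
        return "subprocess"
    return "jpype"
```

Catching `Exception` rather than only `ImportError` mirrors "any import error": a broken native install can fail with other exception types, and the subprocess path should still work in that case.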
Is your feature request related to a problem? Please describe.
Repeatedly calling the JAR as a Java subprocess causes overhead (starting Java, loading modules, etc.).
Describe the solution you'd like
Switch to JPype to launch the JVM only once and call into the APIs directly.
For a start, the CLI main functions could be used to simplify the transition, maybe switching to the actual API level eventually.
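A sketch of that first step, under stated assumptions: the same CLI argument list is either handed to `java -jar` in a new process each time, or passed to the jar's main class inside a JVM that JPype starts once and then reuses. The entry point `technology.tabula.CommandLineApp` and the helper names are assumptions; the JPype calls used (`isJVMStarted`, `startJVM(classpath=...)`, `JClass`, `JArray`, `JString`) are part of JPype's public API.

```python
import subprocess

MAIN_CLASS = "technology.tabula.CommandLineApp"  # assumed entry point of the jar


def subprocess_command(jar_path, cli_args):
    """Command line for the current approach: a fresh JVM per call."""
    return ["java", "-jar", jar_path, *cli_args]


def run_via_subprocess(jar_path, cli_args):
    """Run the CLI in a new process and return its stdout (raises on failure)."""
    # Every call pays JVM startup and class-loading cost again.
    completed = subprocess.run(
        subprocess_command(jar_path, cli_args),
        capture_output=True, text=True, check=True,
    )
    return completed.stdout


def run_via_jpype(jar_path, cli_args):
    """Call the assumed main class in-process; output goes to Java's System.out."""
    # Lazy import so this module still loads when JPype is not installed.
    import jpype

    if not jpype.isJVMStarted():
        # Started once per process; later calls reuse the running JVM.
        jpype.startJVM(classpath=[jar_path])
    app = jpype.JClass(MAIN_CLASS)
    # Same arguments as the subprocess path, converted to a Java String[].
    # Caveat: if main() calls System.exit, it ends the Python process too.
    app.main(jpype.JArray(jpype.JString)(cli_args))
```

For example, `subprocess_command("tabula.jar", ["--pages", "1", "input.pdf"])` yields `["java", "-jar", "tabula.jar", "--pages", "1", "input.pdf"]`; the flags are illustrative only. Capturing the in-process output as a Python string, and avoiding a possible `System.exit`, is what moving to the actual API level would address later.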
Describe alternatives you've considered
Sticking with subprocess is possible but has the drawbacks described above.
Additional context
The python-pdfbox project had a similar transition from subprocess to JPype.