Running two models with same engine at once #179

Closed
tspannhw opened this issue Sep 29, 2020 · 7 comments
Labels
bug Something isn't working

Comments

@tspannhw

Description

Cannot run two models at the same time in NiFi 1.11.4 if both use the PyTorch engine.

Expected Behavior

Both processors should be able to use the PyTorch engine at the same time.

Error Message

java.lang.UnsatisfiedLinkError: /Users/tspann/.djl.ai/pytorch/1.6.0-cpu-osx-x86_64/0.8.0-SNAPSHOT-cpu-libdjl_torch.dylib (Library is already loaded in another ClassLoader)

How to Reproduce?

https:/tspannhw/nifi-djlsentimentanalysis-processor

Steps to reproduce


  1. Install the NiFi NARs for both models.
  2. Run both processors at the same time.

What have you tried to solve it?

  1. Tweaked the POMs.
  2. Tried different engines, but that is a stopgap; I want PyTorch for both BERT QA and sentiment analysis.

Environment Info

Run my NiFi processors:
https:/tspannhw/nifi-djlqa-processor
https:/tspannhw/nifi-djlsentimentanalysis-processor

java.lang.UnsatisfiedLinkError: /Users/tspann/.djl.ai/pytorch/1.6.0-cpu-osx-x86_64/0.8.0-SNAPSHOT-cpu-libdjl_torch.dylib (Library is already loaded in another ClassLoader)
	at java.base/java.lang.ClassLoader.loadLibraryWithPath(ClassLoader.java:1742)
	at java.base/java.lang.System.load(System.java:585)
	at ai.djl.pytorch.jni.LibUtils.loadLibrary(LibUtils.java:83)
	at ai.djl.pytorch.engine.PtEngine.newInstance(PtEngine.java:42)
	at ai.djl.pytorch.engine.PtEngineProvider.getEngine(PtEngineProvider.java:27)
	at ai.djl.engine.Engine.initEngine(Engine.java:49)
	at ai.djl.engine.Engine.<clinit>(Engine.java:44)
	at ai.djl.repository.zoo.BaseModelLoader.loadModel(BaseModelLoader.java:129)
	at ai.djl.repository.zoo.ModelZoo.loadModel(ModelZoo.java:162)
	at com.dataflowdeveloper.djlsentimentanalysis.SentimentAnalysisService.predict(SentimentAnalysisService.java:69)
	at com.dataflowdeveloper.djlsentimentanalysis.DeepLearningSAProcessor.onTrigger(DeepLearningSAProcessor.java:95)
	at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
	at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
	at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
	at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
	at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:836)
2020-09-29 10:51:29,147 WARN [Timer-Driven Process Thread-6] ai.djl.engine.Engine Failed to load engine from: ai.djl.pytorch.engine.PtEngineProvider
2020-09-29 10:51:29,147 ERROR [Timer-Driven Process Thread-6] c.d.d.DeepLearningSAProcessor DeepLearningSAProcessor[id=8e70f82b-0174-1000-396b-9febdfca9935] Unable to process Deep Learning Sentiment Analysis DL No matching model with specified Input/Output type found.
2020-09-29 10:51:29,152 ERROR [Timer-Driven Process Thread-6] c.d.d.DeepLearningSAProcessor DeepLearningSAProcessor[id=8e70f82b-0174-1000-396b-9febdfca9935] DeepLearningSAProcessor[id=8e70f82b-0174-1000-396b-9febdfca9935] failed to process due to ai.djl.repository.zoo.ModelNotFoundException: No matching model with specified Input/Output type found.; rolling back session: ai.djl.repository.zoo.ModelNotFoundException: No matching model with specified Input/Output type found.
ai.djl.repository.zoo.ModelNotFoundException: No matching model with specified Input/Output type found.
	at ai.djl.repository.zoo.ModelZoo.loadModel(ModelZoo.java:173)
	at com.dataflowdeveloper.djlsentimentanalysis.SentimentAnalysisService.predict(SentimentAnalysisService.java:69)
	at com.dataflowdeveloper.djlsentimentanalysis.DeepLearningSAProcessor.onTrigger(DeepLearningSAProcessor.java:95)
	at org.apache.nifi.processor.AbstractProcessor.onTrigger(AbstractProcessor.java:27)
	at org.apache.nifi.controller.StandardProcessorNode.onTrigger(StandardProcessorNode.java:1176)
	at org.apache.nifi.controller.tasks.ConnectableTask.invoke(ConnectableTask.java:213)
	at org.apache.nifi.controller.scheduling.TimerDrivenSchedulingAgent$1.run(TimerDrivenSchedulingAgent.java:117)
	at org.apache.nifi.engine.FlowEngine$2.run(FlowEngine.java:110)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305)
	at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
@tspannhw tspannhw added the bug Something isn't working label Sep 29, 2020
@frankfliu
Contributor

@stu1130 Please take a look.

@stu1130
Contributor

stu1130 commented Sep 29, 2020

Thanks for reporting the issue @tspannhw.
I was able to install your custom processors, feed in a file, run the two models simultaneously, and write the output to a file without seeing the error.
I am not familiar with Apache NiFi. Could you tell me where to find the error log?

@stu1130
Contributor

stu1130 commented Sep 29, 2020

I found the error log, but the error I saw is:

ai.djl.repository.zoo.ModelNotFoundException: No matching model with specified Input/Output type found.

@frankfliu
Contributor

frankfliu commented Sep 30, 2020

@tspannhw You are loading two copies of the Engine class into different ClassLoaders, so the second System.load("jnipytorch") call fails. This is expected: the JVM does not allow the same native library to be loaded by more than one ClassLoader. A quick workaround is to move the DJL libraries into the system ClassLoader: put them directly into the top-level libs folder and declare them as "provided" dependencies.
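The ClassLoader behavior described above can be reproduced without NiFi or PyTorch. The following is a hypothetical, self-contained illustration: `FakeEngine` stands in for DJL's `Engine` class, and `IsolatedLoader` imitates a per-NAR loader by defining the class itself instead of handing it to its parent (an assumption that mirrors DJL jars being bundled inside each NAR). Each loader ends up with its own `FakeEngine` class, with its own static state and its own run of the static initializer; in the real stack trace, that static initializer is where the second `System.load()` fails.

```java
import java.io.IOException;
import java.io.InputStream;

/** Shows that one class file defined by two ClassLoaders becomes two independent classes. */
public final class TwoLoadersDemo {

    /** Hypothetical stand-in for DJL's Engine; its static initializer plays the role of System.load(). */
    public static final class FakeEngine {
        static int inits;

        static {
            inits++; // runs once per defining ClassLoader, not once per JVM
        }

        public static int initCount() {
            return inits;
        }
    }

    /** Defines FakeEngine itself instead of asking its parent first, like an isolated plugin loader. */
    static final class IsolatedLoader extends ClassLoader {
        IsolatedLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(FakeEngine.class.getName())) {
                return super.loadClass(name, resolve); // everything else: normal parent-first delegation
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    String resource = name.replace('.', '/') + ".class";
                    try (InputStream in = getParent().getResourceAsStream(resource)) {
                        if (in == null) {
                            throw new ClassNotFoundException(name);
                        }
                        byte[] bytes = in.readAllBytes();
                        c = defineClass(name, bytes, 0, bytes.length);
                    } catch (IOException e) {
                        throw new ClassNotFoundException(name, e);
                    }
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    /** Loads and initializes a fresh copy of FakeEngine in a new isolated loader. */
    public static Class<?> loadIsolatedCopy() {
        try {
            ClassLoader parent = TwoLoadersDemo.class.getClassLoader();
            return Class.forName(FakeEngine.class.getName(), true, new IsolatedLoader(parent));
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reads initCount() from one specific copy of FakeEngine via reflection. */
    public static int initCountOf(Class<?> engineCopy) {
        try {
            return (Integer) engineCopy.getMethod("initCount").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> first = loadIsolatedCopy();
        Class<?> second = loadIsolatedCopy();
        System.out.println(first == second); // false: two distinct classes with the same name
        System.out.println(initCountOf(first) + " " + initCountOf(second)); // "1 1": each copy ran its own initializer
    }
}
```

Because the two copies never share static state, a static "already loaded" flag inside the engine cannot prevent the second load; only moving the `System.load()` call into a class that both copies share avoids it.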

@stu1130
Contributor

stu1130 commented Sep 30, 2020

@tspannhw we came up with three solutions.
In general, we need to move the System.load() call out of the pytorch-engine jar so that the same native library is not loaded from more than one ClassLoader in one JVM.

  1. We'll use Java reflection to call System.load() from a separate jar file. The new jar would contain only a simple class that calls System.load(). We'll provide instructions on how to write and build that class. Finally, you copy your custom NAR along with this jar to the NiFi lib folder.
  2. We can package that small System.load() jar into our native-auto artifact and tell you where to find it. As in option 1, copy the NAR and the jar to the NiFi lib folder.
  3. Merge the two model processors into one processor, and implement how each DJL model is used within it.

We recommend option 2 as it would be the easiest solution for you. What do you think?
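The reflection approach in option 1 could look roughly like the sketch below. It is a hypothetical illustration, not DJL's implementation: the class name `NativeLoaderSketch` is invented, and only the property name `ai.djl.pytorch.native_helper` comes from the workaround later in this thread. The engine reads a helper class name from a system property, resolves it through the system ClassLoader, and invokes its static `load(String)` method reflectively, falling back to a plain `System.load()` when no helper is configured.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

/** Hypothetical sketch of option 1; not DJL's actual code. */
public final class NativeLoaderSketch {

    /** Property name from this thread; the lookup logic around it is an assumption. */
    public static final String HELPER_PROPERTY = "ai.djl.pytorch.native_helper";

    private NativeLoaderSketch() {}

    /** Loads a native library, delegating to the configured helper when one is set. */
    public static void loadNativeLibrary(String absolutePath) {
        String helperName = System.getProperty(HELPER_PROPERTY);
        if (helperName == null || helperName.isEmpty()) {
            // No helper configured: the library is tied to this class's own ClassLoader.
            System.load(absolutePath);
            return;
        }
        try {
            // Resolving through the system ClassLoader means every NAR's copy of this
            // class reaches the same helper class, so the library is associated with
            // the helper's ClassLoader instead of with each NAR's ClassLoader.
            Class<?> helper = Class.forName(helperName, true, ClassLoader.getSystemClassLoader());
            Method load = helper.getMethod("load", String.class);
            load.invoke(null, absolutePath);
        } catch (ClassNotFoundException | NoSuchMethodException | IllegalAccessException e) {
            throw new IllegalStateException("Cannot use native helper " + helperName, e);
        } catch (InvocationTargetException e) {
            // Rethrow the helper's own failure, e.g. UnsatisfiedLinkError for a bad path.
            Throwable cause = e.getCause();
            if (cause instanceof RuntimeException) {
                throw (RuntimeException) cause;
            }
            if (cause instanceof Error) {
                throw (Error) cause;
            }
            throw new IllegalStateException(cause);
        }
    }
}
```

The design relies on a documented JDK rule: `System.load()` associates a library with the ClassLoader of the class that calls it. Looking the helper up through `ClassLoader.getSystemClassLoader()`, rather than through the caller's own loader, is what makes all NARs share one helper class.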

frankfliu added a commit to frankfliu/djl that referenced this issue Oct 21, 2020
…ent ClassLoader

Change-Id: I60ba3469cc841c2bdb2d1696f2b0926c11db1f37
frankfliu added a commit that referenced this issue Oct 21, 2020
Change-Id: I60ba3469cc841c2bdb2d1696f2b0926c11db1f37
@frankfliu
Contributor

frankfliu commented Oct 21, 2020

@tspannhw I created a PR to work around the NiFi ClassLoader issue.

The workaround would be:

  1. Create a class that contains a static method:
    public static void load(String path) {
        System.load(path); // NOPMD
    }
  2. Put this class on NiFi's system classpath, not in the .nar file.
  3. Set a system property that tells DJL which class to use, for example:
    System.setProperty("ai.djl.pytorch.native_helper", "ai.djl.pytorch.integration.LibUtilsTest");
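Steps 1 and 3 can be combined into one small class, sketched below under stated assumptions. The class name `NiFiNativeHelper` and its `register()` method are hypothetical; only the `load(String)` shape and the property name come from the steps above. Since the stack trace in this issue shows the library being loaded from `Engine`'s static initializer, the property presumably has to be set before DJL's `Engine` class is first used.

```java
/**
 * Hypothetical helper meant for a small jar in NiFi's lib folder (system classpath),
 * following the static load(String) shape from the workaround. The name is illustrative.
 */
public final class NiFiNativeHelper {

    /** Property name taken from the workaround steps. */
    public static final String HELPER_PROPERTY = "ai.djl.pytorch.native_helper";

    private NiFiNativeHelper() {}

    /** Called reflectively; System.load() ties the library to this class's ClassLoader. */
    public static void load(String path) {
        System.load(path); // NOPMD
    }

    /**
     * Points DJL at this helper. Assumed to be needed before DJL's Engine class is first
     * initialized, because Engine's static initializer triggers the native library load.
     */
    public static void register() {
        System.setProperty(HELPER_PROPERTY, NiFiNativeHelper.class.getName());
    }
}
```

Instead of calling `register()` from processor code, the same property could be passed as a JVM argument, for example through a `java.arg.N=-Dai.djl.pytorch.native_helper=<fully qualified class name>` entry in NiFi's `conf/bootstrap.conf`.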


We also created a built-in NativeHelper class in the ai.djl.pytorch:pytorch-native-XXX package, but you need to move this file out of your .nar file and put it into the nifi\libs folder.

Let me know if you have a better way to resolve this issue.

@tspannhw
Author

thanks
