Remove memory scope and improve memory management #695

Merged (1 commit) on Mar 9, 2021

Contributor

@zachgk zachgk commented Feb 25, 2021

MemoryScope revealed a number of shortcomings in DJL's memory management. This
PR deletes MemoryScope and fixes many of those shortcomings.

First, NDManager.{attach, detach} were renamed to attachInternal and
detachInternal. This differentiates them from the attach and detach methods that
users are intended to call.

There are two new concepts in memory management. The first is an NDResource
interface, created to combine the managed-memory behavior that was used in both
NDArray and NDList. It covers the getManager, attach, and detach operations, and
it could also be implemented by more classes in the future.
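
To make the idea of a shared resource interface concrete, here is a minimal, self-contained sketch. It is a toy model, not DJL code: `ToyManager`, `ToyResource`, `ToyArray`, `ToyList`, and `moveTo` are hypothetical names, and only the three operations (`getManager`, `attach`, `detach`) are taken from the description above.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model only: none of these types are DJL classes. */
final class ResourceSketch {

    /** Hypothetical manager that just tracks the resources it owns. */
    static final class ToyManager {
        final Set<ToyResource> owned = new LinkedHashSet<>();
    }

    /** Hypothetical counterpart of the shared interface. */
    interface ToyResource {
        ToyManager getManager();

        void attach(ToyManager manager);

        void detach();
    }

    /** A single managed value. */
    static final class ToyArray implements ToyResource {
        private ToyManager manager;

        ToyArray(ToyManager manager) {
            attach(manager);
        }

        @Override
        public ToyManager getManager() {
            return manager;
        }

        @Override
        public void attach(ToyManager newManager) {
            detach(); // leave the current manager first
            manager = newManager;
            newManager.owned.add(this);
        }

        @Override
        public void detach() {
            if (manager != null) {
                manager.owned.remove(this);
                manager = null;
            }
        }
    }

    /** A list of managed values that forwards each operation to its elements. */
    static final class ToyList implements ToyResource {
        final List<ToyArray> arrays = new ArrayList<>();

        @Override
        public ToyManager getManager() {
            // Assumption for this sketch: report the first element's manager.
            return arrays.isEmpty() ? null : arrays.get(0).getManager();
        }

        @Override
        public void attach(ToyManager manager) {
            arrays.forEach(a -> a.attach(manager));
        }

        @Override
        public void detach() {
            arrays.forEach(ToyArray::detach);
        }
    }

    /** Code written against the interface works for arrays and lists alike. */
    static void moveTo(ToyManager target, ToyResource... resources) {
        for (ToyResource r : resources) {
            r.attach(target);
        }
    }

    public static void main(String[] args) {
        ToyManager first = new ToyManager();
        ToyManager second = new ToyManager();
        ToyArray single = new ToyArray(first);
        ToyList list = new ToyList();
        list.arrays.add(new ToyArray(first));
        list.arrays.add(new ToyArray(first));
        moveTo(second, single, list);
        System.out.println(first.owned.size() + " " + second.owned.size()); // prints "0 3"
    }
}
```

The point of the sketch is `moveTo`: it never needs to know whether it was handed a single value or a list.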

The second concept is an additional "management convention" within the
NDManager. Under the first convention, normal resources are attached to the
manager and then closed when the manager closes. This works for a small number
of resources, but not when operations transitively create more of them. So, the
second convention is a tempResource. Instead of being freed when the manager is
closed, temp resources are returned to their original manager. This is used to
create a temporary scope, do operations within it, and then return the inputs
and the return value to the parent while the intermediate work is cleaned up.
This also matches the concepts of ownership and borrowing.
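
The difference between the two conventions can be sketched with a small toy model. This is not the DJL implementation: `Scope`, `Res`, `create`, and the internal `borrowed` map are hypothetical, and only the behavior (owned resources are freed on close, temp resources go back to their original manager) follows the description above. The sketch assumes a borrowed resource is regularly owned by some other scope at the time it is borrowed.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of the two conventions; none of these types are DJL classes. */
final class TempScopeSketch {

    /** Hypothetical resource that remembers its owner and whether it was freed. */
    static final class Res {
        Scope owner;
        boolean closed;
    }

    /** Hypothetical manager supporting both conventions. */
    static final class Scope implements AutoCloseable {
        private final List<Res> owned = new ArrayList<>();
        // Borrowed resource -> the scope it must go back to.
        private final Map<Res, Scope> borrowed = new LinkedHashMap<>();

        /** Convention 1: a new resource is owned by this scope and freed when it closes. */
        Res create() {
            Res r = new Res();
            r.owner = this;
            owned.add(r);
            return r;
        }

        /** Convention 2: borrow an existing resource; it goes back to its owner on close. */
        void tempAttach(Res r) {
            Scope original = r.owner;
            original.owned.remove(r);
            r.owner = this;
            borrowed.put(r, original);
        }

        @Override
        public void close() {
            // Hand borrowed resources back before freeing anything.
            for (Map.Entry<Res, Scope> e : borrowed.entrySet()) {
                Res r = e.getKey();
                Scope original = e.getValue();
                r.owner = original;
                original.owned.add(r);
            }
            borrowed.clear();
            // Everything this scope owns outright is freed.
            for (Res r : owned) {
                r.closed = true;
            }
            owned.clear();
        }
    }

    public static void main(String[] args) {
        Scope parent = new Scope();
        Res input = parent.create();
        Res intermediate;
        try (Scope temp = new Scope()) {
            temp.tempAttach(input);       // borrowed from parent
            intermediate = temp.create(); // owned by temp
        }
        System.out.println(input.closed + " " + (input.owner == parent) + " " + intermediate.closed);
        // prints "false true true"
    }
}
```

The return value is left out here to keep the sketch short; it would need to be moved to the parent before the temporary scope closes.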

Building on these, a few additional helper methods were created.
`NDManager.from(resource)` eases the creation of managers based on a resource.
`scopeManager.ret(returnValue)` helps with returning values out of the
scopeManager. Lastly, `scopeManager.{temp,}AttachAll` attaches a number of
resources to a manager in a single call.
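
A rough sketch of how these helpers might fit together in a try-with-resources block. Again this is a self-contained toy model rather than DJL code. The helper names `from`, `ret`, and `tempAttachAll` come from the description above, but their bodies are assumptions: `from` is assumed to create a child scope of the resource's current owner, and `ret` is assumed to move the value to the parent scope. `Scope`, `Res`, `create`, and `compute` are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: helper names follow the PR text, the bodies are assumptions. */
final class ScopeHelpersSketch {

    static final class Res {
        Scope owner;
        boolean closed;
    }

    static final class Scope implements AutoCloseable {
        private final Scope parent;
        private final List<Res> owned = new ArrayList<>();
        private final Map<Res, Scope> borrowed = new LinkedHashMap<>();

        Scope(Scope parent) {
            this.parent = parent;
        }

        /** Assumed meaning of from(resource): a child scope of the resource's owner. */
        static Scope from(Res resource) {
            return new Scope(resource.owner);
        }

        Res create() {
            Res r = new Res();
            attach(r);
            return r;
        }

        /** Takes ownership: the resource is freed when this scope closes. */
        void attach(Res r) {
            if (r.owner != null) {
                r.owner.owned.remove(r);
            }
            r.owner = this;
            owned.add(r);
        }

        /** Borrows several resources at once; each goes back to its owner on close. */
        void tempAttachAll(Res... resources) {
            for (Res r : resources) {
                borrowed.put(r, r.owner);
                r.owner.owned.remove(r);
                r.owner = this;
            }
        }

        /** Assumed meaning of ret(value): move the value to the parent so it survives close(). */
        Res ret(Res value) {
            parent.attach(value);
            return value;
        }

        @Override
        public void close() {
            for (Map.Entry<Res, Scope> e : borrowed.entrySet()) {
                e.getKey().owner = e.getValue();
                e.getValue().owned.add(e.getKey());
            }
            borrowed.clear();
            for (Res r : owned) {
                r.closed = true;
            }
            owned.clear();
        }
    }

    /** Usage: the scope is the only extra variable the caller has to manage. */
    static Res compute(Res input) {
        try (Scope scope = Scope.from(input)) {
            scope.tempAttachAll(input);
            scope.create(); // intermediate value, freed when the scope closes
            Res output = scope.create();
            return scope.ret(output);
        }
    }

    public static void main(String[] args) {
        Scope root = new Scope(null);
        Res input = root.create();
        Res output = compute(input);
        System.out.println((input.owner == root) + " " + (output.owner == root) + " " + output.closed);
        // prints "true true false"
    }
}
```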

With these improvements, the new methods were applied to the locations where
MemoryScope was previously used, as well as to an additional case in NDManagerEx.

Also, the old attach methods were changed to return `void`. Because the return
values are no longer used anywhere and are less necessary in the current
scheme, I figured this would simplify things. It also helps for cases like
`NDList.attach`, which does not have a single original NDManager when attaching.

@stu1130
Contributor

stu1130 commented Feb 25, 2021

I also don't like attach and detach, since users can confuse them with attachGradient and detachGradient.

@zachgk
Contributor Author

zachgk commented Feb 26, 2021

As in the names attach and detach? Yeah, I didn't change that in this PR (it is already long enough), but maybe in a following one. Is there a different name you want to propose? We should probably rename attachGradient too because it's not a great name either.

Also, @stu1130, are we supposed to have a detachGradient method? I don't see it.

@frankfliu
Contributor

Just replacing MemoryScope with NDList should work; I don't see the need to change NDManager.

@stu1130
Contributor

stu1130 commented Feb 26, 2021

> detachGradient

I think we call it stopGradient.

@zachgk
Contributor Author

zachgk commented Feb 26, 2021

@frankfliu I found it somewhat difficult to work with the NDList. The main requirement I saw was that the NDArrays (and NDLists) had to be moved into a new list temporarily and then moved back afterwards. I guess it would be possible to do that by moving them into an NDList, recording all of the previous managers, and then manually reattaching them to those managers.

My issue with this is that it is too low-level a memory operation. The actions you take don't match the intent very well, and it is easy to forget to move all of the NDArrays back. This could become a recurring source of bugs in the future, the same way segfaults are in C.

The second issue lies in the number of items to track. In the NDManager with tempAttach, there is one variable to manage: the new scope. With a try-with-resources, it closes automatically, which takes care of most of the problem. Using pure NDLists, you have three: the new scope, the NDList of variables to return, and the list of managers to return them to. This grows linearly with the number of new scopes, since each one adds variable names that users have to create and manage.
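
As a rough illustration of the manual bookkeeping described above, here is a self-contained toy sketch. It is not DJL code: `Mgr`, `Res`, `create`, and `compute` are hypothetical names, and the manager only supports plain attach and close. The three things to track are marked (1) to (3).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the manual alternative; none of these types are DJL classes. */
final class ManualBookkeepingSketch {

    static final class Res {
        Mgr owner;
        boolean closed;
    }

    /** Hypothetical manager with only the first convention: owned resources are freed on close. */
    static final class Mgr implements AutoCloseable {
        final List<Res> owned = new ArrayList<>();

        Res create() {
            Res r = new Res();
            attach(r);
            return r;
        }

        void attach(Res r) {
            if (r.owner != null) {
                r.owner.owned.remove(r);
            }
            r.owner = this;
            owned.add(r);
        }

        @Override
        public void close() {
            owned.forEach(r -> r.closed = true);
            owned.clear();
        }
    }

    static Res compute(Res a, Res b) {
        Mgr resultOwner = a.owner;
        List<Res> moved = Arrays.asList(a, b);    // (2) the values to give back
        List<Mgr> previous = new ArrayList<>();   // (3) where each one came from
        for (Res r : moved) {
            previous.add(r.owner);
        }
        try (Mgr scope = new Mgr()) {             // (1) the new scope
            for (Res r : moved) {
                scope.attach(r);
            }
            scope.create(); // intermediate value, freed when the scope closes
            Res result = scope.create();
            // Skipping any line below would let close() free a value that is still in use.
            for (int i = 0; i < moved.size(); i++) {
                previous.get(i).attach(moved.get(i));
            }
            resultOwner.attach(result);
            return result;
        }
    }

    public static void main(String[] args) {
        Mgr m1 = new Mgr();
        Mgr m2 = new Mgr();
        Res a = m1.create();
        Res b = m2.create();
        Res result = compute(a, b);
        System.out.println((a.owner == m1) + " " + (b.owner == m2) + " " + result.closed);
        // prints "true true false"
    }
}
```

With a temp-attach convention, the `moved` and `previous` lists and the restore loop disappear, which is the reduction from three tracked items to one.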

@@ -206,7 +209,17 @@ public synchronized void attach(String resourceId, AutoCloseable resource) {

/** {@inheritDoc} */
@Override
public synchronized void detach(String resourceId) {
public void tempAttachInternal(
What is the difference between tempAttachInternal and attachInternal?

@lanking520
Contributor

@zachgk please do a poll on naming

@codecov-io

Codecov Report

Merging #695 (5ab673e) into master (48cf663) will increase coverage by 0.10%.
The diff coverage is 26.50%.

@@             Coverage Diff              @@
##             master     #695      +/-   ##
============================================
+ Coverage     68.95%   69.06%   +0.10%     
- Complexity     3966     3976      +10     
============================================
  Files           462      461       -1     
  Lines         18634    18632       -2     
  Branches       1998     1996       -2     
============================================
+ Hits          12850    12868      +18     
+ Misses         4753     4748       -5     
+ Partials       1031     1016      -15     
| Impacted Files | Coverage Δ | Complexity Δ |
|---|---|---|
| api/src/main/java/ai/djl/ndarray/NDArray.java | 74.25% <ø> (ø) | 89.00 <0.00> (ø) |
| ...i/src/main/java/ai/djl/ndarray/NDArrayAdapter.java | 0.00% <ø> (ø) | 0.00 <0.00> (ø) |
| ...src/main/java/ai/djl/nn/transformer/BertBlock.java | 46.20% <0.00%> (+0.29%) | 8.00 <0.00> (ø) |
| ...l/nn/transformer/BertMaskedLanguageModelBlock.java | 35.48% <0.00%> (-1.19%) | 2.00 <0.00> (ø) |
| ...jl/nn/transformer/BertMaskedLanguageModelLoss.java | 17.85% <0.00%> (ø) | 1.00 <0.00> (ø) |
| ...va/ai/djl/nn/transformer/BertNextSentenceLoss.java | 18.18% <0.00%> (ø) | 1.00 <0.00> (ø) |
| ...va/ai/djl/nn/transformer/BertPretrainingBlock.java | 42.50% <0.00%> (+1.03%) | 2.00 <0.00> (ø) |
| ...ain/java/ai/djl/onnxruntime/engine/OrtNDArray.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (ø) |
| ...n/java/ai/djl/onnxruntime/engine/OrtNDManager.java | 0.00% <0.00%> (ø) | 0.00 <0.00> (ø) |
| ...ain/java/ai/djl/paddlepaddle/engine/PpNDArray.java | 35.55% <12.50%> (-3.47%) | 8.00 <0.00> (ø) |
| ... and 24 more | | |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 48cf663...5ab673e.

@zachgk zachgk force-pushed the noMemoryScope branch 2 times, most recently from c4fe0da to ab4b58d Compare March 9, 2021 00:32
Change-Id: I91d109cd14d70fa64fd8fffa0b50d88ab053013e
@lanking520 lanking520 merged commit 8342d44 into deepjavalibrary:master Mar 9, 2021
@zachgk zachgk deleted the noMemoryScope branch March 9, 2021 19:41
zachgk added a commit that referenced this pull request Mar 11, 2021
* This creates the component which will populate the Download Tab with Download Buttons.

* Making a place for the download buttons.

* Adding the Model Download Handler allowing the backend to feed the links into the Model View and making slight changes for readability.

* Getting rid of some of the test code.

* Improve Block usability (#712)

* Use builder pattern for Parameter (#661)

* Make XavierInitializer default value & Improve setInitializer (#664)

* Refactor initialize (#675)

* Remove NDManager on getOutputShapes (#710)

* Removing unnecessary logging messages.

* block factory init commit (#697)

* [DOCS] Fixing TrainingListener documentation (#718)

* Fixing TrainingListener documentation

* Fixing PR reviews

* Fix DJL serving flaky test for mac (#721)

Change-Id: I9eccc84b0c34652e50c5fe5a4fe42f2b82d65a3d

* Fixing all of the nits.

* Getting rid of unnecessary methods.

* update onnxruntime along with String tensor (#724)

* Add profiler doc (#722)

* Resolving some comments.

* Using a better criterion in case multiple models have the same name.

* Fixing the java doc.

* Configure verbose of mxnet extra libraries (#728)

Change-Id: I66d54aa496cccbb9e8c0a89eeaa458605958d9c6

* Added a TODO for using the artifact repo to get the base uri.

* paddlepaddle CN notebook (#730)

* paddlepaddle CN notebook

* install font

Change-Id: I2d749e617b0bf78ecbcd168b82c53a1fab49a2c0

* refactor on name

Change-Id: I9e379eee51ceae16391850b3ba9782acb04c4021

* Refine the text

Co-authored-by: gstu1130 <[email protected]>

* add EI documentation (#733)

* add EI documentation

* fix pmd rules

Change-Id: Ieee5577c26f6df2843781f8f9180de35069a5de3

* allow pytorch stream model loading (#729)

* allow pytorch stream model loading

* updates

Change-Id: Ibc26261b90de673712e90de0d640a8f32f23763e

* add NDList decode from inputStream (#734)

Change-Id: I6a31d8b0b955f2dbb762220b101e3928a34699c1

* Remove memory scope and improve memory management (#695)

Change-Id: I91d109cd14d70fa64fd8fffa0b50d88ab053013e

* Remove erroneous random forest application (#726)

The application was changed to the more accurate softmax_regression (matching
the terminology from the D2L book).

Change-Id: I1f69f005bbe38b125f2709c2988d06c14eebb765

* Minor fixes on duplicated code (#736)

* remove methods that already defined in the NDArrayAdapter

Change-Id: I01cc03a7f5b427bf31c6b3fd8d2136f2a27fe93b

* refactor toString

Change-Id: Iea22b16e1daa9f759b55c1a8b8b85536482e551a

* remove sparse NDArray

Change-Id: Icb44096519775f54cb32cc768c14f49e33dc7ea5

* fix test

Change-Id: Icef580ed77e7bba22864ce44577de3cba51e3e41

Co-authored-by: Jake Lee <[email protected]>
Co-authored-by: Lanking <[email protected]>
Co-authored-by: aksrajvanshi <[email protected]>
Co-authored-by: Frank Liu <[email protected]>
Co-authored-by: Zach Kimberg <[email protected]>
Lokiiiiii pushed a commit to Lokiiiiii/djl that referenced this pull request Oct 10, 2023
5 participants