
creating cache for faster access to results of previous cnmf extension calls #62

Merged · 34 commits · Jul 21, 2022

Conversation

@clewis7 (Collaborator) commented Jun 28, 2022

Adds a cache module to the extensions.

The cache should allow users to quickly access results of previously made calls to the cnmf extensions when a call has the same function_name, args, and kwargs.
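
For illustration, a minimal sketch of the idea, assuming a hypothetical `Cache` class with a plain dict as the backing store (the actual implementation may store entries differently, e.g. in a pandas DataFrame):

```python
from functools import wraps


class Cache:
    """Illustrative only: memoize extension calls keyed on
    (function name, args, kwargs)."""

    def __init__(self):
        self._store = {}  # key -> cached return value

    def use_cache(self, func):
        @wraps(func)
        def wrapper(instance, *args, **kwargs):
            # build a hashable key from the call signature
            key = (func.__name__, repr(args), repr(sorted(kwargs.items())))
            if key not in self._store:
                # only compute when this exact call hasn't been seen before
                self._store[key] = func(instance, *args, **kwargs)
            return self._store[key]

        return wrapper
```

With something like this, decorating an extension method with `@some_cache.use_cache` means repeated calls with the same arguments return the stored result instead of recomputing it.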

@kushalkolar (Collaborator) commented

This is a great start! I'd implement the following methods:

clear()
set_maxsize()

That's all I can think of for now.

@kushalkolar (Collaborator) commented

For use in the extensions, I'd create a separate Cache instance for each extension class, since both mcorr and cnmf have a get_output func name.
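
As an illustration only, reusing the hypothetical `Cache` class sketched above, per-class instances could look like this (the class names are placeholders, not the actual module layout):

```python
# one Cache instance per extension class, so that mcorr.get_output and
# cnmf.get_output are cached independently despite sharing a function name
cnmf_cache = Cache()
mcorr_cache = Cache()


class CNMFExtensions:
    @cnmf_cache.use_cache
    def get_output(self):
        ...  # load and return the CNMF results for this batch item


class MCorrExtensions:
    @mcorr_cache.use_cache
    def get_output(self):
        ...  # load and return the motion-corrected movie for this batch item
```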

@clewis7 (Collaborator, Author) commented Jun 28, 2022

Will work on testing the cache for different cnmf and mcorr extension functions, as well as checking the reduction in access time!

@kushalkolar (Collaborator) left a review comment

almost there! :D

Review comments on mesmerize_core/caiman_extensions/cache.py (outdated, resolved)
@kushalkolar (Collaborator) commented

Also, for some reason the tests are failing for cnmfe, which is odd because they pass on master and this branch has no diff with the tests on master. Can check the details later.

@clewis7 (Collaborator, Author) commented Jul 3, 2022

Just did a basic implementation using memory size; it appears to be working when I test it out.

I wasn't completely sure how you wanted the bounds to work, because the number you set for bytes is very unlikely to exactly equal the sum of the output sizes in the cache.

I can go back and alter the implementation so that if adding an item would push the cache over the cache size, the most recently used item is removed... but this could also cause many items to be removed: removing just a path, for example, will not necessarily cause a big change to the size of the outputs, even though it was the most recently accessed item in the cache.

Review comments on mesmerize_core/caiman_extensions/cache.py and mesmerize_core/caiman_extensions/cnmf.py (some outdated, resolved)
@kushalkolar (Collaborator) commented

> Just did a basic implementation using memory size; it appears to be working when I test it out.
>
> I wasn't completely sure how you wanted the bounds to work, because the number you set for bytes is very unlikely to exactly equal the sum of the output sizes in the cache.
>
> I can go back and alter the implementation so that if adding an item would push the cache over the cache size, the most recently used item is removed... but this could also cause many items to be removed: removing just a path, for example, will not necessarily cause a big change to the size of the outputs, even though it was the most recently accessed item in the cache.

Yes, you are right!

I'm assuming you mean removing the least recently used item, not the most recently used.

Anyway, there's a very simple way on this line 118 to make sure it never exceeds the cache size! In general there's a more elegant way to implement your control flow here.
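
For what it's worth, a rough sketch of an eviction loop that guarantees the limit is never exceeded, assuming the cache tracks entries and their sizes in least-recently-used order (all names here are illustrative, not the code being reviewed):

```python
from collections import OrderedDict


def insert_with_eviction(cache: OrderedDict, key, value, value_size: int, maxsize: int):
    """cache maps key -> (value, size_in_bytes); oldest entries come first.
    Evict least recently used entries until the new value fits, then insert."""
    current_size = sum(size for _, size in cache.values())
    while cache and current_size + value_size > maxsize:
        # pop the least recently used entry (front of the OrderedDict)
        _, (_, evicted_size) = cache.popitem(last=False)
        current_size -= evicted_size
    # note: a single item larger than maxsize by itself would still be inserted
    # here; real code might choose to skip caching it instead
    cache[key] = (value, value_size)
```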

@kushalkolar (Collaborator) commented

I think the reason the tests are failing is that cached values are being returned even though new args have been passed. I'm guessing this because there's a mismatch in the shape of the input arrays, which is consistent with the size of ixs_components:
https://github.com/nel-lab/mesmerize-core/runs/7168946258?check_suite_focus=true#step:7:748
https://github.com/nel-lab/mesmerize-core/runs/7168946258?check_suite_focus=true#step:7:1058

@clewis7 (Collaborator, Author) commented Jul 5, 2022

The last commit message should be "...should now work..."

@kushalkolar (Collaborator) commented

I just realized, for things like get_spatial(), get_temporal() etc. we should always return a copy, because otherwise the cached value would get changed if the user, for example, does stuff with the temporal components downstream. We can have a default kwarg return_copy=True for all these cached functions. We should have the same thing for get_reconstructed(), get_residuals() etc., but maybe with the default set to return_copy=False because these are larger arrays.
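
A sketch of how a return_copy kwarg could be handled inside the caching decorator, continuing the hypothetical `Cache` sketch from earlier (the actual signature in mesmerize-core may differ):

```python
from functools import wraps

import numpy as np


class Cache:
    def __init__(self):
        self._store = {}

    def use_cache(self, func):
        @wraps(func)
        def wrapper(instance, *args, return_copy=True, **kwargs):
            # return_copy is consumed here, so it is not part of the cache key
            key = (func.__name__, repr(args), repr(sorted(kwargs.items())))
            if key not in self._store:
                self._store[key] = func(instance, *args, **kwargs)
            output = self._store[key]
            # copy by default so downstream edits don't mutate the cached value;
            # pass return_copy=False for large arrays to avoid the extra memory
            return np.copy(output) if return_copy else output

        return wrapper
```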

Review comments on mesmerize_core/caiman_extensions/cache.py (some outdated, resolved)
@clewis7 (Collaborator, Author) commented Jul 8, 2022

> I just realized, for things like get_spatial(), get_temporal() etc. we should always return a copy, because otherwise the cached value would get changed if the user, for example, does stuff with the temporal components downstream. We can have a default kwarg return_copy=True for all these cached functions. We should have the same thing for get_reconstructed(), get_residuals() etc., but maybe with the default set to return_copy=False because these are larger arrays.

Will implement after I fix residuals and reconstructed movie extensions

@kushalkolar (Collaborator) commented

OK, left to do:

  1. Check that cnmf.C, A, b, f sum up to 90% of the data for the entire cnmf.estimates object.
  2. The return_copy point above (always return a copy for get_spatial(), get_temporal() etc. with a default return_copy=True; default return_copy=False for the larger get_reconstructed(), get_residuals() outputs). Is this still an issue?

(2) I am still not sure why the size of get_temporal_components() cannot be calculated by what I have: I have verified that the output is of type numpy.ndarray, yet the error being raised is an AttributeError claiming that type list has no attribute data. It makes absolutely no sense to me, because the if block that should calculate the size of a numpy.ndarray is being skipped entirely and the output is then being treated as a tuple, which is certainly not the case.
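
On (2), a hedged sketch of one way to compute the size of a returned object that handles both a single ndarray and a list/tuple of ndarrays (purely illustrative, not the code under discussion):

```python
import sys

import numpy as np


def output_size_bytes(output) -> int:
    """Rough byte size of a cached return value."""
    if isinstance(output, np.ndarray):
        return output.nbytes
    if isinstance(output, (list, tuple)):
        # e.g. a function that returns several arrays at once
        return sum(output_size_bytes(item) for item in output)
    # fall back to the interpreter-level size for paths, scalars, etc.
    return sys.getsizeof(output)
```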

Review comments on mesmerize_core/caiman_extensions/cache.py (outdated, resolved)
@kushalkolar (Collaborator) commented

Yay! Other than my comments about the copy, we just need to write tests for the cache:

  • test that the MB and GB cache size limits work; set something small for the tests, like 100MB
  • test that cached vals are returned only when not present in the cache dataframe
  • tests should be in a separate test function entirely; run a fresh batch of mcorr and cnmf in this function and test that the cache stuff works there
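
A rough sketch of what such a test could look like; the fixture names (cnmf_obj, cnmf_cache), the cache accessors, and the units accepted by set_maxsize() are assumptions for illustration and would need to match the actual API:

```python
import time

import numpy as np


def test_cache(cnmf_obj, cnmf_cache):
    # hypothetical fixtures: cnmf_obj is a freshly run cnmf batch item,
    # cnmf_cache is the Cache instance used by the cnmf extensions
    cnmf_cache.clear()
    cnmf_cache.set_maxsize(100 * 1024 ** 2)  # assume bytes; ~100 MB

    t0 = time.time()
    first = cnmf_obj.get_temporal_components()
    uncached_time = time.time() - t0

    t0 = time.time()
    second = cnmf_obj.get_temporal_components()
    cached_time = time.time() - t0

    # the second call should hit the cache: much faster, same values
    assert cached_time < uncached_time
    np.testing.assert_array_equal(first, second)
```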

@kushalkolar (Collaborator) commented Jul 12, 2022

So the cache makes a big difference: reconstructed, residuals etc. movies are now FAST to scroll through with random access. The majority of the time is spent fetching data; the outer product doesn't take very long, so we don't need to decorate get_reconstructed(), get_residuals() etc. I tried and it actually slows things down considerably!

Attachments: without_cache-2022-07-12_00.34.37.mp4, with_cache-2022-07-12_00.30.51.mp4

@kushalkolar (Collaborator) left a review comment

I think it'd also be useful to have cache tests to make sure that it's returning the cached vals for the correct function and item (by checking the UUID).

Review comments on mesmerize_core/caiman_extensions/cache.py and tests/test_core.py (some outdated, resolved)
@kushalkolar (Collaborator) left a review comment

LGTM! Awesome job!

@kushalkolar (Collaborator) left a review comment

Just a few minor things and I will merge! :D Tests should pass on GitHub after these changes!

Review comments on mesmerize_core/caiman_extensions/mcorr.py and tests/test_core.py (some outdated, resolved)
@kushalkolar (Collaborator) left a review comment

Yay, tests pass, going to merge!
