Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AnnData Manager refactor #1238

Merged
merged 17 commits into from
Nov 11, 2021
Merged

Conversation

justjhong
Copy link
Contributor

@justjhong justjhong commented Oct 21, 2021

This PR introduces the AnnDataManager class as well as accompanying AnnDataField classes. These classes aim to bring structure to how data registration implementation code is organized.

The AnnDataManager serves the role of wrapping an AnnData object and specifying how that AnnData object is setup to interface properly with a specific model class. Each AnnDataField details how a single data field is referenced in the AnnData object, and any preprocessing steps required for the field.

This PR only has the new implementation complete for the scVI model, which still has bugs in accepting the new mapping for data loading. This will be addressed in future PRs and other models will follow with the introduction of other necessary AnnDataField implementations.

@watiss
Copy link
Contributor

watiss commented Oct 26, 2021

I took a look. The high level structure LGTM

@justjhong justjhong marked this pull request as ready for review November 2, 2021 21:30
Copy link
Member

@adamgayoso adamgayoso left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initial feedback

scvi/data/anndata/_compat.py Outdated Show resolved Hide resolved
scvi/data/anndata/_fields/_base_field.py Outdated Show resolved Hide resolved
@@ -163,3 +164,24 @@ def setup_anndata(
continuous_covariate_keys=continuous_covariate_keys,
copy=copy,
)

@setup_anndata_dsp.dedent
def anndata_fields(batch_key=None, labels_key=None, layer=None):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a bit much for devs. Let's try the decorator as we discussed on slack

tests/models/test_models.py Show resolved Hide resolved
super().validate_field(adata)
x = self.get_field(adata)

if _check_nonnegative_integers(x) is False:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should be an option, default is True. probably in the init of this class?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e., not all models will require data to be count

@justjhong justjhong merged commit 9a500e1 into jhong/adatarefactorbase Nov 11, 2021
@justjhong justjhong deleted the jhong/adatarefactor branch November 11, 2021 00:05
justjhong added a commit that referenced this pull request Jan 16, 2022
* move anndata file into anndata subfolder

* mock out anndata manager and fields for categorical obs fields only

* add layerfield

* fix imports

* fix imports in models

* add temporary test and fix bugs in manager code

* manager store on model class, refactor what needs to be defined for a model

* clean up some existing code

* restructure fields file, clean up more view anndata setup code, add more skeleton

* add transfer_field for obs and layer fields

* naive implementation of transfer_setup in anndata manager

* implement backwards compat with setup dict on load

* add docstrings to all new methods

* address comments: change to registry_key, change fields to set in manager

* add tests for transfer and manager from setup

* remove anndata_fields method

* address pr comment
justjhong added a commit that referenced this pull request Feb 2, 2022
* move anndata file into anndata subfolder

* mock out anndata manager and fields for categorical obs fields only

* add layerfield

* fix imports

* fix imports in models

* add temporary test and fix bugs in manager code

* manager store on model class, refactor what needs to be defined for a model

* clean up some existing code

* restructure fields file, clean up more view anndata setup code, add more skeleton

* add transfer_field for obs and layer fields

* naive implementation of transfer_setup in anndata manager

* implement backwards compat with setup dict on load

* add docstrings to all new methods

* address comments: change to registry_key, change fields to set in manager

* add tests for transfer and manager from setup

* remove anndata_fields method

* address pr comment
justjhong added a commit that referenced this pull request Feb 5, 2022
* placeholder release notes for base

* AnnData Manager refactor (#1238)

* move anndata file into anndata subfolder

* mock out anndata manager and fields for categorical obs fields only

* add layerfield

* fix imports

* fix imports in models

* add temporary test and fix bugs in manager code

* manager store on model class, refactor what needs to be defined for a model

* clean up some existing code

* restructure fields file, clean up more view anndata setup code, add more skeleton

* add transfer_field for obs and layer fields

* naive implementation of transfer_setup in anndata manager

* implement backwards compat with setup dict on load

* add docstrings to all new methods

* address comments: change to registry_key, change fields to set in manager

* add tests for transfer and manager from setup

* remove anndata_fields method

* address pr comment

* Implement NonCategoricalJointObsField (#1247)

* implement continuous obsm field

* make class specific to joint obs fields

* make fields public

* address comments

* faster concat by subsetting obs

* Implement CategoricalJointObsField (#1248)

* add categorical joint obs field

* fix _make_obsm_categorical

* add line in test

* address pr comment

* Pass manager to data loader (#1280)

* pass manager to data loader

* address comment

* Move and reorganize setup dict out of AnnData into Manager (#1285)

* new base field API for setup dict

* adapt manager to new registry

* backwards compatibility implementation

* train works

* adapt get_from_registry function

* simplify needs transfer logic, working test_scvi

* change _REGISTRY_KEYS back to _CONSTANTS

* address comments, fix compat test

* codacy

* Add anndata registration classes to developer API (#1292)

* add anndata registration classes to developer API

* move manager to anndata submodule to fix docs

* Save/Load with saved setup inputs (#1294)

* skeleton for new save load, missing field logic

* save load with saved setup inputs

* add docstring for setup_inputs

* codacy

* fix documentation

* additional pr comments

* fix manager imports

* Adapt TotalVI to new setup (#1298)

* obsm field

* adapt totalVI to new setup

* fix additional totalvi test

* make field registration ordered

* fix totalvi protein priors logic

* address pr comments

* address pr comments

* Fix tests for SCVI, TOTALVI, generic AnnData utils (#1299)

* adapt totalVI to new setup

* fix additional totalvi test

* remove run_setup_anndata kwarg, reassign uuid on view copy, fix some tests

* remove run_setup_anndata comments

* fix transfer anndata test

* fix test data format test

* fix setup anndata test

* fix most pyro tests

* fix most scarches tests

* remove state dict from field

* Adapt all internal models to new setup (#1301)

* adapt LDA

* adapt linearscvi

* remove _get_var_names_from_setup_anndata

* adapt peakvi

* adapt autozi

* adapt scanvi

* fix scanvi test

* fix totalvi test

* fix dataloader tests

* fix multiple cov tests

* adapt condscvi

* adapt destvi

* adapt multivi

* fix setup compat test

* remove get_from_registry util

* fix scanvi and peakvi scarches tests

* fix backwards compat tests and default missing summary stat in models

* address comment

* Adapt all external models to new setup (#1302)

* adapt cellassign

* adapt gimvi

* adapt solo model

* adapt stereoscope

* codacy

* rename _CONSTANTS to REGISTRY_KEYS (#1306)

* Remove deprecated functions (#1308)

* remove unused utils

* remove references to removed functions in docs

* Cleanup Manager register reference with AttrDict implementation (#1307)

* implement attrdict and add it to properties of manager

* use attrdict feature model code

* address pr comments

* codacy

* view_anndata_setup new implementation (#1313)

* implement attrdict and add it to properties of manager

* use attrdict feature model code

* address pr comments

* view anndata setup implementation

* fix bugs and tweak text

* Field specific `view_anndata_setup` (#1315)

* implement attrdict and add it to properties of manager

* use attrdict feature model code

* address pr comments

* view anndata setup implementation

* fix bugs and tweak text

* field specific view setup

* rich links in docstrings

* update get_totalvi_protein_priors code w/ master

* Fix docs (#1330)

* codacy

* address first round of comments

* comments in constants file

* add view registry tests

* fix compat function

* Update _layer_field.py

* Update _obsm_field.py

Co-authored-by: Adam Gayoso <[email protected]>
nrclaudio pushed a commit to nrclaudio/scvi-tools-tune that referenced this pull request Jun 21, 2022
* placeholder release notes for base

* AnnData Manager refactor (scverse#1238)

* move anndata file into anndata subfolder

* mock out anndata manager and fields for categorical obs fields only

* add layerfield

* fix imports

* fix imports in models

* add temporary test and fix bugs in manager code

* manager store on model class, refactor what needs to be defined for a model

* clean up some existing code

* restructure fields file, clean up more view anndata setup code, add more skeleton

* add transfer_field for obs and layer fields

* naive implementation of transfer_setup in anndata manager

* implement backwards compat with setup dict on load

* add docstrings to all new methods

* address comments: change to registry_key, change fields to set in manager

* add tests for transfer and manager from setup

* remove anndata_fields method

* address pr comment

* Implement NonCategoricalJointObsField (scverse#1247)

* implement continuous obsm field

* make class specific to joint obs fields

* make fields public

* address comments

* faster concat by subsetting obs

* Implement CategoricalJointObsField (scverse#1248)

* add categorical joint obs field

* fix _make_obsm_categorical

* add line in test

* address pr comment

* Pass manager to data loader (scverse#1280)

* pass manager to data loader

* address comment

* Move and reorganize setup dict out of AnnData into Manager (scverse#1285)

* new base field API for setup dict

* adapt manager to new registry

* backwards compatibility implementation

* train works

* adapt get_from_registry function

* simplify needs transfer logic, working test_scvi

* change _REGISTRY_KEYS back to _CONSTANTS

* address comments, fix compat test

* codacy

* Add anndata registration classes to developer API (scverse#1292)

* add anndata registration classes to developer API

* move manager to anndata submodule to fix docs

* Save/Load with saved setup inputs (scverse#1294)

* skeleton for new save load, missing field logic

* save load with saved setup inputs

* add docstring for setup_inputs

* codacy

* fix documentation

* additional pr comments

* fix manager imports

* Adapt TotalVI to new setup (scverse#1298)

* obsm field

* adapt totalVI to new setup

* fix additional totalvi test

* make field registration ordered

* fix totalvi protein priors logic

* address pr comments

* address pr comments

* Fix tests for SCVI, TOTALVI, generic AnnData utils (scverse#1299)

* adapt totalVI to new setup

* fix additional totalvi test

* remove run_setup_anndata kwarg, reassign uuid on view copy, fix some tests

* remove run_setup_anndata comments

* fix transfer anndata test

* fix test data format test

* fix setup anndata test

* fix most pyro tests

* fix most scarches tests

* remove state dict from field

* Adapt all internal models to new setup (scverse#1301)

* adapt LDA

* adapt linearscvi

* remove _get_var_names_from_setup_anndata

* adapt peakvi

* adapt autozi

* adapt scanvi

* fix scanvi test

* fix totalvi test

* fix dataloader tests

* fix multiple cov tests

* adapt condscvi

* adapt destvi

* adapt multivi

* fix setup compat test

* remove get_from_registry util

* fix scanvi and peakvi scarches tests

* fix backwards compat tests and default missing summary stat in models

* address comment

* Adapt all external models to new setup (scverse#1302)

* adapt cellassign

* adapt gimvi

* adapt solo model

* adapt stereoscope

* codacy

* rename _CONSTANTS to REGISTRY_KEYS (scverse#1306)

* Remove deprecated functions (scverse#1308)

* remove unused utils

* remove references to removed functions in docs

* Cleanup Manager register reference with AttrDict implementation (scverse#1307)

* implement attrdict and add it to properties of manager

* use attrdict feature model code

* address pr comments

* codacy

* view_anndata_setup new implementation (scverse#1313)

* implement attrdict and add it to properties of manager

* use attrdict feature model code

* address pr comments

* view anndata setup implementation

* fix bugs and tweak text

* Field specific `view_anndata_setup` (scverse#1315)

* implement attrdict and add it to properties of manager

* use attrdict feature model code

* address pr comments

* view anndata setup implementation

* fix bugs and tweak text

* field specific view setup

* rich links in docstrings

* update get_totalvi_protein_priors code w/ master

* Fix docs (scverse#1330)

* codacy

* address first round of comments

* comments in constants file

* add view registry tests

* fix compat function

* Update _layer_field.py

* Update _obsm_field.py

Co-authored-by: Adam Gayoso <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants