Add function for calculating niches #831

LLehner · 2024-05-27T16:23:44Z

Description

Adds a function that calculates niches using different strategies. The initial function calculates niches based on neighborhood profiles similar to here.

This PR will get updated with methods discussed in #789.

LLehner · 2024-10-10T12:45:05Z

@giovp @timtreis PR is ready for review!

Some additional questions that came up, which you could have a look at:

How should multiple slides be dealt with? All these methods should work with multiple slides, but I'm still not sure how the adjacency matrix changes when you run sq.gr.spatial_neighbors() on data with multiple slides. Is it a block-diagonal matrix where each block on the diagonal is the adjacency matrix for a single slide? If so, does it suffice to calculate the neighborhood graph once for all data, or should it be done on individual slides? I noticed this can change niche results.
I think some parts could be sped up by parallelization (e.g. clustering with multiple resolutions). What would you recommend there?
If you run sq.calculate_niche() more than once with flavor="neighborhood", I noticed that the first call takes as long as you would expect, but subsequent runs are much (~10x) faster (e.g. you first run the method with one cluster resolution, then call it again with other cluster resolutions). It's almost as if the subsequent runs no longer do the neighborhood calculations (referring to sc.pp.neighbors here), but that shouldn't be the case, as nothing is cached and calculations happen on a new AnnData object for every function call. The "issue" also persists even if you change the count matrix shape (e.g. by masking data).
Would you include some form of logging or verbosity so that the user sees what the method is currently doing? Depending on the data and settings, the function can take a while.

timtreis · 2024-10-15T07:06:28Z

Hey @LLehner

> How should multiple slides be dealt with? All these methods should work with multiple slides, but I'm still not sure how the adjacency matrix changes when you run sq.gr.spatial_neighbors() on data with multiple slides. Is it a block-diagonal matrix where each block on the diagonal is the adjacency matrix for a single slide? If so, does it suffice to calculate the neighborhood graph once for all data, or should it be done on individual slides? I noticed this can change niche results.

Do you mean a scenario where the user is just storing multiple slides in the same sdata object, like the point8, point16, point24 dataset we have? These should be fully independent, since they have no biological connection.
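To make the block-diagonal question concrete, here is a minimal pure-Python sketch, not squidpy's implementation. It compares a k-nearest-neighbor graph built on pooled coordinates with one built slide by slide. The helper names (`knn_edges`, `per_slide_edges`, `crosses`) and the toy coordinates are assumptions made for illustration. In squidpy itself, `sq.gr.spatial_neighbors()` has a `library_key` argument intended for data with several libraries; its exact behavior should be checked against the documentation.

```python
from math import dist


def knn_edges(coords, k):
    """Return directed k-nearest-neighbor edges (i, j) by brute force."""
    edges = set()
    for i, p in enumerate(coords):
        # Sort all other points by Euclidean distance to point i.
        others = sorted(
            (j for j in range(len(coords)) if j != i),
            key=lambda j: dist(p, coords[j]),
        )
        edges.update((i, j) for j in others[:k])
    return edges


def per_slide_edges(coords, slide_ids, k):
    """Build the graph slide by slide, so no edge can cross slides."""
    edges = set()
    for slide in sorted(set(slide_ids)):
        idx = [i for i, s in enumerate(slide_ids) if s == slide]
        local = knn_edges([coords[i] for i in idx], k)
        # Map slide-local indices back to global indices.
        edges.update((idx[a], idx[b]) for a, b in local)
    return edges


def crosses(edges, slide_ids):
    """Return the edges that connect cells from different slides."""
    return {(i, j) for i, j in edges if slide_ids[i] != slide_ids[j]}


# Toy data: two slides whose coordinate systems overlap near the origin.
coords = [(0, 0), (1, 0), (2, 0), (0.1, 0.1), (1.1, 0.1), (2.1, 0.1)]
slide_ids = ["point8"] * 3 + ["point16"] * 3

pooled = knn_edges(coords, k=1)
separate = per_slide_edges(coords, slide_ids, k=1)

print(len(crosses(pooled, slide_ids)), len(crosses(separate, slide_ids)))
```

When slides share a coordinate range, the pooled graph links cells from unrelated slides, so its adjacency matrix is not block diagonal. The per-slide construction is block diagonal by design, which could explain why niche results differ between the two approaches.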

> I think some parts could be sped up by parallelization (e.g. clustering with multiple resolutions). What would you recommend there?

Do you have some rough numbers? We could give the user the option to define n_cores or something similar and then use the parallel processing we already have, but we should definitely keep the option to run single-threaded so that it can be used inside tools like snakemake that take care of job distribution across cores. Otherwise, that would cause errors the user cannot circumvent.
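The opt-in pattern described here could look roughly like the following sketch. It uses only the standard library rather than squidpy's existing parallel helpers. `cluster_at_resolution`, `run_resolutions`, and the `n_jobs` parameter are hypothetical names. A thread pool keeps the example self-contained, whereas CPU-bound clustering would more likely need a process pool.

```python
from concurrent.futures import ThreadPoolExecutor


def cluster_at_resolution(resolution):
    """Hypothetical stand-in for one clustering run at a given resolution."""
    return f"niche_res_{resolution}"


def run_resolutions(resolutions, n_jobs=1):
    """Run one task per resolution; n_jobs=1 stays strictly serial."""
    if n_jobs == 1:
        # No pool is created, so an outer scheduler keeps full control.
        return {r: cluster_at_resolution(r) for r in resolutions}
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        # map() preserves input order, so results line up with resolutions.
        results = pool.map(cluster_at_resolution, resolutions)
        return dict(zip(resolutions, results))


serial = run_resolutions([0.5, 1.0, 2.0])
parallel = run_resolutions([0.5, 1.0, 2.0], n_jobs=2)
print(serial == parallel)
```

Defaulting to `n_jobs=1` means a workflow manager that already distributes jobs across cores never gets competing worker pools unless the user asks for them.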

> If you run sq.calculate_niche() more than once with flavor="neighborhood", I noticed that the first call takes as long as you would expect, but subsequent runs are much (~10x) faster (e.g. you first run the method with one cluster resolution, then call it again with other cluster resolutions). It's almost as if the subsequent runs no longer do the neighborhood calculations (referring to sc.pp.neighbors here), but that shouldn't be the case, as nothing is cached and calculations happen on a new AnnData object for every function call. The "issue" also persists even if you change the count matrix shape (e.g. by masking data).

Could it be that the OS has some of the compiled bytecode or data in cache? 🤔
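One way to narrow this down is to time consecutive calls within a single process and compare the first call with the later ones. The sketch below is a generic timing harness, not a diagnosis of the behavior reported above. `function_with_warmup` is a hypothetical stand-in whose one-time cost is simulated with `time.sleep`.

```python
import time


def time_calls(fn, n_calls=3):
    """Return wall-clock seconds for each of n_calls consecutive calls."""
    durations = []
    for _ in range(n_calls):
        start = time.perf_counter()
        fn()
        durations.append(time.perf_counter() - start)
    return durations


_warm = {}


def function_with_warmup():
    """Hypothetical function with a one-time setup cost on first use."""
    if "ready" not in _warm:
        time.sleep(0.05)  # stands in for compilation, imports, or disk reads
        _warm["ready"] = True
    return sum(range(1000))


durations = time_calls(function_with_warmup)
print([f"{d:.4f}s" for d in durations])
```

If the slow first call reappears in every fresh interpreter session, a per-process warm-up cost is the more likely explanation. If it only appears on the first run after a reboot, OS-level caching is the more plausible one.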

> Would you include some form of logging or verbosity so that the user sees what the method is currently doing? Depending on the data and settings, the function can take a while.

What runtime are we talking about here? I tend to be a fan of some intermediate output (as long as one can also silence it, e.g. with some verbosity level).
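Intermediate output that can be silenced could be sketched with Python's standard `logging` module, as below. This is an illustration only and does not reflect the logging machinery squidpy or scanpy actually use. The logger name `niche_demo` and the function `run_steps` are hypothetical.

```python
import logging

logger = logging.getLogger("niche_demo")  # hypothetical logger name


def run_steps(steps):
    """Hypothetical stand-in that reports each step before running it."""
    for name in steps:
        # Emitted only if the logger's level is INFO or lower.
        logger.info("Running step: %s", name)
    return len(steps)


logging.basicConfig(format="%(name)s: %(message)s")

logger.setLevel(logging.INFO)  # verbose: progress messages are shown
run_steps(["neighborhood profile", "clustering"])

logger.setLevel(logging.WARNING)  # quiet: progress messages are suppressed
run_steps(["neighborhood profile", "clustering"])
```

Because the function only emits messages and never sets the level itself, the caller decides how chatty it is, which matches the wish for output that can be switched off.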

Add function for calculating niches

ca6e7ff

LLehner added the graph 🕸️ label May 27, 2024

LLehner marked this pull request as draft May 27, 2024 16:24

pre-commit-ci bot and others added 4 commits May 27, 2024 16:25

[pre-commit.ci] auto fixes from pre-commit.com hooks

e08189a

for more information, see https://pre-commit.ci

Fix pre-commit

1b492ea

Fix pre-commit

032ef90

Update __init__.py

139819a

LLehner requested a review from timtreis May 27, 2024 16:43

LLehner and others added 3 commits May 27, 2024 18:54

Merge branch 'main' into niche_definitions

e0beead

Add function

a5b810e

[pre-commit.ci] auto fixes from pre-commit.com hooks

50a8474

for more information, see https://pre-commit.ci

timtreis added the squidpy2.0 Everything releated to a Squidpy 2.0 release label May 29, 2024

LLehner and others added 12 commits June 8, 2024 23:22

Update

38c67fb

Merge branch 'niche_definitions' of https:/scverse/squidpy …

2eb450c

…into niche_definitions

adding fide score and jsd metrics

86d5efd

Add function to test for niche similarity by comparing max (99th perc…

334b7fb

…entile) counts

[pre-commit.ci] auto fixes from pre-commit.com hooks

54936f9

for more information, see https://pre-commit.ci

Fix result dataframe

2c5cac8

Merge branch 'niche_definitions' of https:/scverse/squidpy …

6d74f78

…into niche_definitions

Add scores to compare different niche calculations

b5cb056

[pre-commit.ci] auto fixes from pre-commit.com hooks

2b5ef61

for more information, see https://pre-commit.ci

Update doc string and param names

c98ec1b

Update doc string and param names

9813cf5

[pre-commit.ci] auto fixes from pre-commit.com hooks

bb3bdfb

for more information, see https://pre-commit.ci

This was referenced Jun 12, 2024

identifying multi-cellular niches #607

Closed

clustering accounting for spatial coordinates #13

Open

LLehner added 4 commits June 17, 2024 15:06

Update neighborhood profile, Remove utag import

ebdf1d5

Fix pre-commit

c6d020b

Add utag inner product step

9fa0157

Fix output; Remove subsetting, neighborhood options, dimreduction and…

49b51ca

… clustering steps

LLehner and others added 25 commits October 1, 2024 22:29

Update utag

dcdd9f6

Merge branch 'main' into niche_definitions

12d918c

Update neighborhood profile based approach

6638075

Merge branch 'niche_definitions' of https:/scverse/squidpy …

7895bc7

…into niche_definitions

[pre-commit.ci] auto fixes from pre-commit.com hooks

ae3b2f8

for more information, see https://pre-commit.ci

Update doctstring

646dbb0

Fix

939de0b

Update CellCharter approach

c9b4dc5

Remove commented-out code

b1aa25f

Remove draft validation methods

72d8dc8

Merge branch 'main' into niche_definitions

4c17024

Update init; Fix mypy

a64d0fd

Merge branch 'niche_definitions' of https:/scverse/squidpy …

d682939

…into niche_definitions

Fix mypy

5bde712

[pre-commit.ci] auto fixes from pre-commit.com hooks

79fcbd0

for more information, see https://pre-commit.ci

Fix mypy

67a5133

Fix mypy

bc38737

Fix mypy

342375c

Add comments; Remove draft evaluation function

cb4e4d1

Add tests

e0a67f5

Remove unused imports and print statements

ae3b4db

Fix tests

5a4cfc4

Fix test

3e47df8

Fix test

6cbc09e

Fix test

6185c0e

LLehner marked this pull request as ready for review October 10, 2024 12:44

LLehner requested a review from giovp October 10, 2024 12:44

Merge branch 'main' into niche_definitions

0392fc1
