samples.mean(), samples.std() etc. are not correctly ignoring nan values #392

Open
lukashergt opened this issue Aug 16, 2024 · 0 comments
bug Something isn't working

Comments

@lukashergt
Collaborator

Describe the bug
Simple summary statistics such as samples.mean() or samples.std() are computed incorrectly when the data contain NaN values, even with skipna=True.

I suspect this happens when the data contain NaN values but the weights do not (e.g. for derived parameters that are undefined in part of the parameter space).
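
A minimal sketch of how such a discrepancy could arise, using synthetic data and plain NumPy rather than the library's internals. The arrays x and w and both helper functions are hypothetical and only illustrate the suspicion above: if NaN entries are dropped from the numerator while their weights still enter the normalisation, the mean is pulled towards zero.

```python
import numpy as np


def diluted_mean(x, w):
    """Weighted mean where NaN terms are skipped in the numerator,
    but their weights are still counted in the denominator."""
    return np.nansum(w * x) / np.sum(w)


def masked_mean(x, w):
    """Weighted mean where NaN entries and their weights are both dropped."""
    valid = ~np.isnan(x)
    return np.sum(w[valid] * x[valid]) / np.sum(w[valid])


# Hypothetical data: the last two samples have an undefined derived parameter.
x = np.array([1.0, 2.0, 3.0, np.nan, np.nan])
w = np.array([1.0, 1.0, 2.0, 3.0, 3.0])

print(diluted_mean(x, w))  # 0.9
print(masked_mean(x, w))   # 2.25
```

In this toy case the two results differ by exactly the fraction of the total weight carried by the valid samples (4 out of 10). Whether the same mechanism explains the output reported below is an assumption that would need checking against the actual implementation.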

To Reproduce

Assuming numpy is imported as np and read_chains is imported from anesthetic:

samples = read_chains("./tests/example_data/cb").remove_burn_in(burn_in=0.1)
samples['y'] = np.where(samples.index.get_level_values(0)<=2000, samples.x0 + samples.x1, np.nan)
print(samples.y.mean())
print(samples.y.mean(skipna=True))
print(np.ma.average(a=np.ma.array(samples.y, mask=np.isnan(samples.y)), weights=samples.get_weights()))
print(samples.loc[:2000].y.mean())
assert samples.y.mean() == samples.loc[:2000].y.mean()

Output:

-0.009548434286901122
-0.009548434286901122
-0.01623565975733643
-0.01623565975733643
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[29], line 7
      5 print(np.ma.average(a=np.ma.array(samples.y, mask=np.isnan(samples.y)), weights=samples.get_weights()))
      6 print(samples.loc[:2000].y.mean())
----> 7 assert samples.y.mean() == samples.loc[:2000].y.mean()

AssertionError:

Expected behavior
Any NaN values, together with their weights, should be completely ignored in the computation, so that samples.y.mean() agrees with samples.loc[:2000].y.mean().
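
As a reference for the expected behaviour, here is a hedged sketch of NaN-aware weighted statistics in plain NumPy. The function names nan_weighted_mean and nan_weighted_std are hypothetical, not part of the library, and the standard deviation uses the simple biased (ddof=0) weighted estimator, which may differ from the normalisation the library applies.

```python
import numpy as np


def nan_weighted_mean(x, w):
    """Weighted mean over the non-NaN entries of x only."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    valid = ~np.isnan(x)
    # Drop NaN entries and their weights together before averaging.
    return np.average(x[valid], weights=w[valid])


def nan_weighted_std(x, w):
    """Biased (ddof=0) weighted standard deviation over non-NaN entries."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    valid = ~np.isnan(x)
    mean = np.average(x[valid], weights=w[valid])
    var = np.average((x[valid] - mean) ** 2, weights=w[valid])
    return np.sqrt(var)


# Hypothetical data: the last sample is NaN but still carries a large weight.
x = np.array([1.0, 2.0, 3.0, np.nan])
w = np.array([1.0, 1.0, 2.0, 5.0])

# Ignoring NaNs should give the same result as slicing them away first.
print(np.isclose(nan_weighted_mean(x, w), np.average(x[:3], weights=w[:3])))
print(nan_weighted_mean(x, w), nan_weighted_std(x, w))
```

The comparison against the explicitly sliced data mirrors the assertion in the reproduction above: a NaN-aware statistic on the full data should match the same statistic on the valid subset.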

Additional context
This will also affect quantities computed from these statistics, such as the Gelman-Rubin statistic.

@lukashergt lukashergt added the bug Something isn't working label Aug 16, 2024
@lukashergt lukashergt self-assigned this Aug 16, 2024