Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add importance_sample method to NestedSamples and MCMCSamples #122

Merged
merged 12 commits into from
Aug 20, 2020

Conversation

lukashergt
Copy link
Collaborator

@lukashergt lukashergt commented Aug 20, 2020

As suggested in #120, this PR implements a method for importance re-weighting the nested samples, which allows e.g. discarding/penalising samples as done in #120, but also allows adding to logL, e.g. with data from a different experiment.

Some things to keep in mind:

  • We should note that this is a distinct advantage of doing it this way, rather than manually computing log volumes, since we get an error bar.

  • In general, reweighting doesn't solve everything, as if there is a big difference between the importance sampled likelihood and the actual one you sampled over in the first place then you won't have enough sampling coverage to get decent posteriors and evidences, but it would definitely be worth doing in a similar context to the planck parameters table, where the reweighting is more minor.

Checklist:

  • I have performed a self-review of my own code
  • My code is PEP8 compliant (flake8 anesthetic tests)
  • My code contains compliant docstrings (pydocstyle --convention=numpy anesthetic)
  • New and existing unit tests pass locally with my changes (python -m pytest)
  • I have added tests that prove my fix is effective or that my feature works

anesthetic/samples.py Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Aug 20, 2020

Codecov Report

Merging #122 into master will increase coverage by 0.07%.
The diff coverage is 100.00%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #122      +/-   ##
==========================================
+ Coverage   92.72%   92.80%   +0.07%     
==========================================
  Files          16       16              
  Lines        1458     1473      +15     
==========================================
+ Hits         1352     1367      +15     
  Misses        106      106              
Impacted Files Coverage Δ
anesthetic/samples.py 100.00% <100.00%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9b81fb9...955776a. Read the comment docs.

anesthetic/samples.py Outdated Show resolved Hide resolved
anesthetic/samples.py Outdated Show resolved Hide resolved
anesthetic/samples.py Show resolved Hide resolved
@williamjameshandley
Copy link
Collaborator

postscript -- mine and Andrew's comments overlap (apologies, I was mid-review and did not see andrew).

anesthetic/samples.py Show resolved Hide resolved
tests/test_samples.py Show resolved Hide resolved
tests/test_samples.py Outdated Show resolved Hide resolved
anesthetic/samples.py Outdated Show resolved Hide resolved
@williamjameshandley
Copy link
Collaborator

@pstoecker, given your work on gambit related importance weighting, you will likely be interested in this PR when it is merged later today, as well as the conversation in #120.

@williamjameshandley williamjameshandley changed the title add importance_reweighting method to NestedSamples add importance_sample method to NestedSamples Aug 20, 2020
Copy link
Collaborator

@williamjameshandley williamjameshandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to be a pain in requesting another change, but something I just realised when adjusting the title is that a key thing that this PR is missing is an equivalent method for MCMCSamples, which would just reweight the logL column. NestedSamples should then super() call the reweighting portion, and then recompute the nlive with merge_nested_samples

@williamjameshandley williamjameshandley changed the title add importance_sample method to NestedSamples add importance_sample method to NestedSamples and MCMCSamples Aug 20, 2020
Copy link
Collaborator

@williamjameshandley williamjameshandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Excellent work.

One further optional addition is to add an inplace=False optional kwarg to importance_sample, in the same vein as NestedSamples.set_beta, and generally quite common in pandas for this kind of large-scale operation.. This would be implemented simply as

def importance_sample(self, logL_new, action='add', inplace=False):
....
     if inplace:
        self = samples
    else:
        return samples

@lukashergt
Copy link
Collaborator Author

One further optional addition is to add an inplace=False optional kwarg to importance_sample, in the same vein as NestedSamples.set_beta, and generally quite common in pandas for this kind of large-scale operation.. This would be implemented simply as

def importance_sample(self, logL_new, action='add', inplace=False):
....
     if inplace:
        self = samples
    else:
        return samples

The situation here is a tiny bit more complicated than for set_beta, as self might (or probably will) change shape here. I gave it a try and it didn't work as easily as anticipated and now I'm wondering whether this is really needed?

Typical situations where inplace is used is where the structure of the object stays intact and only some elements are modified, e.g. to get a perfomance gain. This is not the case, here, so I'd vote for leaving this option out.

@lukashergt
Copy link
Collaborator Author

Sorry to be a pain in requesting another change, but something I just realised when adjusting the title is that a key thing that this PR is missing is an equivalent method for MCMCSamples, which would just reweight the logL column. NestedSamples should then super() call the reweighting portion, and then recompute the nlive with merge_nested_samples

This thing made me wish for docstring inheritance (#22, #24) again... In the current version the docstrings for the MCMCSamples and the NestedSamples method are almost identical...

@lukashergt
Copy link
Collaborator Author

Many thanks to @andrewfowlie and @williamjameshandley for your helpful input!

@lukashergt lukashergt merged commit 3db7903 into handley-lab:master Aug 20, 2020
@williamjameshandley
Copy link
Collaborator

This thing made me wish for docstring inheritance (#22, #24) again... In the current version the docstrings for the MCMCSamples and the NestedSamples method are almost identical...

I agree that the current setup requires a lot of work (which still results in inconsistencies). I am wary of any method however which results in non-readable strings in source code. One of the the things I really like about python that the documentation and source are tightly linked in a text file by convention.

It would be nice if there were a way of automatically checking docstring consistency via a script (which was fed/interpreted the logical structure), but I'm not sure if such a thing exists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants