Add gradient-boosted equivalent sources #275

Merged on Nov 16, 2021 (41 commits)

@santisoler santisoler commented Oct 27, 2021

Add a new EquivalentSourcesGB class that implements gradient-boosting
equivalent sources. It's a subclass of EquivalentSources, takes new
window_size and random_state arguments and has a modified fit method that
applies the gradient-boosting algorithm. The overlap between windows is fixed at
50%. The windows are randomly shuffled by default. Add new test functions for
the new class and its features. Add a gallery example for gradient-boosted
equivalent sources.
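
For readers who want to see the windowing idea in isolation, here is a minimal NumPy sketch of building windows with a 50% overlap and shuffling them by default. It is not the code from this PR: the `create_windows` function, its signature, and the way window centers are laid out are assumptions made for illustration only.

```python
import numpy as np


def create_windows(easting, northing, window_size, random_state=None, shuffle=True):
    """
    Return a list of index arrays, one per non-empty window.

    Hypothetical helper: neighbouring window centers are half a window
    apart, which gives a 50% overlap between adjacent windows.
    """
    spacing = window_size / 2
    east_centers = np.arange(easting.min(), easting.max() + spacing, spacing)
    north_centers = np.arange(northing.min(), northing.max() + spacing, spacing)
    windows = []
    for east in east_centers:
        for north in north_centers:
            # Select the points that fall inside the square window
            inside = (np.abs(easting - east) <= window_size / 2) & (
                np.abs(northing - north) <= window_size / 2
            )
            indices = np.flatnonzero(inside)
            if indices.size > 0:
                windows.append(indices)
    if shuffle:
        # Shuffle the order in which the windows will be visited
        rng = np.random.default_rng(random_state)
        windows = [windows[i] for i in rng.permutation(len(windows))]
    return windows
```

Passing the same `random_state` twice yields the same window order, which is what makes a shuffled fit reproducible.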

Fixes #252

Reminders:

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst and the base __init__.py file for the package.
  • Write detailed docstrings for all functions/classes/methods. It often helps to design better code if you write the docstrings first.
  • If adding new functionality, add an example to the docstring, gallery, and/or tutorials.
  • Add your full name, affiliation, and ORCID (optional) to the AUTHORS.md file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.

Rename the source file to gradient_boosted.py.
Enable the block_size argument after the last merge.
The residue array is initialized within the private function, there's no
need to do it outside of it since we won't support a warm run.
Add an option to the _create_windows method to disable shuffling.
This is intended to be used only for testing purposes, not IRL.
There's no reason this method cannot be a separate private function.
There's no need to prevent overwriting the Jacobian matrix after the
fitting process since the update of the residuals is done through a sum
because we need to compute the predictions on every observation point,
not only over those inside the window.
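
The residual update described in these commit notes can be sketched as follows. This is a simplified illustration, not the implementation in this PR: the inverse-distance `kernel`, the `fit_gradient_boosted` function, and the choice of one source directly below each observation point are all assumptions. The point it shows is that each window is fitted against the current residue inside the window, while the prediction of the fitted sources is subtracted from the residue at every observation point.

```python
import numpy as np


def kernel(observations, sources):
    """Inverse-distance kernel between observation and source points."""
    diff = observations[:, np.newaxis, :] - sources[np.newaxis, :, :]
    return 1.0 / np.sqrt(np.sum(diff**2, axis=-1))


def fit_gradient_boosted(coordinates, data, windows, depth=1.0, damping=1e-6):
    """Fit source coefficients one window at a time (illustrative only)."""
    # Assumption: one source located right below each observation point
    sources = coordinates.astype(float)  # astype returns a copy
    sources[:, 2] -= depth
    coefs = np.zeros(data.size)
    # The residue starts as the data itself (no warm start)
    residue = data.astype(float)
    errors = [np.sqrt(np.mean(residue**2))]
    for window in windows:
        # Damped least squares using only the points inside the window
        jacobian = kernel(coordinates[window], sources[window])
        hessian = jacobian.T @ jacobian + damping * np.eye(len(window))
        coefs_chunk = np.linalg.solve(hessian, jacobian.T @ residue[window])
        # Update the residue on ALL observation points, not only the window
        residue -= kernel(coordinates, sources[window]) @ coefs_chunk
        coefs[window] += coefs_chunk
        errors.append(np.sqrt(np.mean(residue**2)))
    return coefs, sources, np.array(errors)
```

Because the model is linear, the final residue equals the data minus the prediction of all accumulated coefficients, so the last entry of `errors` is the RMS misfit of the full model.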
@santisoler santisoler added the enhancement Idea or request for a new feature label Nov 2, 2021
@santisoler santisoler changed the title WIP Add gradient-boosted equivalent sources Add gradient-boosted equivalent sources Nov 8, 2021
@leouieda leouieda self-requested a review November 9, 2021 16:20
Ditch the assert_mse function from tests/utils.py.
Replace the large region test function with one that compares the
predictions against the data and a denser grid, but on the same small
region that the other tests use.
The comparison is carried out with npt.assert_allclose, setting
atol instead of rtol.
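
A small sketch of why an absolute tolerance can be the better fit for this kind of comparison: a purely relative tolerance breaks down wherever the reference values are at or near zero. The arrays and the 1% of maximum amplitude threshold below are made up for illustration and are not taken from the test suite.

```python
import numpy as np
import numpy.testing as npt

# Hypothetical data and predictions; note the zero and near-zero entries
data = np.array([0.0, 1.0e-3, 2.5, -4.0])
predictions = data + np.array([1e-3, -1e-3, 2e-3, 5e-4])

# Assumed threshold: 1% of the maximum absolute value of the data
atol = 0.01 * np.abs(data).max()

# Passes: every difference is well below the absolute tolerance
npt.assert_allclose(predictions, data, rtol=0, atol=atol)
```

With `rtol=1e-2` and `atol=0` the same comparison fails, because the allowed difference at the zero-valued entry is exactly zero.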
self.coefs_[point_window] += coeffs_chunk
self.errors_ = np.array(errors)

def _create_windows(self, coordinates, shuffle_windows=True):
Member Author

I've added the shuffle_windows argument to this function in order to run a test that depends on not shuffling (the one that checks the correspondence between source windows and data windows). I'm not planning to make it public since it's a bad idea not to shuffle.

Member

Makes perfect sense and it says so in the docstring which is perfect. How about just shuffle since the function is already _create_windows?

Member Author

I used shuffle at first, but the linter warned me that shuffle was taken by the import of the sklearn.utils.shuffle function. I agree with renaming the argument, so I would need to import the function with an alias.

Member

Better than that would be to import the module instead of using from … import. That’s always preferred to importing single functions since it keeps the namespaces separate.

Member

@leouieda leouieda left a comment

This is great! Can't wait to use it 🙂 Left a few suggestions but feel free to reject them if you disagree. This is good to merge 👍

Member Author

@santisoler santisoler left a comment

Replace "magnetic" with "gravity" in the example

@santisoler santisoler merged commit f238fc3 into master Nov 16, 2021
@santisoler santisoler deleted the gradient-boosted-eqs branch November 16, 2021 17:20
Linked issue: Implement gradient-boosted equivalent sources