Add gradient-boosted equivalent sources #275

santisoler · 2021-10-27T20:47:40Z

Add a new EquivalentSourcesGB class that implements gradient-boosted
equivalent sources. It's a subclass of EquivalentSources that takes new
window_size and random_state arguments and has a modified fit method that
applies the gradient-boosting algorithm. The overlap between windows is fixed
at 50%, and the windows are randomly shuffled by default. Add new test
functions for the new class and its features. Add a gallery example for
gradient-boosted equivalent sources.
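
The fitting loop described above can be sketched in plain NumPy. This is only an illustration under stated assumptions, not the code added in this PR: the 1D profile, the gravity-like kernel, the damping value, the choice of one source beneath each data point, and the names make_windows, kernel and fit_gradient_boosted are all hypothetical.

```python
import numpy as np


def make_windows(x, window_size, rng=None):
    """Index arrays for windows with 50% overlap along a 1D profile."""
    step = window_size / 2  # half a window between centers gives 50% overlap
    centers = np.arange(x.min(), x.max() + step, step)
    windows = [np.flatnonzero(np.abs(x - c) <= window_size / 2) for c in centers]
    windows = [w for w in windows if w.size > 0]
    if rng is not None:
        # Visit the windows in random order
        windows = [windows[i] for i in rng.permutation(len(windows))]
    return windows


def kernel(x_obs, x_src, depth):
    """Hypothetical gravity-like kernel between observations and sources."""
    dx = x_obs[:, None] - x_src[None, :]
    return depth / (dx**2 + depth**2) ** 1.5


def fit_gradient_boosted(x, data, window_size, depth=1.0, damping=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    coefs = np.zeros_like(data)
    residue = data.copy()
    rmse = [np.sqrt(np.mean(residue**2))]
    for window in make_windows(x, window_size, rng):
        # Fit the sources in this window to the residue inside the window
        jac = kernel(x[window], x[window], depth)
        lhs = jac.T @ jac + damping * np.eye(window.size)
        coefs_chunk = np.linalg.solve(lhs, jac.T @ residue[window])
        # Update the residue on every observation point, not only in the window
        residue -= kernel(x, x[window], depth) @ coefs_chunk
        coefs[window] += coefs_chunk
        rmse.append(np.sqrt(np.mean(residue**2)))
    return coefs, np.array(rmse)


x = np.linspace(0, 100, 201)
data = np.exp(-(((x - 40) / 15) ** 2))
coefs, rmse = fit_gradient_boosted(x, data, window_size=20)
```

Each iteration only solves a small linear system for one window, which is what keeps the memory footprint low. The rmse array holds one value before fitting plus one per window and can be inspected to follow the residue through the iterations.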

Fixes #252

Reminders:

Run make format and make check to make sure the code follows the style guide.
Add tests for new features or tests that would have caught the bug that you're fixing.
Add new public functions/methods/classes to doc/api/index.rst and the base __init__.py file for the package.
Write detailed docstrings for all functions/classes/methods. It often helps to design better code if you write the docstrings first.
If adding new functionality, add an example to the docstring, gallery, and/or tutorials.
Add your full name, affiliation, and ORCID (optional) to the AUTHORS.md file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.

Ditch the assert_mse function from tests/utils.py. Replace the large-region test function with one that compares the predictions against the data and on a denser grid, but on the same small region that the other tests use. The comparison is carried out with npt.assert_allclose, setting atol instead of rtol.
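
A minimal sketch of why an absolute tolerance suits this kind of comparison: when some values are close to zero, a relative tolerance fails even if the errors are small compared to the data amplitude. The arrays and the 1% factor below are invented for illustration and are not the values used in the test suite.

```python
import numpy as np
import numpy.testing as npt

# Made-up data and predictions with small errors everywhere
data = np.array([-20.0, -0.001, 0.0, 0.002, 35.0])
predicted = data + np.array([0.05, -0.04, 0.03, 0.02, -0.05])

# Absolute tolerance scaled by the data amplitude (1% is a hypothetical choice)
atol = 0.01 * np.max(np.abs(data))
npt.assert_allclose(predicted, data, rtol=0, atol=atol)

# A purely relative check fails because of the values close to zero
try:
    npt.assert_allclose(predicted, data, rtol=0.01, atol=0)
    relative_check_passed = True
except AssertionError:
    relative_check_passed = False
```

Setting rtol=0 explicitly matters, because npt.assert_allclose otherwise adds its default relative tolerance on top of atol.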

santisoler · 2021-11-09T17:48:18Z

harmonica/equivalent_sources/gradient_boosted.py

+ self.coefs_[point_window] += coeffs_chunk
+ self.errors_ = np.array(errors)
+
+ def _create_windows(self, coordinates, shuffle_windows=True):


I've added the shuffle_windows argument to this function in order to run a test that depends on not shuffling (the one that checks the correspondence between source windows and data windows). I'm not planning to make it public since it's a bad idea not to shuffle.

leouieda · 2021-11-13T14:16:21Z

Makes perfect sense, and it says so in the docstring, which is perfect. How about just shuffle, since the function is already called _create_windows?

santisoler · 2021-11-15

I used shuffle at first, but the linter warned me that shuffle was taken by the import of the sklearn.utils.shuffle function. I agree with renaming the argument, so I would need to import the function with an alias.

leouieda · 2021-11-16

Better than that would be to import the module instead of using from … import. That's always preferred to importing single functions since it keeps the namespaces separate.
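
The namespace point can be sketched with the standard-library random module standing in for sklearn.utils (an assumption made so that the snippet runs without extra dependencies). The create_windows function is hypothetical and much simpler than the private method in this PR.

```python
import random


def create_windows(n_windows, shuffle=True):
    """Return window indices, optionally in random order."""
    windows = list(range(n_windows))
    if shuffle:
        # The module prefix keeps random.shuffle distinct from the `shuffle`
        # argument. After `from random import shuffle`, the argument would
        # shadow the function inside this body.
        random.shuffle(windows)
    return windows


ordered = create_windows(5, shuffle=False)
shuffled = create_windows(10)
```

With the module import, the argument can keep the short name and no alias is needed.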

harmonica/tests/test_gradient_boosted_eqs.py

leouieda

This is great! Can't wait to use it 🙂 Left a few suggestions but feel free to reject them if you disagree. This is good to merge 👍

examples/equivalent_sources/gradient_boosted.py

harmonica/equivalent_sources/gradient_boosted.py

doc/api/index.rst

santisoler

Replace "magnetic" with "gravity" in the example

santisoler added 11 commits October 27, 2021 12:15

Start drafting gradient-boosted equivalent sources

dba50d8

Merge branch 'master' into gradient-boosted-eqs

e4eb43a

Add EquivalentSourcesGB to __init__.py and API Index

970d779

Rename the source file to gradient_boosted.py. Enable the block_size argument after the last merge.

Replace residue argument for data in _gradient_boosting

a899adf

The residue array is initialized within the private function, there's no need to do it outside of it since we won't support a warm run.

Add option to disable window shuffling on private method

f8239b6

Add an option to the _create_windows method to disable shuffling. This is intended to be used only for testing purposes, not IRL.

Redefine the _get_region_data_sources method to a function

7b3bc24

There's no reason this method cannot be a separate private function.

Define a new assert_mse function for testing purposes

805100d

Start writing tests for GB equivalent sources

681be37

Minor changes to comments

c87c729

Don't copy Jacobian matrix

90b2325

There's no need to prevent overwriting the Jacobian matrix after the fitting process, since the residuals are updated through a sum, because we need to compute the predictions on every observation point, not only on those inside the window.

Pass block_size argument and fix small typo

0aa6f63

santisoler added the enhancement label (Idea or request for a new feature) Nov 2, 2021

santisoler added 12 commits November 3, 2021 16:35

Add example for gradient-boosted equivalent sources

6adcb3f

Use the South Africa gravity dataset.

Fix links on the docstring of the example

d61309a

Remove the errors curve plot

ed16630

Populate docstrings

9f9543d

Fix error on docstring

495fb8f

Add test function for custom sources

ad717d3

Add estimate_required_memory class method

6d9057c

Fix pylint errors on test_gradient_boosted_eqs.py

02bd76c

Fix pylint errors on gradient_boosted.py

38f9780

Fix typo on pylint disable comments

d3a5357

Improve gradient-boosted example

0f98adc

Disable pylint protected-access in the entire test file

9f324e0

santisoler changed the title ~~WIP Add gradient-boosted equivalent sources~~ Add gradient-boosted equivalent sources Nov 8, 2021

Merge branch 'master' into gradient-boosted-eqs

600b2bd

leouieda self-requested a review November 9, 2021 16:20

santisoler commented Nov 9, 2021

harmonica/tests/test_gradient_boosted_eqs.py (outdated, resolved)

santisoler added 4 commits November 10, 2021 12:26

Change atol as proportional to max data value

3a974c4

Increase atol to fix failures in Windows

191592d

Simplify tests through fixtures

a49c5e1

Remove region argument from where it wasn't being used

eaf9f60

leouieda reviewed Nov 13, 2021

santisoler and others added 9 commits November 15, 2021 09:59

Replace "simple" for "small" in example introductory text

3b2823e

Co-authored-by: Leonardo Uieda <[email protected]>

Merge branch 'master' into gradient-boosted-eqs

98b4b11

Add dtype argument to EquivalentSourcesGB

b910980

Remove unused arguments in test function

99b8972

Make estimate_required_memory a regular method

01fcbaa

By making it a regular method we avoid the need to pass the same arguments twice.

Rename shuffle_windows argument to shuffle

bfdccbf

Use an alias for the sklearn.utils.shuffle function.

Rename errors_ attr to rmse_per_iteration_

2e2e22d

Comment test lines that use atol and set rtol to 0

3f28c96

Merge branch 'master' into gradient-boosted-eqs

a68ded5

santisoler commented Nov 15, 2021

doc/api/index.rst (outdated, resolved)

santisoler added 2 commits November 15, 2021 11:56

Change order of EquivalentSourcesGB in doc/api/index.rst

04744a5

Import the entire sklearn utils module for the shuffle function

39eee1b

santisoler commented Nov 16, 2021

Replace magnetic anomaly for gravity disturbance in example

433d346

santisoler merged commit f238fc3 into master Nov 16, 2021

santisoler deleted the gradient-boosted-eqs branch November 16, 2021 17:20

santisoler mentioned this pull request Nov 30, 2021

Implement gradient-boosted equivalent sources #252

Closed
