Add dtype argument to EquivalentSources #278

Merged
merged 8 commits into from
Nov 15, 2021

Conversation

@santisoler (Member) commented Nov 10, 2021

The dtype argument is used to cast the inputs of the fit method, to build
the locations of the point sources, and to allocate the Jacobian matrix, and
therefore determines the dtype of the predictions. Add a new utility function
for casting the inputs of the fit method to the desired dtype. Add a new test
file for the equivalent sources utility functions. Add tests for the new
feature. Simplify the test suite for equivalent sources through fixtures.
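
As a rough sketch of the idea (the helper name and signature here are assumptions for illustration, not necessarily the exact ones added in this PR), a casting utility for the fit inputs could look like:

```python
import numpy as np


def cast_fit_input(coordinates, data, weights, dtype):
    """Cast the inputs of the fit method to dtype (hypothetical sketch)."""
    # Coordinates come as a tuple of arrays (easting, northing, upward, ...)
    coordinates = tuple(np.asarray(c, dtype=dtype) for c in coordinates)
    data = np.asarray(data, dtype=dtype)
    # Weights are optional, so only cast them when they are passed
    if weights is not None:
        weights = np.asarray(weights, dtype=dtype)
    return coordinates, data, weights
```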

Reminders:

  • Run make format and make check to make sure the code follows the style guide.
  • Add tests for new features or tests that would have caught the bug that you're fixing.
  • Add new public functions/methods/classes to doc/api/index.rst and the base __init__.py file for the package.
  • Write detailed docstrings for all functions/classes/methods. It often helps to design better code if you write the docstrings first.
  • If adding new functionality, add an example to the docstring, gallery, and/or tutorials.
  • Add your full name, affiliation, and ORCID (optional) to the AUTHORS.md file (if you haven't already) in case you'd like to be listed as an author on the Zenodo archive of the next release.

Also removes a comment ignoring a pylint warning that no longer applies.
@santisoler santisoler added the enhancement Idea or request for a new feature label Nov 10, 2021
Comment on lines +448 to +449
# Check the data type of the source coefficients
# assert eqs.coefs_.dtype == np.dtype(dtype)
@santisoler (Member, Author)

@leouieda I've noticed that the coefs_ are cast to float64 even if the inputs to vdb.least_squares are all float32. Even so, the predictions are created as float32.

@leouieda (Member)

I was trying to track this down. It seems that scikit-learn uses the dtype of the Jacobian for everything. But the coefs are actually coming from scipy.linalg.solve according to this line. The docs for solve don't mention dtype, so it could be that the underlying LAPACK functions always return float64.

It would be fine to leave the coefs the way they are returned since they usually don't require much storage. It's mostly the Jacobian that's affected.
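
A quick local sanity check of what dtype scipy.linalg.solve returns for a float32 system (a small script for illustration, not code from the PR):

```python
import numpy as np
from scipy.linalg import solve

# Build a tiny float32 system analogous to Jacobian @ coefs = data
jacobian = np.eye(3, dtype=np.float32)
data = np.ones(3, dtype=np.float32)

coefs = solve(jacobian, data)
# Inspect the dtype that comes back from the solver
print(coefs.dtype)
```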

@leouieda

Any thoughts on defaulting to float32?

@santisoler

> Any thoughts on defaulting to float32?

I ran this little script to compare the accuracy of float32 results:

import verde as vd
import harmonica as hm
import matplotlib.pyplot as plt

# Synthetic survey: gravity effect of a checkerboard of point masses at depth
region = (-3e3, -1e3, 5e3, 7e3)
shape = (40, 40)
coordinates = vd.grid_coordinates(region=region, shape=shape, extra_coords=0)
points = vd.grid_coordinates(region=region, shape=(6, 6), extra_coords=-1e3)
masses = vd.datasets.CheckerBoard(amplitude=1e13, region=region).predict(points)
data = hm.point_mass_gravity(coordinates, points, masses, field="g_z")

# Fit and predict with float64 (the default)
eqs64 = hm.EquivalentSources(dtype="float64")
eqs64.fit(coordinates, data)
prediction_64 = eqs64.predict(coordinates)

# Fit and predict with float32
eqs32 = hm.EquivalentSources(dtype="float32")
eqs32.fit(coordinates, data)
prediction_32 = eqs32.predict(coordinates)

# Plot both predictions and their residuals against the data
fig, ((ax1, ax2), (ax3, ax4)) = plt.subplots(nrows=2, ncols=2, sharex=True, sharey=True)

maxabs = vd.maxabs(prediction_32, prediction_64)

pc = ax1.pcolormesh(
    *coordinates[:2], prediction_32, vmin=-maxabs, vmax=maxabs, cmap="seismic"
)
plt.colorbar(pc, ax=ax1)
ax1.set_title("Prediction 32bits")
ax1.set_aspect("equal")

pc = ax2.pcolormesh(
    *coordinates[:2], prediction_64, vmin=-maxabs, vmax=maxabs, cmap="seismic"
)
plt.colorbar(pc, ax=ax2)
ax2.set_title("Prediction 64bits")
ax2.set_aspect("equal")

diff = data - prediction_32
maxabs = vd.maxabs(diff)

pc = ax3.pcolormesh(*coordinates[:2], diff, vmin=-maxabs, vmax=maxabs, cmap="seismic")
plt.colorbar(pc, ax=ax3)
ax3.set_title("Diff 32bits")
ax3.set_aspect("equal")

diff = data - prediction_64
maxabs = vd.maxabs(diff)

pc = ax4.pcolormesh(*coordinates[:2], diff, vmin=-maxabs, vmax=maxabs, cmap="seismic")
plt.colorbar(pc, ax=ax4)
ax4.set_title("Diff 64bits")
ax4.set_aspect("equal")

plt.show()

With the following result:

[Figure: float64_vs_float32 — predictions and residuals (diff) for the float32 and float64 runs, side by side]

The errors of the float32 run are about five orders of magnitude larger than those of float64. In light of these results, I would keep float64 as the default.
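
That gap is in line with the machine precision of the two types: float32 resolves roughly 7 significant decimal digits and float64 roughly 16, which NumPy reports directly:

```python
import numpy as np

# Machine epsilon: relative spacing between adjacent representable numbers
print(np.finfo(np.float32).eps)  # ~1.19e-07
print(np.finfo(np.float64).eps)  # ~2.22e-16
```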

If a dataset is large enough that the float64 Jacobian won't fit in memory, but not so large that chunked processing is unavoidable, it would be interesting to compare float32 against the gradient-boosted equivalent sources. I suspect float32 would give better results.
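
For scale, a back-of-the-envelope estimate of the dense Jacobian's memory footprint, which is where the dtype choice matters most (the survey sizes below are made up for illustration):

```python
def jacobian_size_gb(n_data, n_sources, itemsize):
    # A dense Jacobian stores one entry per (observation, source) pair
    return n_data * n_sources * itemsize / 1e9


# Hypothetical survey: 100,000 observations and as many sources
print(jacobian_size_gb(100_000, 100_000, 8))  # float64: 80.0 GB
print(jacobian_size_gb(100_000, 100_000, 4))  # float32: 40.0 GB
```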

@leouieda

👍 sounds good to me! We can add a tutorial about this at some point. Merge away!

@santisoler santisoler merged commit 24e7ef4 into master Nov 15, 2021
@santisoler santisoler deleted the eqs-dtype branch November 15, 2021 12:57