Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

don't use issubdtype to check for integer dtypes in polyval #7619

Merged
merged 9 commits into from
Mar 22, 2023

Conversation

Ravenin7
Copy link
Contributor

@Ravenin7 Ravenin7 commented Mar 13, 2023

Why

While running the "make" command for building the documentation, I had the following error :
(I am on Windows 11 and using Anaconda Prompt as my CLI)

WARNING: ources... [ 98%] user-guide/computation
>>>-------------------------------------------------------------------------
Exception in C:\Users\91942\xarray\doc\user-guide\computation.rst at block ending on line 466
Specify :okexcept: as an option in the ipython:: block to suppress this message
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[83], line 1
----> 1 xr.polyval(coord=x, coeffs=out.polyfit_coefficients)

File ~\anaconda3\envs\xarray-docs\lib\site-packages\xarray\core\computation.py:1958, in polyval(coord, coeffs, degree_dim)
   1954     raise ValueError(
   1955         f"Dimension `{degree_dim}` should be a coordinate variable with labels."
   1956     )
   1957 if not np.issubdtype(coeffs[degree_dim].dtype, int):
-> 1958     raise ValueError(
   1959         f"Dimension `{degree_dim}` should be of integer dtype. Received {coeffs[degree_dim].dtype} instead."
   1960     )
   1961 max_deg = coeffs[degree_dim].max().item()
   1962 coeffs = coeffs.reindex(
   1963     {degree_dim: np.arange(max_deg + 1)}, fill_value=0, copy=False
   1964 )

ValueError: Dimension `degree` should be of integer dtype. Received int64 instead.

In the file computation.py(present in xarray/core), the program raises an error if the function np.issubdtype() returns false. Here, it wanted coeffs[degree_dim] to be of int data type (windows default int data type is int32) , but it was of int64 datatype, and as the documentation of np.issubdtype() states here : "Similar types of different sizes are not subdtypes of each other" so the function returned false and the program raised an error.

To fix this error, I have changed the following :

What

I replaced the np.issubdtype call with obj.dtype.kind not in "i" to check if coeffs[degree_dim] is a general signed integer data type.

@keewis keewis changed the title Fixed a bug causing unneeded ValueError in xarray/core/computation.py don't use issubdtype to check for integer dtypes in polyval Mar 13, 2023
@keewis
Copy link
Collaborator

keewis commented Mar 13, 2023

this happens with any code that calls polyval on coefficients with int32 dtypes. The example from the docs:

In [19]: import xarray as xr
    ...: 
    ...: x = xr.DataArray(np.arange(10), dims=["x"], name="x")
    ...: a = xr.DataArray(3 + 4 * x, dims=["x"], coords={"x": x})
    ...: out = a.polyfit(dim="x", deg=1, full=True)
    ...: out
Out[19]: 
<xarray.Dataset>
Dimensions:               (degree: 2)
Coordinates:
  * degree                (degree) int64 1 0
Data variables:
    x_matrix_rank         int64 2
    x_singular_values     (degree) float64 1.358 0.3963
    polyfit_coefficients  (degree) float64 4.0 3.0
    polyfit_residuals     float64 4.522e-28

In [20]: xr.polyval(coord=x, coeffs=out.assign_coords(degree=out.degree.astype("int64")))
Out[20]: 
<xarray.Dataset>
Dimensions:               (x: 10)
Dimensions without coordinates: x
Data variables:
    x_matrix_rank         (x) int64 2 4 6 8 10 12 14 16 18 20
    x_singular_values     (x) float64 0.3963 1.754 3.111 ... 9.899 11.26 12.61
    polyfit_coefficients  (x) float64 3.0 7.0 11.0 15.0 ... 27.0 31.0 35.0 39.0
    polyfit_residuals     (x) float64 4.522e-28 9.044e-28 ... 4.07e-27 4.522e-27

In [21]: xr.polyval(coord=x, coeffs=out.assign_coords(degree=out.degree.astype("int32")))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 xr.polyval(coord=x, coeffs=out.assign_coords(degree=out.degree.astype("int32")))

File .../xarray/core/computation.py:1972, in polyval(coord, coeffs, degree_dim)
   1968     raise ValueError(
   1969         f"Dimension `{degree_dim}` should be a coordinate variable with labels."
   1970     )
   1971 if not np.issubdtype(coeffs[degree_dim].dtype, int):
-> 1972     raise ValueError(
   1973         f"Dimension `{degree_dim}` should be of integer dtype. Received {coeffs[degree_dim].dtype} instead."
   1974     )
   1975 max_deg = coeffs[degree_dim].max().item()
   1976 coeffs = coeffs.reindex(
   1977     {degree_dim: np.arange(max_deg + 1)}, fill_value=0, copy=False
   1978 )

ValueError: Dimension `degree` should be of integer dtype. Received int32 instead.

@keewis
Copy link
Collaborator

keewis commented Mar 13, 2023

this looks good to me, but we'd need some tests to make sure we don't reintroduce this issue. I'd probably extend the "simple" case of xarray/tests/test_computation.py::test_polyval by creating two versions: one with int32 values for degree and one with int64 values

Copy link
Collaborator

@headtr1ck headtr1ck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you could add a test for this, that would be great :)

xarray/core/computation.py Outdated Show resolved Hide resolved
@TomNicholas
Copy link
Member

@Ravenin7 do you need some guidance on how to add a test for this? Happy to help if so. It would be great to get this fix merged soon because a number of other people have encountered the same bug!

@headtr1ck headtr1ck mentioned this pull request Mar 16, 2023
@headtr1ck headtr1ck added the plan to merge Final call for comments label Mar 22, 2023
@headtr1ck headtr1ck enabled auto-merge (squash) March 22, 2023 19:22
@headtr1ck headtr1ck merged commit 1e361cc into pydata:main Mar 22, 2023
dcherian added a commit to dcherian/xarray that referenced this pull request Mar 26, 2023
* main: (40 commits)
  Faq pull request (According to pull request pydata#7604 & issue pydata#1285 (pydata#7638)
  add timeouts for tests (pydata#7657)
  Pull Request Labeler - Undo workaround sync-labels bug (pydata#7667)
  [pre-commit.ci] pre-commit autoupdate (pydata#7651)
  Allow all integer dtypes in `polyval` (pydata#7619)
  [skip-ci] dev whats-new (pydata#7660)
  Redo whats-new for 2023.03.0 (pydata#7659)
  Set copy=False when calling pd.Series (pydata#7642)
  Pin pandas < 2 (pydata#7650)
  Whats-new for release 2023.03.0 (pydata#7643)
  Bump pypa/gh-action-pypi-publish from 1.7.1 to 1.8.1 (pydata#7648)
  Use more descriptive link texts (pydata#7625)
  Fix missing 'dim' argument in _get_nan_block_lengths (pydata#7598)
  Fix `pcolormesh` with str coords (pydata#7612)
  [skip-ci] Fix groupby binary ops benchmarks (pydata#7603)
  Remove incomplete sentence in IO docs (pydata#7631)
  Allow indexing unindexed dimensions using dask arrays (pydata#5873)
  Bump pypa/gh-action-pypi-publish from 1.6.4 to 1.7.1 (pydata#7618)
  [pre-commit.ci] pre-commit autoupdate (pydata#7620)
  add a test for scatter colorbar extend (pydata#7616)
  ...
dcherian added a commit to kmsquire/xarray that referenced this pull request Mar 29, 2023
* upstream/main: (716 commits)
  Faq pull request (According to pull request pydata#7604 & issue pydata#1285 (pydata#7638)
  add timeouts for tests (pydata#7657)
  Pull Request Labeler - Undo workaround sync-labels bug (pydata#7667)
  [pre-commit.ci] pre-commit autoupdate (pydata#7651)
  Allow all integer dtypes in `polyval` (pydata#7619)
  [skip-ci] dev whats-new (pydata#7660)
  Redo whats-new for 2023.03.0 (pydata#7659)
  Set copy=False when calling pd.Series (pydata#7642)
  Pin pandas < 2 (pydata#7650)
  Whats-new for release 2023.03.0 (pydata#7643)
  Bump pypa/gh-action-pypi-publish from 1.7.1 to 1.8.1 (pydata#7648)
  Use more descriptive link texts (pydata#7625)
  Fix missing 'dim' argument in _get_nan_block_lengths (pydata#7598)
  Fix `pcolormesh` with str coords (pydata#7612)
  [skip-ci] Fix groupby binary ops benchmarks (pydata#7603)
  Remove incomplete sentence in IO docs (pydata#7631)
  Allow indexing unindexed dimensions using dask arrays (pydata#5873)
  Bump pypa/gh-action-pypi-publish from 1.6.4 to 1.7.1 (pydata#7618)
  [pre-commit.ci] pre-commit autoupdate (pydata#7620)
  add a test for scatter colorbar extend (pydata#7616)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
plan to merge Final call for comments
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants