Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support monthly and weekly data for preprocess module #173

Merged
merged 15 commits into from
Aug 18, 2023

Conversation

geek-yang
Copy link
Member

@geek-yang geek-yang commented Aug 1, 2023

Support monthly/weekly data in preprocess by providing a timescale argument to the user.

This closes #168 and closes #160 .

@review-notebook-app
Copy link

Check out this pull request on  ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

s2spy/preprocess.py Outdated Show resolved Hide resolved
s2spy/preprocess.py Outdated Show resolved Hide resolved
@geek-yang geek-yang marked this pull request as ready for review August 3, 2023 11:52
Copy link
Member Author

@geek-yang geek-yang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the unittest below, it fails only for python 3.8. It seems a version conflict between pandas and xarray causes the failure of resampling:

@pytest.mark.skipif(
sys.version_info < (3, 9),
reason="Resample fails due to version conflict of xarray and pandas.",
)
def test_get_climatology_monthly(self, raw_field):
raw_field_monthly = raw_field.resample(time="M").mean()

But since it only relates to the preparation of testing data, not about the code. I simply silent it for python 3.8. Just let me know if you have better solutions. Thanks @BSchilperoort !

Copy link
Contributor

@BSchilperoort BSchilperoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Yang, I still have to test the code, but here are already some comments :)

from typing import Tuple
from typing import Union
import numpy as np
import scipy.stats
import xarray as xr


TIMESCALE_TYPE = Literal["monthly", "weekly", "daily"]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not so sure about having this alias. Users will see this in the documentation, and when accessing the class/function using help(...):
image

Perhaps this works better if you define Timescale as an alias (see the python docs).

from typing import TypeAlias
Timescale: TypeAlias = Literal["monthly", "weekly", "daily"]

You can try that, and see if it shows up in the documentation or not.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for raising this point! It seems using TypeAlias is more formal.

However, just find that it also couldn't be spelled out in the documentation.

image

Both cases work well with help(...)
image

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no....TypeAlias is available from python 3.10. In python <= 3.9 it should be imported from typing_extensions. (https://mypy.readthedocs.io/en/stable/kinds_of_types.html#type-aliases)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html#confval-autodoc_type_aliases it seems like sphinx autodoc does resolve the aliases (unless you explicitly don't want that).

sphinx-autoapi (what we use) doesn't seem to support this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look elegant :) 👀

Showing the options (Literal['monthly', 'weekly', 'daily']) would be nice in the docs. If I understand correctly, our sphinx-autodoc does not resolve this (yet), and shinx-autoapi does?

Is it an option to switch? or is there a downside?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is too much to switch from autoapi to autodoc simply for an unresolved alias 😂 , given that they are a bit different and autoapi works quite well for now.

I have added a hint in the tutorial notebook tutorial_preprocessing.ipynb:
"User needs to inform preprocessor the temporal resoltuion of their data, by setting the timescale argument to be either daily, weekly or monthly timescale."

And we have very clear error message to the user about the supported timescales:

s2spy/s2spy/preprocess.py

Lines 131 to 138 in f157ab2

def _check_temporal_resolution(timescale: Timescale) -> Timescale:
support_temporal_resolution = ["monthly", "weekly", "daily"]
if timescale not in support_temporal_resolution:
raise ValueError(
"Given temporal resoltuion is not supported."
"Please choose from 'monthly', 'weekly', 'daily'."
)
return timescale

I think this should be sufficient.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can absolutely live with that 😆 . Thanks a lot for the work @geek-yang and the checking @BSchilperoort!

s2spy/preprocess.py Outdated Show resolved Hide resolved
s2spy/preprocess.py Outdated Show resolved Hide resolved
s2spy/preprocess.py Outdated Show resolved Hide resolved
s2spy/preprocess.py Outdated Show resolved Hide resolved
@BSchilperoort
Copy link
Contributor

But since it only relates to the preparation of testing data, not about the code. I simply silent it for python 3.8

It's not ideal, but I also wouldn't want to spend too much time on that. Perhaps we have to deprecate python 3.8 anyway: xarray isn't supporting it anymore for new releases.

@geek-yang
Copy link
Member Author

But since it only relates to the preparation of testing data, not about the code. I simply silent it for python 3.8

It's not ideal, but I also wouldn't want to spend too much time on that. Perhaps we have to deprecate python 3.8 anyway: xarray isn't supporting it anymore for new releases.

I agree. Let's drop python 3.8.

if timescale == "monthly":
temporal_resolution = temporal_resolution.astype(int)
min_days, max_days = (28, 31)
if not max_days > temporal_resolution > min_days:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Yang,

I pulled the branch to test it :)

In my test, temporal_resolution for monthly data is 31. According to the statement, max_days (31) is not larger than 31 ;)
if not max_days > temporal_resolution > min_days:

Should be:
if not max_days >= temporal_resolution >= min_days:

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh! Yes, of course! Thanks for fixing this!

Copy link
Contributor

@BSchilperoort BSchilperoort left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just the annoying lack of type alias support in sphinx autoapi :/

Otherwise everything looks good.

.github/workflows/build.yml Show resolved Hide resolved
s2spy/preprocess.py Outdated Show resolved Hide resolved
@sonarcloud
Copy link

sonarcloud bot commented Aug 14, 2023

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

91.8% 91.8% Coverage
0.0% 0.0% Duplication

@geek-yang geek-yang merged commit c060bc6 into main Aug 18, 2023
13 checks passed
@geek-yang geek-yang deleted the preprocess_monthly_weekly_data branch August 18, 2023 08:55
@semvijverberg
Copy link
Member

Thanks @geek-yang @BSchilperoort!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

groupby.dayofyear not suitable for weekly/monthly data Support non-daily data in preprocessor
3 participants