Unexpected .load() behaviour when nodata is not set #162

Open
Alex-Mackie opened this issue Jul 1, 2024 · 5 comments

Comments

@Alex-Mackie

We have some STAC assets that don't have nodata values (every coordinate contains a valid value) and therefore the nodata property of the raster extension is not set.

The problem is that when users load this data with .load() they'll get a partial Dataset without warning.

Compare:

odc_stac.load(search.items(), chunks={'x':8000, 'y':8000}, resolution=0.1).FC.plot()

image

with

odc_stac.load(search.items(), chunks={'x':8000, 'y':8000}, nodata=999, resolution=0.1).FC.plot()

image

We'd like to protect our users from this. It isn't obvious to every user that combining or merging arrays in xarray often requires a safe nodata value, so they would typically be mystified by the result.

Should we update our catalogue with a nodata value that would be "safe" to use in these circumstances, even though it never actually occurs in the data? Or should odc_stac.load() raise an error or warning when merging/combining is implied but no nodata value is supplied, either via the STAC metadata or as an argument to .load()?
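One way to picture the "warn" option raised above is a pre-flight check run before calling odc_stac.load(). The sketch below is not part of odc-stac: the helper names assets_missing_nodata and check_before_load are hypothetical, the items are assumed to be plain STAC item dictionaries (for example from pystac's Item.to_dict()), and nodata is assumed to be published under the raster extension's raster:bands field.

```python
import warnings


def assets_missing_nodata(items, bands=None):
    """Return (item_id, asset_key) pairs whose raster metadata declares no nodata.

    `items` are plain STAC item dicts; `bands` optionally restricts the asset keys checked.
    """
    missing = []
    for item in items:
        for key, asset in item.get("assets", {}).items():
            if bands is not None and key not in bands:
                continue
            raster_bands = asset.get("raster:bands", [])
            # An asset counts as "declared" if any of its bands carries a nodata value.
            if not any(band.get("nodata") is not None for band in raster_bands):
                missing.append((item.get("id"), key))
    return missing


def check_before_load(items, nodata=None, bands=None):
    """Warn when several items will be combined and no nodata value is available."""
    if nodata is not None:
        # The caller supplied a fill value explicitly, so nothing to warn about.
        return []
    missing = assets_missing_nodata(items, bands)
    if missing and len(items) > 1:
        warnings.warn(
            f"{len(missing)} asset(s) declare no nodata and none was supplied; "
            "combining items may give an incomplete result."
        )
    return missing


# Hypothetical items: "a" has no nodata, "b" declares 255.
example_items = [
    {"id": "a", "assets": {"FC": {"raster:bands": [{"data_type": "uint8"}]}}},
    {"id": "b", "assets": {"FC": {"raster:bands": [{"data_type": "uint8", "nodata": 255}]}}},
]
```

Calling check_before_load(example_items) before the load would flag item "a", while passing nodata=999 (as in the second call above) silences the check.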

@Kirill888
Member

@Alex-Mackie what is the data type of your pixels? If it's float32, this could be an error in odc-stac, as it should default to using nan as a fill value and not 0. If it's not float32 but something like int16, this still looks like an error in odc-stac to me: it should treat the right side of the image as "empty" and paste the other dataset there without an issue, but it clearly treats that part as "filled".

nodata has two meanings really:

  1. a replacement for NaN when using non-float data types, a cheaper version of a separate mask image
  2. fill value, default value for pixels outside of the data coverage
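The two roles in the list above can be illustrated with plain NumPy. This is only a sketch: the sentinel value 255 and the array contents are made up for illustration, and it does not reproduce how odc-stac handles nodata internally.

```python
import numpy as np

NODATA = 255  # assumed sentinel, chosen outside the valid range of the data

# Role 1: nodata as a stand-in for NaN in an integer raster.
pixels = np.array([[10, 20, NODATA],
                   [30, NODATA, 40]], dtype=np.uint8)
valid = pixels != NODATA           # boolean mask derived from the sentinel
mean_valid = pixels[valid].mean()  # statistics computed over valid pixels only

# Role 2: nodata as the fill value for pixels outside the data coverage.
canvas = np.full((2, 5), NODATA, dtype=np.uint8)  # canvas wider than the tile
canvas[:, :3] = pixels                            # paste the tile on the left
outside_coverage = canvas[:, 3:]                  # still holds the fill value
```

With a single sentinel serving both roles, anything equal to NODATA on the canvas can be read as "not yet filled", whether it came from a gap in the source or from outside its footprint.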

To configure nodata in your STAC catalog, refer to:

https://odc-stac.readthedocs.io/en/latest/stac-best-practice.html

To summarize: what you are reporting is probably an error. You can work around the error by explicitly forcing a specific "fill value" outside of the valid range of your data. If your data is floating point, try forcing nodata=float("nan") and see if that fixes it.
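As a rough sketch of the workaround described above, a fill value can be chosen from the data type and the known valid range of the data. The helper pick_fill_value is hypothetical and not an odc-stac function; valid_min and valid_max are assumed to be known by whoever publishes the data.

```python
import numpy as np


def pick_fill_value(dtype, valid_min, valid_max):
    """Pick a fill value outside [valid_min, valid_max] for the given dtype.

    Floating point data gets NaN; integer data gets the dtype's maximum or
    minimum, whichever lies outside the valid range.
    """
    dtype = np.dtype(dtype)
    if np.issubdtype(dtype, np.floating):
        return float("nan")
    info = np.iinfo(dtype)
    if valid_max < info.max:
        return int(info.max)
    if valid_min > info.min:
        return int(info.min)
    # The data uses the whole range of the dtype, so no sentinel is free.
    raise ValueError(f"no spare value in {dtype} outside [{valid_min}, {valid_max}]")
```

The returned value could then be passed as the nodata argument to odc_stac.load(), in the same way as the second call in the original report. The ValueError branch corresponds to data that occupies the full range of its type, where a separate mask would be needed instead.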

@Alex-Mackie
Author

Thanks Kirill, we get this behaviour on uint8 & uint16 assets.

I'm beginning to think it is prudent to supply a nodata value anyway in these situations (as a general STAC-publishing practice), so that it acts as a prompt for what to use outside the data coverage. The only time this would be tricky is if the data uses the whole range of its dtype, which is not typical in our use cases.

@Kirill888
Member

Even if your source data has no holes in it, it is still useful to allocate a value to mean nan if possible. It's harder with single-byte rasters. That case is usually handled with mask arrays stored next to the pixel values. Unfortunately, odc-stac currently doesn't properly support data masked in that way.

The most tested case is probably int16 with a nodata value set.
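For readers unfamiliar with the mask-array approach mentioned here, the sketch below uses NumPy's masked arrays to show the idea for a uint8 raster that uses its full value range. The pixel and mask values are invented for illustration, and nothing here reflects an odc-stac API.

```python
import numpy as np

# A uint8 raster using the full 0..255 range, so no value is free to act as nodata.
pixels = np.array([[0, 128, 255],
                   [255, 0, 64]], dtype=np.uint8)

# Validity mask stored alongside the pixels (True means the pixel is valid).
valid = np.array([[True, True, True],
                  [False, True, True]])

# numpy.ma expects the inverse convention: True marks a masked (invalid) pixel.
masked = np.ma.masked_array(pixels, mask=~valid)

valid_count = masked.count()  # number of unmasked pixels
valid_mean = masked.mean()    # mean over unmasked pixels only

# Converting to float lets NaN take over the role of the mask.
as_float = masked.astype("float32").filled(np.nan)
```

The conversion to float32 at the end trades memory for convenience: once the data is floating point, NaN can serve as both the missing-value marker and the fill value.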

@Kirill888
Member

@Alex-Mackie do you mind testing your failing case with the latest versions of odc-stac and odc-geo? There was a bit of rework around that area.

@Alex-Mackie
Author

I was using

odc-geo==0.4.6
odc-stac==0.3.9

I can confirm I see the same behaviour after updating to:

odc-geo==0.4.7
odc-stac==0.3.10

Alex-Mackie changed the title from "How to handle/publish data with no nodata such that .load() will work reliably" to "Unexpected .load() behaviour when nodata is not set" on Aug 13, 2024