Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add GeoTransform as implemented by GDAL #19

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

dblodgett-usgs
Copy link

For consideration. It's rough, but I think this is more or less what we are after in #17 ??

I think the logic to just adopt what's already in GDAL makes a lot of sense.

This isn't incompatible with CF in that data that uses this approach just wouldn't have georeference information in a client that expects normal coordinate variables.

@dblodgett-usgs
Copy link
Author

@christophenoel -- I've altered my terminology here to avoid the Auxiliary term per discussion in #17

@briannapagan
Copy link

@dblodgett-usgs can you identify 1-2 reviewers you feel that would have substantive inputs? Is this something we can already prototype interoptibility?

@dblodgett-usgs
Copy link
Author

dblodgett-usgs commented Apr 24, 2023

I don't have permissions to add reviewers formally, but perhaps @christophenoel, @rouault, @edzer (others in the R spatial community?), @snowman2 (others in the python spatial community)?

We could certainly prototype it -- it's actually more or less in use in the wild since it's adopting what GDAL already does.

@briannapagan
Copy link

We could certainly prototype it -- it's actually more or less in use in the wild since it's adopting what GDAL already does.

Could you attach an example zarr file then?

@dblodgett-usgs
Copy link
Author

@briannapagan -- I'm not really in a position to be able to create a zarr mock up. Maybe someone a little closer to the implementation could do it? The NetCDF CDL representation would look like:

float data(latitude, longitude) ;
    data:grid_mapping = "crs: latitude, longitude" ;
    ...
  int crs ;
    crs:geotransformation_matrix = Xoff, X0, X1, Yoff, Y0, Y1 ;
    crs:grid_mapping_name = "latitude_longitude";
    crs:longitude_of_prime_meridian = 0.0 ;
    crs:semi_major_axis = 6378137.0 ;
    crs:inverse_flattening = 298.257223563 ;
    crs:crs_wkt =
     GEODCRS["WGS 84",
     DATUM["World Geodetic System 1984",
       ELLIPSOID["WGS 84",6378137,298.257223563,
         LENGTHUNIT["metre",1.0]]],
     PRIMEM["Greenwich",0],
     CS[ellipsoidal,3],
       AXIS["(lat)",north,ANGLEUNIT["degree",0.0174532925199433]],
       AXIS["(lon)",east,ANGLEUNIT["degree",0.0174532925199433]],
       AXIS["ellipsoidal height (h)",up,LENGTHUNIT["metre",1.0]]]

@dblodgett-usgs
Copy link
Author

dblodgett-usgs commented Apr 26, 2023

@snowman2 -- so what I take away from your references is that this is basically already supported in more than just GDAL? What do you think about space separated vs an actual vector of doubles?

e.g. https://corteva.github.io/rioxarray/stable/rioxarray.html#rioxarray.rioxarray.XRasterBase.write_transform

Based on this, I think it is sensible to just move ahead with this proposal. The only catch is that gdal uses space seperated strings and it may be good to support both strings and explicit vector attributes?

@snowman2
Copy link

What do you think about space separated vs an actual vector of doubles?

The approach rioxarray has taken is to maximize compatibility. So, the format that works with GDAL is the version used by rioxarray. It would be nice if the storage format here is compatible with both netCDF and zarr in GDAL.

geozarr-spec.md Outdated Show resolved Hide resolved
@dblodgett-usgs
Copy link
Author

Per updates and discussion on the community call today, we generally have agreement on the current PR. Does anyone tagged here have interest in review / expressing support?

The text now describes that the origin is intended to be a cell corner and shows the equation to resolve cell centers.

We have settled on space delimited ascii for compatibility with existing implementations.

We are using a GeoTransform attribute on the grid_mapping variable, largely following CF and compatible with GDAL and rioxarray.

geozarr-spec.md Outdated Show resolved Hide resolved
@briannapagan
Copy link

I would be willing to review if we can get an example file to test. @kapadia can you cross check what you posted here:
#10 versus this PR?

@christophenoel
Copy link

It seems to me that this PR aligns with the desire to support multiple coordinate encodings (not only coordinate arrays), which I fully support.

Note that for indicating the projection of a n-D data variable, the current convention is to define in the data array a property called grid_mapping, indicating the name of the auxiliary variable (a Zarr array whcih is not data, neither coordinate) that contains the attributes describing this grid mapping. The auxiliary variables that contains the "grid_mapping" is an empty array and contains the crs attributes (as shown in above example)

Therefore, we could also use as proposed this grid_mapping property for describing origin_offset coordinates (using the GeoTransform).

However, I think that for consistency, each dimension of a variable should map to a coordinate variable. For origin_offset instead of containing a 1D array of all coordinates values, the Zarr Array, would contain either:

  • an empty array, and the grid_mapping attribute referencing this auxiliary variable.
  • even much simpler, provide the attribute type=origin_offset and a1D array that contains the tuple X_offset X0, X1 (or Y_offset, Y0, Y1)

To maximise/facilitate compatibility with map tools / generic client, I would suggest to preserve the grid_mapping when provided, but to impose the origin_offset tuple for each coordinate variable.

@ashiklom
Copy link

Here's a rough draft at a suggestion (assuming an extension of the Zarr v3 spec). The specific granule I am using here is HLS.S30.T11XNE.2024176T201849.v2.0.B01.tiff. Full output of gdalinfo on that file is in the "Details".

Something to consider.

import rasterio

rast = rasterio.open("sample-data/HLS.S30.T11XNE.2024176T201849.v2.0.B01.tiff")

# Creating a `zarr.json` dict "by hand".
zarr_meta = {
    "zarr_format": 3,
    "shape": rast.shape,  # (3660, 3660),
    # Other stuff...
    "chunk_grid": {
        "name": "raster",  # NOTE: custom grid type
        "configuration": {
            "crs": {
                # Either an EPSG code
                "epsg": rast.crs.to_epsg(),  # 32611
                # ...or a WKT string
                "wkt": rast.crs.to_wkt()    # 'PROJCS["WGS 84 / UTM zone 11N",GEOGCS[<etc...>]]'
            },
            "transform": list(rast.transform),   # [30.0, 0.0, 499980.0, 0.0, -30.0, 8500020.0, 0.0, 0.0, 1.0],

            # Is each pixel a point or an area? See https://docs.ogc.org/is/19-008r4/19-008r4.html#RasterSpace
            # See also https://docs.ogc.org/is/19-008r4/19-008r4.html#_cookbook_for_defining_transformations
            "area_or_point": rast.tags()["AREA_OR_POINT"]  # 'Area'
        }
    }
}

Some additional notes:

  1. A tricky thing about the affine transform is that if there is any rotation (which is very common!), we cannot use lat and lon (or whatever projected coordinates) as dimensions anymore (or, at least, x and y no longer map cleanly onto lon and lat, respectively). In other words: Let transform = [a, b, c, d, e, f, g, h]. In that case, lon = ax + by + c and lat = ex + fy + g (or something like that). For nonzero values of b and f, we now have to treat lat and lon as data, not dimensions, because they depend on both x and y. That means we have to think differently about how Xarray's dat.sel(lat = slice(...)) for this kind of dataset would work --- we can't just define lat and lon as dimensions and then proceed as usual.

    • That said, the whole point of an affine transform is that a .sel(lat = ...) operation doesn't have to do a lookup; it just has to solve the system of equations above for x and y. When both indices are specified (.sel(lon = n, lat = m, method = "nearest")), this has a unique solution --- you get one value back (can be solved with matrix algebra). When just one index is specified (.sel(lon = n, method = "nearest")), the solution is a list of x and y values.
    • Another school of thought is that .sel is fundamentally the wrong abstraction for this and, rather than trying to get GeoZarr working with vanilla xarray, we should get GeoZarr working with rioxarray and just use rioxarray's rio GDAL-based methods for these kinds of spatial subsets (maybe, with some extra syntactic sugar).
  2. The GeoTIFF area_or_point thing is roughly analogous to CF's idea of points vs. cells (https://cfconventions.org/cf-conventions/cf-conventions.html#_data_representative_of_cells). When area_or_point = "Point", that is analogous to the CF default --- the data are exactly at the coordinates specified. When area_or_point = "Area", that is analogous to a CF "bounds" situation --- the data fall between coordinates, in the middle of the cell. The relevant GeoTIFF convention is described here: https://docs.ogc.org/is/19-008r4/19-008r4.html#RasterSpace. The relevant GDAL documentation for GeoTIFF (where the AREA_OR_POINT variable is defined) is described here: https://gdal.org/drivers/raster/gtiff.html#metadata

# gdalinfo sample-data/HLS.S30.T11XNE.2024176T201849.v2.0.B01.tiff

Driver: GTiff/GeoTIFF
Files: sample-data/HLS.S30.T11XNE.2024176T201849.v2.0.B01.tiff
Size is 3660, 3660
Coordinate System is:
PROJCRS["WGS 84 / UTM zone 11N",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4326]],
    CONVERSION["UTM zone 11N",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",0,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",-117,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",0.9996,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",500000,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",0,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Navigation and medium accuracy spatial referencing."],
        AREA["Between 120°W and 114°W, northern hemisphere between equator and 84°N, onshore and offshore. 
Canada - Alberta; British Columbia (BC); Northwest Territories (NWT); Nunavut. Mexico. United States (USA).
"],
        BBOX[0,-120,84,-114]],
    ID["EPSG",32611]]
Data axis to CRS axis mapping: 1,2
Origin = (499980.000000000000000,8500020.000000000000000)
Pixel Size = (30.000000000000000,-30.000000000000000)
Metadata:
  ACCODE=LaSRC
  add_offset=0.0
  AREA_OR_POINT=Area
  arop_ave_xshift(meters)=0
  arop_ave_yshift(meters)=0
  arop_ncp=0
  arop_rmse(meters)=0
  arop_s2_refimg=NONE
  cloud_coverage=8
  DATASTRIP_ID=S2B_OPER_MSI_L1C_DS_2BPS_20240624T223120_S20240624T201851_N05.10
  HLS_PROCESSING_TIME=2024-06-26T13:18:32Z
  HORIZONTAL_CS_CODE=EPSG:32611
  HORIZONTAL_CS_NAME=WGS84 / UTM zone 11N
  L1C_IMAGE_QUALITY=NONE
  L1_PROCESSING_TIME=2024-06-24T23:19:08.431601Z
  long_name=Coastal_Aerosol
  MEAN_SUN_AZIMUTH_ANGLE=191.826619464522
  MEAN_SUN_ZENITH_ANGLE=52.9361257682694
  MEAN_VIEW_AZIMUTH_ANGLE=125.768796718571
  MEAN_VIEW_ZENITH_ANGLE=6.74140452066478
  MSI band 01 bandpass adjustment slope and offset=0.995900, -0.000200
  MSI band 02 bandpass adjustment slope and offset=0.977800, -0.004000
  MSI band 03 bandpass adjustment slope and offset=1.007500, -0.000800
  MSI band 04 bandpass adjustment slope and offset=0.976100, 0.001000
  MSI band 11 bandpass adjustment slope and offset=1.000000, -0.000300
  MSI band 12 bandpass adjustment slope and offset=0.986700, 0.000400
  MSI band 8a bandpass adjustment slope and offset=0.996600, 0.000000
  NBAR_SOLAR_ZENITH=53.0242290542587
  NCOLS=3660
  NROWS=3660
  OVR_RESAMPLING_ALG=NEAREST
  PROCESSING_BASELINE=05.10
  PRODUCT_URI=S2B_MSIL1C_20240624T201849_N0510_R071_T11XNE_20240624T223120.SAFE
  scale_factor=0.0001
  SENSING_TIME=2024-06-24T20:23:29.449379Z
  SPACECRAFT_NAME=Sentinel-2B
  spatial_coverage=98
  SPATIAL_RESOLUTION=30
  TILE_ID=S2B_OPER_MSI_L1C_TL_2BPS_20240624T223120_A038135_T11XNE_N05.10
  ULX=499980
  ULY=8500020
  _FillValue=-9999
Image Structure Metadata:
  COMPRESSION=DEFLATE
  INTERLEAVE=BAND
  PREDICTOR=2
Corner Coordinates:
Upper Left  (  499980.000, 8500020.000) (117d 0' 2.78"W, 76d34'51.70"N)
Lower Left  (  499980.000, 8390220.000) (117d 0' 2.59"W, 75d35'49.26"N)
Upper Right (  609780.000, 8500020.000) (112d46'11.37"W, 76d32'44.49"N)
Lower Right (  609780.000, 8390220.000) (113d 3' 8.26"W, 75d33'51.04"N)
Center      (  554880.000, 8445120.000) (114d57'21.20"W, 76d 4'49.86"N)
Band 1 Block=256x256 Type=Int16, ColorInterp=Gray
  Description = Coastal_Aerosol
  NoData Value=-9999
  Overviews: 1830x1830, 915x915, 458x458, 229x229
  Offset: 0,   Scale:0.0001

@mdsumner
Copy link

mdsumner commented Jun 26, 2024

Where do you get that rotation in affine terms is common? It's extremely rare, and it's really better to work with an extent in most cases which is simpler to use and think about. No one uses a transform for specifying the output grid for regridding for example, though that's probably how the format will store it (an affine transform is great of course and a massive step forward for xarray and Zarr for when rectilinear is degenerate (which is extremely common) or curvilinear is degenerate (less common but out there and even more cryptic)).

@christophenoel
Copy link

zarr_meta = {
    "zarr_format": 3,

Hello,

This looks like an interesting idea. However, I am not yet familiar with the attributes in Zarr V3. Based on V2, in a dataset (as outlined in the current draft of the specification), either the Zarr group of the dataset or a child (empty) Zarr array (such as grid_mapping) would hold such metadata in the .zattrs file (zarr.json in V3).

@ashiklom
Copy link

Where do you get that rotation in affine terms is common?

@mdsumner I may be wrong! I thought it was fairly common for things like airborne and (lower-level) satellite data, where the rotation is oriented in the direction of the flight path / orbit and you can save some space by not having big corners of NaN in every scene. That said, my example above is harmonized Landsat-Sentinel (HLS), a classic example of this kind of data, and it's not rotated...so maybe this is rarer than I thought. If most data really are strictly north-south oriented, then great! That makes the problem much simpler. And maybe it's somewhat out of scope for GeoZarr to have really good support for rotated grids.

However, I am not yet familiar with the attributes in Zarr V3. Based on V2, in a dataset (as outlined in the current draft of the specification), either the Zarr group of the dataset or a child (empty) Zarr array (such as grid_mapping) would hold such metadata in the .zattrs file (zarr.json in V3).

@christophenoel Agreed that exactly where these metadata should live is a bit of an open question. I don't have strong opinions about it. I do think that we need to be proactive in supporting Zarr v3, though, so that GeoZarr doesn't immediately become outdated. If it's not too much work to also support v2, of course we should do that, too.

@mdsumner
Copy link

mdsumner commented Jun 28, 2024

I think there are some satellite streams that have this rotation (you need to have it as capability!) but in 20 years I've only encountered it a couple of times. FWIW it's not about north-south orientation, consider a map of Antarctica in Polar Stereographic. It's just a basic offset and scale situation in units-of-the-projection (same as any other regular grid, in Mercator, LAEA, or longlat etc).

@christophenoel
Copy link

@christophenoel Agreed that exactly where these metadata should live is a bit of an open question. I don't have strong opinions about it. I do think that we need to be proactive in supporting Zarr v3, though, so that GeoZarr doesn't immediately become outdated. If it's not too much work to also support v2, of course we should do that, too.

This seems indeed totally feasible. I hope to study if supporting both would be straightforward in the next weeks.

@mdsumner
Copy link

mdsumner commented Jun 28, 2024

It belongs in xarray (and hopefully the cross lang library that becomes ...)

They're working on it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants