Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to read Oasis Montaj© grd files #348

Merged
merged 32 commits into from
Nov 25, 2022
Merged

Conversation

santisoler
Copy link
Member

@santisoler santisoler commented Aug 29, 2022

Add a new load_oasis_montaj_grid function that reads Geosoft Oasis Montaj©
GRD files and returns an xarray.DataArray that includes the 2D grid values,
its coordinates and the information available in the file header. The function
supports reading uncompressed and compressed grids (with gzip); rotated and
unrotated grids; orders of +/- 1 (the order in which the grid values are stored
in the file); every possible data type supported in GRD files. The function
doesn't support orders besides +/- 1, colour grids nor grids. Compressed grids
are assumed to be compressed with gzip regadless that the header specifies that
it uses LZRW1. The new function lives in a newly created _io submodule, where
the load_icgem_gdf function has been moved to.

Related repos

The support for compressed grids was inspired by the code in
https:/Loop3D/geosoft_grid copyrighted by Loop3D and released under
the MIT License.

References
https://help.seequent.com/Oasis-montaj/9.9/en/Content/ss/glossary/grid_file_format__grd.htm

@mdtanker
Copy link
Member

Nice! This will be very useful 😄

@santisoler
Copy link
Member Author

@leouieda What would be a good way to test this function? We were discussing this in the last meeting and we think a good idea would be to have sample sample .grd files that we can pass to the function. I can make synthetic ones, but it would defeat the purpose.

@Esteban82
Copy link
Member

@leouieda What would be a good way to test this function? We were discussing this in the last meeting and we think a good idea would be to have sample sample .grd files that we can pass to the function. I can make synthetic ones, but it would defeat the purpose.

I know a coleague that works with Oasis Montaj (I think 6). Maybe I could ask him for an small sample grid. Which size should it be? I think that there is no problem with the version.

@santisoler
Copy link
Member Author

That would be great @Esteban82!

We would need small grids so they don't take too much space. Something around 50x50 elements would be good. Nevertheless, it would be nice if they have different number of points per direction, like: 49x51.

On top of that, it would be nice to have a grid with different data types:

  • doubles
  • floats
  • ints
  • unsigned ints

And with different features:

For the content of the grid, it can be any garbage. There's no need for it to be actual data. In fact, if it's synthetic data it's better (something like a chessboard or a gaussian well, etc).

The other requirement is that the grids should be under Creative Commons CC-BY license. Either that or that the author of the grids pushes them directly to the repo, so we guarantee that we are not falling in any form of copyright infringement.

@LL-Geo
Copy link
Member

LL-Geo commented Sep 29, 2022

I get one test dataset, haha...
image

@LL-Geo
Copy link
Member

LL-Geo commented Oct 12, 2022

A quick update the test works fine for un-compressed grd.
image

But the default output from oasis montaj looks like a compressed grid, which means we can't read most data shared in .grd format atm...
image
image

We might need to figure out how compressed grid worked...

@santisoler
Copy link
Member Author

Awesome @LL-Geo!

So, I would also want to support compressed grids. From what I read in GRD File specification sheet Oasis Montaj uses one of two compression algorithms: zlib and LZRW1. The first one is fairly easy to support using the zlib built-in module. But for LZRW1 things are trickier. I've only found python-lzrw1-kh, but it's a very slow pure Python implementation, not intended to be used on large files. Therefore I would like to support zlib compression for now. I'm curious if that's the default compression algorithm that Oasis uses.

Would you like to save these GRD files in harmonica/tests/data/ and push them? So we can play with them, write tests and see if we can extend the capabilities of this function. Check my previous comment to see what type of grids might be nice to have.

Upload test grd file
fix ValueError: cannot convert float NaN to integer with datetype (int)
@LL-Geo
Copy link
Member

LL-Geo commented Oct 19, 2022

@santisoler I upload some grd files for testing (the unsigned grd and -1 order grd are a little bit tricky, I think I need some reverse-engineering ways to do that. But it won't be a problem).

It seems the default compress method is LZRW1. I checked different types of compressed grd, they all show COMP_TYPE as 2 (512+4 : 512+8). I also tried the python-lzrw1-kh, and the result does not match the uncompressed file (I didn't dig into that package.. 😆).
image

I upload a small grid (5*5) om_small_compress.grd. I think it will help us to understand how it works. The header start at 512.
image

As for the rotation, I think we could just remove the rotation warning and store the unrotated grd. Maybe the grd rotation is more related to Verde?

image

@leouieda
Copy link
Member

@LL-Geo the rotation is probably best handled by multidimensional coordinates in xarray. For example, the main dims of the grid would be the non-rotated x and y coordinates (1D arrays) and then longitude/latitude or easting/northing can be stored as 2D arrays of x, y with the actual rotated coordinate values. xarray knows what to do for indexing and plotting in these cases. See https://docs.xarray.dev/en/stable/examples/multidimensional-coords.html

@santisoler
Copy link
Member Author

santisoler commented Oct 26, 2022

This is amazing @LL-Geo! Those hexdumps look so hacky! 😎

I was afraid that the default compression was the LZRW1,so sad. I wouldn't support that kind of compression in this PR. It seems too much of a hassle and there are some useful stuff in here already to block this PR just because of that. So let's leave compression for a future PR.

I think we need to wrap this PR before it sits too long here without being merge. As I said, there are useful stuff here already, and we can always add more later.

For this PR I would do the following:

  • Add tests for the grids we are currently supporting:
    • no-rotated
    • different data types: double, float, long, short
  • Extend support for rotated grids and add tests for them

For future PRs:

  • Add tests for grids with ordering equal to -1
  • Support compressed grids with gzlib
  • Support compressed grids with LZRW1 (there's no need, since compressed grids use gzip even if the header says LZRW1)
  • Support other orderings (different than +/- 1) (even the docs says they support orderings =! +/-1, Oasis Montaj don't open those grids)

Makes sense? Feel free to tackle any of the first items and push here!

santisoler and others added 7 commits November 2, 2022 11:14
These variables usually don't have any meaningful information and they
are usually not being converted to Unicode characters properly, what
makes difficult to save the dataarray as a netcdf file. I think it's
better to ignore it for now unless we have a good reason to include
them.
Add a netcdf file that hosts the expected grid.
This way we avoid having to install netcdf4 for testing
@santisoler
Copy link
Member Author

@LL-Geo today I wrote some tests for this function using the grids you uploaded. Thank you very much!

I think we don't have a grid with ordering = -1. Is that true? If not, could you upload just one (double or floats will be ok) so we can test if my implementation for those is correct?

I was also trying to support the rotated grid and I was facing some issues. The plot you added in your previous comment shows the grid already rotated, but I'm worried that the easting and northing coordinates are not accurate. There's a figure in the GEOSOFT specification sheet (https://help.seequent.com/Oasis-montaj/9.9/en/Content/ss/glossary/grid_file_format__grd.htm) that shows how the rotated grid are encoded. It seems that the x_origin and y_origin correspond to the coordinates of the "left-bottom corner" (would be the left bottom if we consider the unrotated grid). In your plot it has the coordinates (20, -10), while it should have (-21, -12) according to the header of the grid file. How does this rotate grid look in Oasis? Is the figure in the specification sheet accurate or is it wrong?

BTW, the grid is stored already rotated in the file. That's curious because all the nans take a lot of space. I thought that they were storing the unrotated grid values and then the rotation was done in the coordinates, but apparently that's not the case. In that scenario, what @leouieda shared about multidimensional grids with xarray is not needed here: the grid is already registered in the easting and northing coordinates.

I would love to talk about this is a call because it's hard to make sense out of it. So, what if we leave this PR without the support for rotated grids and leave that for a different PR? In that case we would only need to add a test for ordering=-1 in order to merge this one.

@LL-Geo
Copy link
Member

LL-Geo commented Nov 3, 2022

That looks awesome @santisoler !
I make an order=-1 file with a little trick (it's quite uncommon for the grid file with order=-1), and update the test.
For the rotation, I think I put a different file from the uploaded data with the picture I showed before. The actual data is not rotated in the GRD file, but the coordinate. The grey is unrotated grd, and the color one is rotated by -30 (om_rodate.grd). I update this file as well.

I guess we just need to project the coordination to make it works.
image

For the other goals, Support other orderings (different than +/- 1)
I think there is no other ordering that exists. I change the head in the binary file to 2, it can't be loaded by oasis montaj. So we can just remove it.
image

@santisoler
Copy link
Member Author

Thanks @LL-Geo!

For the rotation, I think I put a different file from the uploaded data with the picture I showed before. The actual data is not rotated in the GRD file, but the coordinate. The grey is unrotated grd, and the color one is rotated by -30 (om_rodate.grd). I update this file as well.

I see! So we were right about how to give support for the rotated grids with xarray and multidimensional coordinates. I'll try to upload a few changes to support rotated grids as well.

I make an order=-1 file with a little trick (it's quite uncommon for the grid file with order=-1), and update the test.

Now I'm curious: what was the trick? Isn't an option in Oasis to just change the ordering?

For the other goals, Support other orderings (different than +/- 1)
I think there is no other ordering that exists. I change the head in the binary file to 2, it can't be loaded by oasis montaj. So we can just remove it.

Thanks for testing this out! One more thing that we won't have to deal with then! Let's forget about orderings different than +/- 1.

Update compressed grd file
Flake8 complains with leaving a space before a color, what makes sense
for colons that go after a statement (function definition, for loop, if,
etc). But it complains about the spaces before a colon used in slicing.
Black already handles the first case, so it makes sense to ignore E203.
@santisoler
Copy link
Member Author

santisoler commented Nov 15, 2022

This is amazing @LL-Geo! Huge shout out to @markjessell for noticing that LZRW1 wasn't being use and for those 16 unexplained bytes.

I will work on the coordinates for the rotated grid then. I just pushed a couple of minor changes (mostly refactors).

This is already much better than the first version I drafted! Thanks for your contributions @LL-Geo!

@markjessell
Copy link

markjessell commented Nov 15, 2022 via email

@RichardScottOZ
Copy link
Contributor

We should send them a xmas hamper!

@santisoler santisoler marked this pull request as ready for review November 17, 2022 19:50
@santisoler santisoler changed the title Read Oasis Montaj© grd files Add function to read Oasis Montaj© grd files Nov 17, 2022
@santisoler
Copy link
Member Author

@LL-Geo I added support for rotated grids and refactor a little bit your code. I also removed all the sample files that weren't being used in the tests. Did you have a particular test for those? If so, feel free to revert that commit if you want them back.

I think this is ready to be merged. Let me know what do you think and if you agree we can merge it right away 🚀

This was really fun to work on and thanks a lot to every one that contributed to it!

Copy link
Member

@LL-Geo LL-Geo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome @santisoler ! 👍 A tiny change for the rotation coordinate, then perfect to go 😄
The other files are uploaded to figure out how compression works. So just remove them 🚀

harmonica/_io/oasis_montaj_grd.py Outdated Show resolved Hide resolved
Use proper sin function and fix the sign of the coefficients in the
rotation matrix.
@santisoler santisoler added this to the v0.6.0 milestone Nov 23, 2022
Copy link
Member

@mdtanker mdtanker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Mostly over-my-head, but I read through what I could understand, and tested with a whole bunch of grids I had, all of which worked great. I noticed one parameter of one function didn't have a docstring, so I added a comment for that.

One additional question/comment. What does the map_projection attribute mean? Most of my geosoft grids have a EPSG (3031) code defining their projection, but this doesn't seem to be preserved in the dataarray attributes. The attribute map_projection for all the grids just has a value of 0.

I noticed that the associated .xml files for my grids contain the EPSG codes. I wrote the below function to pull the code out. Would it be helpful to have this as part of the load_oasis_montaj_grid function?

def extract_proj_str(fname):
    with open(fname, "r") as f:
        for line in f:
            if "projection=" in line:
                # print lines with projection info
                print(line)

                # extract projection string
                proj = line.split('projection="')[1].split('" ')[0]
                print(proj)

                # remove non-alphanumeric characters if present
                if proj.isalnum() is False:
                    proj = ''.join(filter(str.isalnum, proj))

                assert proj.isalnum
    try:
        proj
    except:
        raise NameError("string 'projection=' not found in file.")
        proj = None

    return proj
fname = "example.grd"
proj = extract_proj_str(fname+".xml")

Here is the portion of the .xml with the EPSG code:

<projection type="POLAR_STEREOGRAPHIC" name="WGS 84 / *EPSG3031" projection="*EPSG3031" ellipsoid="WGS 84" datum="WGS 84" datumtrf="WGS 84" datumtrf_description="[WGS 84] World" radius="6378137" eccentricity="0.0818191908426215">

Tuple of floats containing the distance between adjacent grid elements
along each unrotated direction in the following order: ``spacing_y``,
``spacing_x``.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add docstrings for rotation_deg?

rotation_deg : float
     Grid axis rotation, in degrees counter clockwise, relative to the co-ordinate system axis.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice catch! Thanks for including it.

BTW, you can make suggestions during the review using the little icon that seems like a document with a + and - signs inside it. It generates a nice "Apply changes" button that creates the commit automatically, and it also gives you attribution to the PR by adding you as a co-author. No big deal, but just wanted to mention it because it's a cool feature to use.

@RichardScottOZ
Copy link
Contributor

For a 2D case anyway, could use rioxarray possibly to utilise this?

@santisoler
Copy link
Member Author

Looks great! Mostly over-my-head, but I read through what I could understand, and tested with a whole bunch of grids I had, all of which worked great. I noticed one parameter of one function didn't have a docstring, so I added a comment for that.

That sounds great! Thanks for the effort of field testing this function.

One additional question/comment. What does the map_projection attribute mean? Most of my geosoft grids have a EPSG (3031) code defining their projection, but this doesn't seem to be preserved in the dataarray attributes. The attribute map_projection for all the grids just has a value of 0.

I noticed it before. The function is just reading the value from the header and it doesn't seem to be always related to the actual projection of the grid. I left it because it wasn't a problem to include it and sounds better to have it than loosing information while reading the file.

I noticed that the associated .xml files for my grids contain the EPSG codes. I wrote the below function to pull the code out. Would it be helpful to have this as part of the load_oasis_montaj_grid function?

def extract_proj_str(fname):
    with open(fname, "r") as f:
        for line in f:
            if "projection=" in line:
                # print lines with projection info
                print(line)

                # extract projection string
                proj = line.split('projection="')[1].split('" ')[0]
                print(proj)

                # remove non-alphanumeric characters if present
                if proj.isalnum() is False:
                    proj = ''.join(filter(str.isalnum, proj))

                assert proj.isalnum
    try:
        proj
    except:
        raise NameError("string 'projection=' not found in file.")
        proj = None

    return proj
fname = "example.grd"
proj = extract_proj_str(fname+".xml")

Here is the portion of the .xml with the EPSG code:

<projection type="POLAR_STEREOGRAPHIC" name="WGS 84 / *EPSG3031" projection="*EPSG3031" ellipsoid="WGS 84" datum="WGS 84" datumtrf="WGS 84" datumtrf_description="[WGS 84] World" radius="6378137" eccentricity="0.0818191908426215">

Thanks for this! I haven't got into the .xml file because I didn't find a reason, but the projection is something important to have in the generated DataArray.

I think we should totally include this into our function. But I'm hesitant to do it in this PR: it already has quite a bunch of commits and I would like to see it merged for the next release.

So I extend the invitation to create a new PR including the capabilities of reading the xml file. Go for it!

@santisoler
Copy link
Member Author

For a 2D case anyway, could use rioxarray possibly to utilise this?

I assume that you are talking about the projection codes, right @RichardScottOZ ? If so, yes, you can use rioxarray to apply projections. You can also use pyproj to define the projection and the verde.project_grid function as an alternative. Checkout this example: Projection of gridded data

@santisoler
Copy link
Member Author

Since we had a couple of positive reviews thanks to @LL-Geo and @mdtanker I'm merging this 🚀 . We have included some untested lines with this PR, but they are not critical. If anyone runs into a problem we can always solve it with more information on what is failing.

Thanks again to everyone that contributed to this PR! I'd love to see more capabilities to it added in the future like the projection information that @mdtanker mentioned.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants