
NNPOps 0.6 #107

Closed
raimis opened this issue Jul 24, 2023 · 15 comments
raimis commented Jul 24, 2023

Let's start the new release.

New features:

Testing improvements:

raimis self-assigned this Jul 24, 2023
raimis commented Jul 24, 2023

@RaulPPelaez any chance to finish #105?

raimis commented Jul 24, 2023

Does anybody want to implement/finish something more?

Ping: @peastman

@RaulPPelaez

@RaulPPelaez any change to finish #105?

It defeated me tbh. I do not think it is worth stopping the release for it. I will give it another spin.

raimis commented Jul 24, 2023

The tests on the GPU don't pass: https://github.com/openmm/NNPOps/actions/runs/5645655178

@RaulPPelaez

I think both failures can be attributed to the inaccuracy introduced by the non-deterministic order of operations in the CUDA versions.
The first test yields an error of 0.0343 (tolerance is 0.025) after a sum operation.
The second has a relative error tolerance of 1e-7, which is already too strict for single precision.
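For context, a minimal NumPy sketch (illustrative only, not the NNPOps test code) of why a 1e-7 relative tolerance is borderline for float32:

```python
import numpy as np

# Machine epsilon for float32: the relative spacing between adjacent
# representable values around 1.0. It is larger than the 1e-7
# tolerance used by the test, so a single rounding step can fail it.
eps32 = float(np.finfo(np.float32).eps)
print(eps32)  # 1.1920928955078125e-07

# Accumulating the same data in float32 instead of float64 shifts the
# result by rounding error alone; reordering the sum (as
# non-deterministic CUDA reductions do) has the same kind of effect.
x = np.random.default_rng(0).random(100_000, dtype=np.float32)
exact = float(x.sum(dtype=np.float64))   # reference accumulated in double
single = float(x.sum(dtype=np.float32))  # accumulated in float32
rel_err = abs(single - exact) / abs(exact)
print(rel_err)
```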

@peastman

Sounds good to me.

@mikemhenry

Do we want to update the tests to make them less strict?

RaulPPelaez commented Jul 25, 2023

Both tests pass locally on a 4090 at the current master (5e2438d); perhaps the CI failure was just a fluke?
EDIT: I also tried on a GPU with arch 75 (as the one in the CI) just in case, and one of the BatchedNN tests also failed because of a tolerance.

        assert energy_error < 5e-7
        if molFile == '3o99':
            assert grad_error < 0.05 # Some numerical instability
        else:
>           assert grad_error < 5e-3
E           AssertionError: assert tensor(0.0071, device='cuda:0') < 0.005

I do not think this is due to a bug; I would blame whatever kernel torch uses for sm_75 being a tad less accurate.
Using deterministic algorithms does indeed make the error go away. I opened #108 to try that in the CI.
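As a rough illustration of the deterministic-algorithms point (a NumPy stand-in, not the actual torch kernels): fixing the order of operations makes runs bit-reproducible, while a mathematically equivalent reordering can round to a slightly different value, which is essentially what a different per-arch kernel amounts to.

```python
import numpy as np

x = np.random.default_rng(1).random(1_000_000, dtype=np.float32)

# Deterministic: the identical order of operations gives
# bit-identical results on every run.
a = float(x.sum())
b = float(x.sum())
assert a == b

# Equivalent math, different association: sum 1000-element column
# blocks first, then combine. The result may differ in the last bits,
# just as a kernel chosen for sm_75 may round slightly differently
# without being buggy.
c = float(x.reshape(1000, 1000).sum(axis=0).sum())
print(a, c, abs(a - c) / abs(a))
```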

@mikemhenry

Okay, so now that #108 is merged, are we ready for a new release? There have been some breaking but very good changes in conda-forge packaging for CUDA (conda-forge/conda-forge.github.io#1963), so it might take a bit to get that sorted. I will get the release on conda-forge started, but will wait to actually merge it until we decide the release is ready.

@RaulPPelaez

The changes should not affect CUDA <12 builds, right?
AFAIK the new packages are only for CUDA >=12.
Anyhow, I say we are ready to release v0.6.
cc @raimis, could you please?

raimis commented Jul 26, 2023

OK! I'm making the release.

raimis commented Jul 26, 2023

We have a release: https://github.com/openmm/NNPOps/releases/tag/v0.6

raimis commented Jul 26, 2023

@mikemhenry you can proceed with conda-forge/nnpops-feedstock#24. If CUDA 12 is an issue, we can build just for CUDA 11 and sort out CUDA 12 later.

raimis commented Jul 27, 2023

NNPOps 0.6 is out!

@jchodera could you make a tweet (or whatever it is called now)?

raimis closed this as completed Aug 18, 2023
4 participants