
Merge pull request explosion#602 from danieldk/update-develop #1

Merged
merged 1 commit into shadeMe:develop on Apr 8, 2022

Conversation

shadeMe (Owner) commented on Apr 8, 2022

Sync develop with master

* Make NumpyOps CPU kernels generic

This PR makes most CPU kernels generic, so that they can take both
float32 and float64 arrays (and, hopefully, float16 in the future). I
experimented with kernels written in Cython with fused types and with
kernels written as C++ templates, and found the C++ template route more
promising (a sketch follows the list below):

- More compact/ergonomic implementations with fewer compile-time
  conditionals.
- Opens up the possibility of using SIMD intrinsics easily in the
  future.

To allow genericity in the NumpyOps method arguments, we use the
following (the dispatch this implies is sketched after the list):

- Fused types when we require a specific dimensionality;
- np.ndarray otherwise.

Some of the kernels are not made generic:

- cpu_scatter_add: needs tests to verify that the op still works
  correctly.
- cpu_position_encode: the position_encode op doesn't take float
  array(s).
- lstm kernels: I need to look more deeply into them.

* Include C++ headers in sdist

* NumpyOps: Use workaround for cython/cython#4697

* Namespace-qualify memcpy

* ReLU kernel: never output -0.0 (illustrated after this list)

* Add fixes suggested by @svlandeg
shadeMe merged commit bd75c9e into shadeMe:develop on Apr 8, 2022