
Merge pull request explosion#602 from danieldk/update-develop #1

Merged
merged 1 commit into shadeMe:develop on Apr 8, 2022

Conversation

shadeMe (Owner) commented on Apr 8, 2022

Sync develop with master

* Make NumpyOps CPU kernels generic

This PR makes most CPU kernels generic, so that they can take both
float32 and float64 arrays (and, hopefully, float16 in the future). I
experimented with kernels written in Cython with fused types and with
kernels written as C++ templates, and found the C++ template route more
promising (a sketch follows the list below):

- More compact/ergonomic implementations with fewer compile-time
  conditionals.
- Opens up the possibility of using SIMD intrinsics easily in the
  future.

To allow genericity in the NumpyOps method arguments, we use the
following (the dispatch this implies is sketched after the list):

- Fused types when we require a specific dimensionality;
- np.ndarray otherwise.

Some of the kernels are not made generic:

- cpu_scatter_add: needs tests to verify that the op still works
  correctly.
- cpu_position_encode: the position_encode op doesn't take float
  array(s).
- lstm kernels: I need to look more deeply into them.

* Include C++ headers in sdist

* NumpyOps: Use workaround for cython/cython#4697

* Namespace-qualify memcpy

* ReLU kernel: never output -0.0 (illustrated after this list)

* Add fixes suggested by @svlandeg
shadeMe merged commit bd75c9e into shadeMe:develop on Apr 8, 2022