
Improve perf of _set_continuous! & _set_discrete! #25

Merged (2 commits) on Jun 18, 2024

Conversation

albert-de-montserrat
Contributor

Adds a Vararg signature at the end of _set_continuous! and _set_discrete! so that the compiler can specialize.

@kernel inbounds = true function _set_continuous!(dst, grid, loc, fun::F, args::Vararg{Any, N}) where {F, N}
    I = @index(Global, NTuple)
    dst[I...] = fun(coord(grid, loc, I...)..., args...)
end

@kernel inbounds = true function _set_discrete!(dst, grid, loc, fun::F, args::Vararg{Any, N}) where {F, N}
    I = @index(Global, NTuple)
    dst[I...] = fun(grid, loc, I..., args...)
end
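For context, this is the general Julia specialization heuristic at work, not code from this PR: Julia's compiler may avoid specializing on function and `Vararg` arguments when they are only forwarded to another call, which is what happens inside the code the `@kernel` macro generates. Spelling out the `F` and `N` type parameters in the signature forces one compiled method instance per function and argument count. A minimal sketch of the two patterns, using made-up toy functions (`apply_generic!`, `apply_forced!` are not part of Chmy):

```julia
# May be compiled generically: `fun` and `args` are just forwarded, so the
# heuristic can skip specialization, leaving runtime dispatch and allocations.
apply_generic!(dst, fun, args...) = (dst[1] = fun(args...); dst)

# Forces specialization: `F` and `N` appear as type parameters in the
# signature, so each (function, arity) pair gets its own compiled method.
apply_forced!(dst, fun::F, args::Vararg{Any, N}) where {F, N} =
    (dst[1] = fun(args...); dst)
```

Both definitions compute the same result; only the compilation strategy differs.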

Example:

using Chmy, Chmy.Architectures, Chmy.Grids, Chmy.Fields, Chmy.BoundaryConditions, Chmy.GridOperators, Chmy.KernelLaunch

backend=CPU(); nxy=(126, 126)
arch = Arch(backend)
# geometry
grid   = UniformGrid(arch; origin=(-1, -1), extent=(2, 2), dims=nxy)
launch = Launcher(arch, grid; outer_width=(16, 8))
# allocate fields
C = Field(backend, grid, Center())
init_incl(x, y, x0, y0, r, in, out) = ifelse((x - x0)^2 + (y - y0)^2 < r^2, in, out)

#main:

using Chairmarks
julia> @b set!($(C, grid, init_incl)...; parameters=(x0=0.0, y0=0.0, r=0.1, in=1, out=0))
705.100 μs (112084 allocs: 1.980 MiB)

This PR

julia> @b set!($(C, grid, init_incl)...; parameters=(x0=0.0, y0=0.0, r=0.1, in=1, out=0))
8.650 μs (70 allocs: 7.266 KiB)

@luraess
Member

luraess commented Jun 18, 2024

@utkinis can you x-check. LGTM

@youwuyou
Collaborator

Looks cool! How is the performance boost achieved? I don't think I've fully understood how the compiler optimizes it here such that fewer memory allocations are required. @albert-de-montserrat

@albert-de-montserrat
Contributor Author

albert-de-montserrat commented Jun 18, 2024

It's briefly explained here. Tbh it's usually not that bad; I am surprised to see so many allocations. But the profiler shows some runtime dispatch that I can't really pin down with @descend, since the macro does not go very deep into KA kernels.

I guess this is just good practice to avoid surprising perf drops anyway. Same as when passing functions around
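A caveat worth noting when hunting this down (standard Julia tooling, not part of this PR; `passthrough` and `inner` below are made-up names): introspection macros like `@code_warntype` show *specialized* code even when the compiler would not actually specialize the call, so they can hide exactly this problem. The Julia manual's performance tips instead suggest inspecting the method's compiled specializations directly (`Base.specializations` requires Julia >= 1.10):

```julia
# `f` is only forwarded here, never called directly, so it is a candidate
# for the no-specialization heuristic.
passthrough(f, args...) = inner(f, args...)
inner(f, args...) = f(args...)

passthrough(sin, 1.0)

# List the compiled method instances; an abstract signature like
# `passthrough(::Function, ::Vararg{Any})` means the call was not specialized.
Base.specializations(@which passthrough(sin, 1.0))
```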

@utkinis
Member

utkinis commented Jun 18, 2024

Thanks @albert-de-montserrat for the fix! I always keep forgetting about these specialisation rules =)

Interestingly, on GPU this code doesn't result in unstable code, otherwise it wouldn't compile. I wonder if the GPU specialisation rules are stricter...

Looks good to me, will merge as soon as CI finishes

@utkinis utkinis merged commit 8bd81f7 into PTsolvers:main Jun 18, 2024
9 checks passed