Skip to content
This repository has been archived by the owner on Jul 22, 2024. It is now read-only.

Major slowdowns with Gtk + Threads #503

Closed
timholy opened this issue Apr 22, 2020 · 21 comments · Fixed by #630
Closed

Major slowdowns with Gtk + Threads #503

timholy opened this issue Apr 22, 2020 · 21 comments · Fixed by #630

Comments

@timholy
Copy link
Member

timholy commented Apr 22, 2020

This may be the same issue as #325, but observed on Linux with threads. I should say I'm running this on a 6-physical-core machine (12 if you include hyperthreads). Copied from JuliaImages/ImageFiltering.jl#161 and specifically the test in timholy/ComputationalResources.jl#18:

$ JULIA_NUM_THREADS=4 julia
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.2-pre.0 (2020-04-15)
 _/ |\__'_|_|_|\__'_|  |  release-1.4/ef4fe83698* (fork: 122 commits, 113 days)
|__/                   |

julia> using ImageFiltering, BenchmarkTools, ComputationalResources

julia> function test_imfilter(; opt=1)
               gk = Kernel.gaussian(2.0f0)
               I = rand(Float32,1000 ,1000)
               println("julia_num_threads=$(Threads.nthreads())")
               
               for i = 1:9
                       start = time()
                       if opt == 1
                               J = imfilter(CPU1(Algorithm.FIR()), I, gk)
                       elseif opt == 2
                               J = imfilter(I, gk)
                       else
                               error("Wrong opt value $opt, either 1 (for CPU1), 2 CPUThreads")
                       end
                       t = round(time() - start, digits=6)
                       println("$i imfilter $t sec")
               end
       end
test_imfilter (generic function with 1 method)

julia> test_imfilter(opt=1)
julia_num_threads=4
1 imfilter 0.020989 sec
2 imfilter 0.022114 sec
3 imfilter 0.035968 sec
4 imfilter 0.020871 sec
5 imfilter 0.020997 sec
6 imfilter 0.021044 sec
7 imfilter 0.024659 sec
8 imfilter 0.018204 sec
9 imfilter 0.018081 sec

julia> test_imfilter(opt=2)
julia_num_threads=4
1 imfilter 0.320837 sec
2 imfilter 0.015825 sec
3 imfilter 0.018062 sec
4 imfilter 0.008963 sec
5 imfilter 0.009365 sec
6 imfilter 0.010428 sec
7 imfilter 0.009566 sec
8 imfilter 0.010779 sec
9 imfilter 0.013365 sec

All seems well. Now load Gtk and do it again:

julia> using Gtk
Gtk-Message: 09:16:09.296: Failed to load module "canberra-gtk-module"
Gtk-Message: 09:16:09.296: Failed to load module "canberra-gtk-module"

julia> test_imfilter(opt=1)
julia_num_threads=4
1 imfilter 0.020802 sec
2 imfilter 0.019717 sec
3 imfilter 0.021034 sec
4 imfilter 0.02322 sec
5 imfilter 0.017263 sec
6 imfilter 0.020025 sec
7 imfilter 0.018188 sec
8 imfilter 0.021637 sec
9 imfilter 0.017454 sec

julia> test_imfilter(opt=2)
julia_num_threads=4
1 imfilter 5.036473 sec
2 imfilter 5.020005 sec
3 imfilter 5.04309 sec
4 imfilter 0.013518 sec
5 imfilter 5.01333 sec
6 imfilter 0.032066 sec
7 imfilter 5.016917 sec
8 imfilter 5.035067 sec
9 imfilter 5.024456 sec

You can see that the multithreaded case specifically gets massively slowed down. Here are some key details:

julia> versioninfo()
Julia Version 1.4.2-pre.0
Commit ef4fe83698* (2020-04-15 16:24 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 4
  JULIA_CPU_THREADS = 4

(@v1.4) pkg> st Gtk
Status `~/.julia/environments/v1.4/Project.toml`
  [4c0ca9eb] Gtk v1.1.3
@timholy
Copy link
Member Author

timholy commented Apr 22, 2020

Interestingly, I see this even on Julia 1.2, so it's not a consequence of the overhaul in threading for Julia 1.3. However, Julia 1.0.5 seems immune.

@tknopp
Copy link
Collaborator

tknopp commented Apr 22, 2020

How is the multithreading in imfilter implemented? Does each thread get the same amount of work? From my perspective it ist not really surprising what is happening. Gtk grabs the first CPU and thus takes most of the resources from that CPU1. I would expect that you get the speedup if you exclude the first CPU.

That Gtk.jl is resource hungry on the first CPU is another problem. If I get it right, we try to integrate the Gtk mainloop with the Julia task machinery. And that integration seems to be suboptimal. I once made a post on this in the discourse dev forum.

@tknopp
Copy link
Collaborator

tknopp commented Apr 22, 2020

Here is that discourse topic, where I wanted to start this discussion: https://discourse.julialang.org/t/repl-and-mainloop/35699
I think the issue is not only the fault of Gtk.jl but if you have two main loops it is somewhat unclear (at least to me) how the should behave. For instance of the "time slices" that are spend in Gtk are too long, the Julia main loop cannot on concurrent tasks.

@timholy
Copy link
Member Author

timholy commented Apr 22, 2020

Sorry we didn't get any discussion in that discourse post. I haven't looked into this deeply myself.

I can answer how the multithreading in imfilter is implemented: it uses "old school threads," see here. There doesn't seem to be a way to exclude a particular thread in Threads.@threads, so it doesn't look like there's an easy workaround.

@tknopp
Copy link
Collaborator

tknopp commented Apr 22, 2020

No problem, I am not in the position of solving this problem either. My feeling is that we are scratching real world problems that should actually be discussed and solved in Julia base. I see two somewhat independent problem:

  • We need the notation of master / worker threads. The Threads.@threads should never run on the master thread. With that simple trick, we might already solve your issue.
  • The second is how the master thread integrates with main loops and how it tries not to make the REPL "sluggish" and so on. One idea could be that the REPL does always run its code in a worker thread and keeps the master thread clean for "main-loop" related tasks. Gtk.jl could then hook into this master thread, since the drawing needs to be done on this thread anyway.

Whats you impression, what is the best forum to discuss this?

@timholy
Copy link
Member Author

timholy commented Apr 24, 2020

Sometimes fortune smiles on you (aka, Jeff's on the case): JuliaLang/julia#35552.

@tknopp
Copy link
Collaborator

tknopp commented Jun 27, 2020

@timholy:

There doesn't seem to be a way to exclude a particular thread in Threads.@threads, so it doesn't look like there's an easy workaround.

The following might be the answer to this particular issue:

https://tro3.github.io/ThreadPools.jl/build/index.html#ThreadPools.@bthreads

@StefanMathis
Copy link

I wasn't sure whether I should open a new thread or use this one, since I have the same problem and would like to give an answer to @tknopp's sugestion:

I tested with the following MWE:

using Gtk
using ThreadPools

main() = loop(Vector{Float64}(undef, 100_000))
function loop(arr)
    for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

mainthreaded() = loopthreaded(Vector{Float64}(undef, 100_000))
function loopthreaded(arr)
    ThreadPools.@bthreads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

println("===============")
@time main()
@time mainthreaded()
@time main()
@time mainthreaded()

This results in the following output:

===============
  0.004046 seconds (2 allocations: 781.328 KiB)
  0.040130 seconds (107.68 k allocations: 11.095 MiB)
  0.004341 seconds (2 allocations: 781.328 KiB)
  0.024947 seconds (100.06 k allocations: 10.686 MiB)

So unfortunately ThreadPools.jl doesn't solve the issue.

In JuliaLang/julia#35552, it is suggested that the following line in Gtk.jl/src/GLib/signals.jl could be the culprit:

tmout_min::Cint = (uv_pollfd::_GPollFD).fd == -1 ? 10 : 5000

Changing the 5000 to 1 and voilà:

===============
  0.004263 seconds (2 allocations: 781.328 KiB)
  0.041389 seconds (107.68 k allocations: 11.095 MiB)
  0.004720 seconds (2 allocations: 781.328 KiB)
  0.025503 seconds (100.06 k allocations: 10.686 MiB)

So the problem still persists. Interestingly enough, LoopVectorization.jl is not affected.

@tknopp
Copy link
Collaborator

tknopp commented Sep 19, 2021

LoopVectorization.jl is not about threading but for exploiting SIMD operations, so it is clear that it is not affected.

Regarding my suggestion it was rather a hope from me that this solves the problem. We actually see similar problems and I really wish we could do something about it.

My theory right now is the following. If you run the code serially (i.e. on the first thread) the gtk main loop will be stopped completely for the moment and will not affect the serial code. If you run those in parallel probably the main thread (doing Gtk work) will regularly invoke GC which is(?) still a serial operation and will stop the other threads?

@tro3, @JeffBezanson, @vtjnash: Has anybody of you an idea if

  1. My assessment is correct here?
  2. If this is rather a Julia threading issue or that Gtk simply uses Julia tasks in a bad way?
  3. Any idea how to resolve it?

@StefanMathis
Copy link

The line regarding LoopVectorization.jl was with respect to the threaded version ;-) Anyway, thank you very much for the effort you put into solving this issue.

@StefanMathis
Copy link

StefanMathis commented Oct 28, 2021

I'm not sure whether this is considered trivial, but I found a somewhat "dirty" solution which works for me at the moment, since I don't need the Gtk main loop when doing intense parallel computing. Here is a MWE:

using Gtk
GLib = Gtk.GLib

# Terminate the Gtk main loop
function gtk_quit_main_loop()
    if ccall((:g_main_depth, GLib.libglib), Cint, ()) > 0
        Gtk.gtk_quit()
    end
end

function loop!(arr::Vector{Float64})
    for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

function loopthreaded!(arr::Vector{Float64})
    Threads.@threads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

function create_window()

    win = GtkWindow("My First Gtk.jl Program", 400, 200)

    b = GtkButton("Click Me")
    push!(win,b)

    function on_button_clicked(w)
      println("The button has been clicked")
    end
    signal_connect(on_button_clicked, b, "clicked")

    # Now reinstantiate the Gtk main loop
    Gtk.__init__()
    showall(win)

    # Wait with programm progression until the window is closed
    c = Condition()
    signal_connect(win, :destroy) do widget
        notify(c)
    end
    wait(c)

    # NOTE
    # This function terminates the Gtk main loop, if one is active.
    # Comment this out to see the massive slowdown due to the Gtk main loop
    gtk_quit_main_loop()
end

# Program execution
# ==============================================================================
create_window() # First GUI call

arr = Vector{Float64}(undef, 100_000)
arrthreaded = Vector{Float64}(undef, 100_000)

# Compilation
loop!(Vector{Float64}(undef, 2))
loopthreaded!(Vector{Float64}(undef, 2))

# Time measurement
@time loop!(arr)
@time loopthreaded!(arrthreaded)

println("===============")

create_window() # Second GUI call

I'm fully aware that this is as hacky as it gets, but at least for some use cases this should allow the use of Gtk and multithreading for performance in the same environment. Of course, this does nothing for you if you want to trigger a multithreaded calculation from a Gtk GUI :-( Anything potentially dangerous about this solution?

@tknopp
Copy link
Collaborator

tknopp commented Oct 28, 2021

I don't see anything dangerous about that. Interesting would be if it also would be possible to pause the main loop, in which case one could keep the UI open (it would just freeze during the calculation).

@StefanMathis
Copy link

I did a test where I quit the Gtk main loop while a GUI was open. This leads to a crash. Therefore, unfortunately, your suggestion doesn't work. Is there a way to "pause" the Gtk main loop (w/o quitting it)?

@StefanMathis
Copy link

StefanMathis commented Nov 9, 2021

One thing that left me puzzled was the fact that LoopVectorization.jl in its threaded variant is not negatively affected by the Gtk main loop. I looked into this and found that according to my understanding, this package actually doesn't use the standard Julia threads, but the thread implementation in Polyester.jl. Applying this package to my benchmark above yields:

using Gtk
using Polyester
using BenchmarkTools

function loop!(arr::Vector{Float64})
    for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

function loopthreaded!(arr::Vector{Float64})
    Threads.@threads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

function looppolyester!(arr::Vector{Float64})
    @batch per=core for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

function create_window()

    win = GtkWindow("My First Gtk.jl Program", 400, 200)

    b = GtkButton("Click Me")
    push!(win,b)

    function on_button_clicked(w)
      println("The button has been clicked")
    end
    signal_connect(on_button_clicked, b, "clicked")

    # Now reinstantiate the Gtk main loop
    showall(win)

    # Wait with programm progression until the window is closed
    c = Condition()
    signal_connect(win, :destroy) do widget
        notify(c)
    end
    wait(c)
end

# Program execution
# ==============================================================================
create_window() # First GUI call

arr = Vector{Float64}(undef, 100_000)
arrthreaded = copy(arr)
polthreaded = copy(arr)

# Compilation
loop!(Vector{Float64}(undef, 2))
loopthreaded!(Vector{Float64}(undef, 2))

# Time measurement
benchmark_serial = @benchmark loop!(arr)
benchmark_threads = @benchmark loopthreaded!(arrthreaded)
benchmark_polyester = @benchmark looppolyester!(polthreaded)

println("===============")

create_window() # Second GUI call

Compared to my last example, you will notice that the Gtk main loop isn't deactivated anymore, which leads to the performance regression discussed before for the Julia threads:

julia> benchmark_serial
BenchmarkTools.Trial: 1275 samples with 1 evaluation.
 Range (min  max):  3.651 ms    6.428 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     3.855 ms               ┊ GC (median):    0.00%        
 Time  (mean ± σ):   3.916 ms ± 223.838 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

       █   ▁▁   ▃
  ▂▄▂▄▃█▃▃▃██▃▃▃█▃▃▃▄▄▃▃▃▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▂▁▁▂▁▂▁▂▁▂▁▂▂▂▁▁▂▂▁▂ ▃
  3.65 ms         Histogram: frequency by time        4.85 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> benchmark_threads
BenchmarkTools.Trial: 319 samples with 1 evaluation.
 Range (min  max):   1.118 ms  39.897 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     15.539 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.664 ms ±  5.143 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

                    ▁▂▂       █▃            ▁
  ▇▅▁▁▁▄█▁▁▁▁▁▁▁▁▁▁▇███▆▄▇▇▄▅▆██▆▁▇▇▅▄▆▄▆▇▅▇█▄▅▅▅▁▄▅▄▄▅▅▁▅▁▁▄ ▆
  1.12 ms      Histogram: log(frequency) by time      30.8 ms <

 Memory estimate: 3.12 KiB, allocs estimate: 31.

However, the loop created with the Polyester.jl threads doesn't care:

julia> benchmark_polyester
BenchmarkTools.Trial: 5136 samples with 1 evaluation.
 Range (min  max):  735.900 μs    1.464 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):       1.003 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   968.403 μs ± 120.804 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                              ▂  ▃  ▂ ▂
  █▃▃▃▃▃▂▂▂▂▂▂▂▂▂▂▁▂▂▂▂▂▁▅▂▂▃▅▄▃▅█▇██▄▃███▅▄▃▃▄▃▃▂▃▂▃▂▂▂▂▂▂▂▂▂▂ ▃
  736 μs           Histogram: frequency by time         1.22 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

@chriselrod: Any idea why this is the case?

The good news are that this allows the combination of an active GUI and multi-threaded Polyester.jl loops with excellent performance :-) Also the fiddling with the Gtk main loop is not necessary anymore, which avoids a source of potential bugs.

@chriselrod
Copy link

chriselrod commented Nov 9, 2021

Polyester uses a very simple static schedule, so there's a lot less that can go wrong than with Threads.@threads or Threads.@spawn.
Basically, it runs your work on already existing tasks. If you're running a lot of work with only short breaks between threaded regions (<1ms), these tasks can be kept spinning, which will bypass most of the Julia related scheduling code.

As you noted, the same applies to @tturbo / threaded LoopVectorization.

From the sound of things, the specific issue is that GTK tries to run a task on the main thread?
In this case, Polyester/LV probably solve the problem by simply blocking it while executing.
Here, the main thread launches the worker threads, and then immediately begins working on its own chunk rather than yielding, not giving GTK the opportunity to jump in.

@StefanMathis
Copy link

I'm not sure that the Gtk task is actually run on the main thread, otherwise the @bthreads macro from ThreadPools should not be affected. So therefore I believe the performance problem is actually caused by either the GC issue which @tknopp described or because the Gtk main loop is launched on some background thread and then fights over resources with a task launched by the @threads macro.

One thing I'd like to test in the next days is to force the Gtk main loop task to the main thread via @spawnat and then checking if the @bthreads macro shows no performance regression anymore. This could be useful in cases where one would like to keep the GUI responsive while a multi-threaded loop is running. As @chriselrod pointed out, when using the Polyester threads, the Gtk main loop is effectively blocked during the multi-thread loop calculation.

@tknopp
Copy link
Collaborator

tknopp commented Nov 17, 2021

Great that you are working on this @StefanMathis. And yes, ideally we want a responsible UI but no performance degradations by the UI thread. Would be great if we reach that. Besides number crunching in my research group we have also a use case where the background task is gathering data from a device in real-time. There we also see better performance (-> latency) without UI, which we would like to resolve.

@StefanMathis
Copy link

I finally found some time to test a threaded loop with ThreadPools.jl while forcing the Gtk main loop to the primary thread.

Below you find the code for a benchmark w/o the Gtk main loop active. You can clearly see that the normal threads perform the best, then the serial version, and then at last the version from ThreadPools.jl. The @bthreads loop does show a lot of allocations, so I assume this is the reason why it comes last.

using Gtk
using ThreadPools
using BenchmarkTools
GLib = Gtk.GLib

# Terminate the Gtk main loop
function gtk_quit_main_loop()
    if ccall((:g_main_depth, GLib.libglib), Cint, ()) > 0
        Gtk.gtk_quit()
    end
end

function loop!(arr::Vector{Float64})
    for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

function loop_t!(arr::Vector{Float64})
    Threads.@threads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

function loop_bt!(arr::Vector{Float64})
    ThreadPools.@bthreads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end


# Program execution
# ==============================================================================

# Start the Gtk main loop on the primary thread
gtk_quit_main_loop()
# @tspawnat 1 Gtk.gtk_main() # Comment this in to test with the Gtk main loop active

arr = Vector{Float64}(undef, 100_000)
arr_t = copy(arr)
arr_bt = copy(arr)

# Compilation
loop!(Vector{Float64}(undef, 2))
loop_t!(Vector{Float64}(undef, 2))
loop_bt!(Vector{Float64}(undef, 2))

GC.enable(false)

# Time measurement
benchmark_serial = @benchmark loop!(arr)
benchmark_t = @benchmark loop_t!(arr_t)
benchmark_bt = @benchmark loop_bt!(arr_bt)

GC.enable(true)
julia> benchmark_t
BenchmarkTools.Trial: 2326 samples with 1 evaluation.
 Range (min  max):  1.554 ms    2.610 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     2.109 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   2.143 ms ± 143.424 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                 ▃▂  ▁█▅▃▁       ▃▁    ▄       
  ▂▁▁▁▁▁▁▁▁▁▁▁▁▁▂▂▂▂▂▃▃▃▃▄▃▄▅▆▆▅▅██▇██████▅▄▅▅▄▄▅██▅█▇▆█▆▇▆▄▃ ▄
  1.55 ms         Histogram: frequency by time        2.41 ms <

 Memory estimate: 2.11 KiB, allocs estimate: 21.

julia> benchmark_serial
BenchmarkTools.Trial: 1019 samples with 1 evaluation.
 Range (min  max):  4.032 ms   14.553 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     4.688 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.890 ms ± 723.126 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

         █▂▁▁▁   
  ▃▃▃▆▄▅▅██████▆▆▅▅▄▄▃▃▃▄▃▃▃▃▃▃▃▂▃▃▂▂▃▃▂▂▃▁▂▂▂▂▂▂▂▁▂▂▂▂▂▁▁▁▁▂ ▃
  4.03 ms         Histogram: frequency by time        7.56 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.


julia> benchmark_bt
BenchmarkTools.Trial: 348 samples with 1 evaluation.
 Range (min  max):   6.469 ms    1.034 s  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):      8.391 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   14.362 ms ± 55.042 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

      ▅█                   
  ▂▂▄███▅▄▃▃▁▁▁▂▁▃▁▁▂▂▃▃▃▃▃▄▃▃▃▃▃▃▃▂▃▃▃▃▃▃▂▁▂▂▁▂▂▁▁▁▁▁▁▂▂▁▂▁▂ ▃
  6.47 ms         Histogram: frequency by time        26.2 ms <

 Memory estimate: 9.92 MiB, allocs estimate: 100064.

When spawning the Gtk main loop (i.e. inserting the line @tspawnat 1 Gtk.gtk_main() behind gtk_quit_main_loop(), the following results can be observed:

julia> benchmark_t
BenchmarkTools.Trial: 364 samples with 1 evaluation.
 Range (min  max):   1.760 ms  28.420 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     15.177 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   13.717 ms ±  5.107 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

                             ▂    ▁█▆▂  ▂
  ▅▃▄▅▅▅▅▄▂▁▁▁▁▁▁▁▁▁▂▁▁▂▅▃▆▇▆█▅▇▆█████▆▆█▅▃▅▄▁▁▁▁▂▁▂▂▃▁▁▂▂▁▁▃ ▃
  1.76 ms         Histogram: frequency by time        25.9 ms <

 Memory estimate: 2.11 KiB, allocs estimate: 21.

julia> benchmark_serial
BenchmarkTools.Trial: 1048 samples with 1 evaluation.
 Range (min  max):  4.005 ms    9.243 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     4.578 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.757 ms ± 663.124 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

   ▂ ▃▃▃▂▄█▇▅▃▁ 
  ▄█████████████▆▆▆▅▅▅▆▅▃▄▃▃▃▃▃▃▃▃▃▂▃▃▂▂▂▂▂▂▂▃▃▂▂▂▂▂▂▂▂▁▁▂▁▂▂ ▄
  4.01 ms         Histogram: frequency by time        7.45 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

julia> benchmark_bt
BenchmarkTools.Trial: 328 samples with 1 evaluation.
 Range (min  max):  11.393 ms  22.695 ms  ┊ GC (min  max): 0.00%  0.00%
 Time  (median):     15.633 ms              ┊ GC (median):    0.00%
 Time  (mean ± σ):   15.268 ms ±  1.631 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

                                      █▃
  ▂▂▃▂▅▄▃▃▂▄▄▄▄▃▄▃▂▃▃▃▄▄▃▃▄▄▃▄▅▅▆▆▄▆▅▅███▃▃▅▅▃▄▃▂▃▂▃▁▃▃▃▃▃▂▂▃ ▃
  11.4 ms         Histogram: frequency by time        18.8 ms <

 Memory estimate: 9.92 MiB, allocs estimate: 100063.

Now the threaded and the ThreadPools version are on roughly the same footing. In fact, the ThreadPool version is actually hardly affected by the Gtk main loop at all, but because it performs so much worse than the serial version it is still not a good idea to use it here.

Please keep in mind that I really don't want to bash ThreadPools, I think it is rather likely that I did something stupid and therefore my results are inaccurate. If you spot something fishy, do not hesitate to point it out :-)

@IanButterworth
Copy link
Collaborator

Proposing a bandaid here #607

@StefanMathis
Copy link

I just found some time to apply my benchmark from above to the solution of @IanButterworth and I am very pleased to say that his PR really solves the problem 👍

I found one quirk to be aware of: The Gtk main task takes some time to terminate. When a multithreaded loop is started while the Gtk main task is still active, the performance is still thrashed (as to be expected). The following example demonstrates this:

using Gtk
using BenchmarkTools

function loop!(arr::Vector{Float64})
    for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

function loopthreaded!(arr::Vector{Float64})
    Threads.@threads for k in 1:length(arr)
        k_float = float(k)
        arr[k] = sin(k_float)*cos(k_float)*tan(k_float)/sqrt(k_float) # Just some calculation to sink time into
    end
end

function create_window()

    win = GtkWindow("My First Gtk.jl Program", 400, 200)

    b = GtkButton("Close me with the x-button in the top right corner.")
    push!(win,b)

    function on_button_clicked(w)
      println("The button has been clicked")
    end
    signal_connect(on_button_clicked, b, "clicked")

    # Now reinstantiate the Gtk main loop
    showall(win)

    # Wait with programm progression until the window is closed
    c = Condition()
    signal_connect(win, :destroy) do widget
        notify(c)
    end
    wait(c)

    # NOTE
    # This function terminates the Gtk main loop, if one is active.
    # Comment this out to see the massive slowdown due to the Gtk main loop
    #gtk_quit_main_loop()
end

# Program execution
# ==============================================================================

arr = Vector{Float64}(undef, 100_000)
arrthreaded = Vector{Float64}(undef, 100_000)

# Compilation
loop!(Vector{Float64}(undef, 2))
loopthreaded!(Vector{Float64}(undef, 2))

# Comment this in or out to see the effect of the Gtk main loop.
create_window()

# Occasionally, the gtk main task may still run when entering the benchmark loop.
# In this case, the performance is as bad as before. A sleep command of 300 ms
# is usually sufficient to achieve task termination before proceeding to the loops.
# sleep(0.3)

is_running = Gtk.gtk_main_running[]
println("Gtk main task running: $is_running")

# When measuring only a single loop call, the Gtk main task may still be running,
# therefore the performance of the multithreaded call may be far worse than the
# single-threaded call
@time loopthreaded!(arrthreaded)
@time loop!(arr)

# Proper measurement with multiple function calls shows that the multithreaded
# version is indeed faster.
benchmark_threads = @benchmark loopthreaded!(arrthreaded)
benchmark_serial = @benchmark loop!(arr)

println("===============")

When executing this code snippet, you'll see that the single-call benchmarking via @time may lead to bad results for the multithreaded loop, while the proper measurement with @benchmark shows the desired performance increase regardless of prior Gtk activity. So be careful when benchmarking ;-)

@IanButterworth
Copy link
Collaborator

IanButterworth commented Jan 17, 2022

Great to hear.

#613 added a utility function for avoiding the issue you're seeing

Gtk.pause_eventloop() do
    @time loopthreaded!(arrthreaded)
end

It waits for a stopping eventloop to stop, if needed pauses the event loop, and reinstates its state afterwards

It would be best if the destroy callback waited until the eventloop was fully stopped, but that appeared to break closing windows on Windows sometimes. See #610

Also, just as an API point, i'd use Gtk.is_eventloop_running() rather than the internal Gtk.gtk_main_running[].
They're currently idempotent, but that may change.

Your example did make me realize there was a bug though, where pause_eventloop will restart the eventloop afterwards if it's currently stopping.

That is fixed in #615

Also I added Gtk.wait_eventloop_stopping() there to wait if it is stopping

(Grr.. Windows.. it would've been simpler if #610 wasn't happening.)

So with #615 your options are:

If you know the eventloop should be stopping i.e. after a window is destroyed

Gtk.wait_eventloop_stopping()
@time loopthreaded!(arrthreaded)

Or at any time, even if a window is open

Gtk.pause_eventloop() do
    @time loopthreaded!(arrthreaded)
end

which will pause rendering during execution.

Adding either of those to your example, I get

  0.001085 seconds (35 allocations: 2.844 KiB)
  0.005105 seconds

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants