Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG/Request] OS X OpenGL compatibility issue #3364

Closed
Vestaia opened this issue Jul 24, 2016 · 15 comments
Closed

[BUG/Request] OS X OpenGL compatibility issue #3364

Vestaia opened this issue Jul 24, 2016 · 15 comments
Labels

Comments

@Vestaia
Copy link

Vestaia commented Jul 24, 2016

MPV 0.18.1 on OS X 10.11.5-10.11.6

OS X has a hard time handling OpenGL when window resolution matches display resolution. This is due to changes apple made in OS X 10.6. This results in absurdly high GPU usage when using fullscreen, although it stops if anything else is onscreen so MPV isn't taking up the whole screen. Making the window resolution 1 px smaller than display will also stop the high gpu usage (example: 1280x799 instead of 1280x800). This does not apply to all OpenGL apps, only certain ones like MPV, PPSSPP, and Spritekit apps. Because of this, Apple doesn't see this as "their problem" and instead put the responsibility on app developers to make sure this doesn't happen.

Reproduced on:

MBP 2015 late 13' OSX 10.11.6
MPV settings:
vo=opengl
Hwdec = no (same with videotoolbox)
Open a video; look at windowed gpu usage; go into full screen; look at gpu usage.
Monitored with Instruments xCode.

Expected behavior

Windowed but maximized should have similar GPU usage to fullscreen.

Actual behavior

Windowed maximized: GPU - 12%
Fullscreen: GPU - 83%

screen shot 2016-07-24 at 3 04 21 am

0:14 - 0:28: Fullscreen with nothing else onscreen. 0:28 - 0:51: Fullscreen with cursor at top of screen with menu bar on screen.
@ghost
Copy link

ghost commented Jul 24, 2016

Can you link the workarounds these other projects added?

@ghost ghost added the os:mac label Jul 24, 2016
@Vestaia
Copy link
Author

Vestaia commented Jul 24, 2016

I'm not sure of the exact method that developers use but here's some examples:
Enter the Gungeon has a fullscreen mode that doesn't have the gpu usage problem, although you can not tab out.
Plex home theater has something quite similar, again you can't tab out.
VLC's non-native fullscreen also doesn't have this problem, while retaining the ability to tab out.

@ghost
Copy link

ghost commented Jul 24, 2016

CC @pigoz and maybe @Akemi

@Vestaia Vestaia closed this as completed Jul 24, 2016
@Vestaia Vestaia reopened this Jul 24, 2016
@Vestaia
Copy link
Author

Vestaia commented Jul 24, 2016

Extra stats: (Note: the drop in the middle is when the menu bar on-screen).
screen shot 2016-07-24 at 1 06 20 pm

screen shot 2016-07-24 at 1 11 36 pm

screen shot 2016-07-24 at 1 08 38 pm

screen shot 2016-07-24 at 1 09 09 pm

@pigoz
Copy link
Member

pigoz commented Jul 24, 2016

This "bug" is quite strange. The sad truth is we have no idea how to work around it. I even tried to implement full screen using mission control but it doesn't seem to fix the problem for everyone (here's the commit if you feel like testing it 31a80aa)

Reducing the view/window size by 1px is not really acceptable.

@Akemi
Copy link
Member

Akemi commented Jul 24, 2016

if this is related to issue #2392 following patch/hack/workaround should help with the issue. this really is only for testing purpose, since it will cut 1px at the bottom and top if the video aspect ratio is smaller than display aspect ratio (example 4:3 video on 16:9 display). what could be done/tried though is nesting the view in another one and add some 'padding' to it to make the window at least 1 px bigger then the 'video view'. i didn't test if this works.

diff --git a/video/out/cocoa_common.m b/video/out/cocoa_common.m
index 557e28e..869c611 100644
--- a/video/out/cocoa_common.m
+++ b/video/out/cocoa_common.m
@@ -765,6 +765,11 @@ static int vo_cocoa_fullscreen(struct vo *vo)
         // window. Set that window delegate to the cocoa adapter to trigger
         // calls to -windowDidResignKey: and -windowDidBecomeKey:
         [[s->view window] setDelegate:s->adapter];
+        
+        NSRect frame = [s->video frameInPixels];
+        frame.size.height = frame.size.height + 2;
+        frame.origin.y = frame.origin.y - 1;
+        [[s->view window] setFrame: frame display: YES];
     }

     flag_events(vo, VO_EVENT_ICC_PROFILE_CHANGED);

also to sum up my last few comments on issue #2392 it seems that the fullscreen issue is (nearly completely) fixed with 10.12 (beta) + native-fs but is still broken on 10.11 + native-fs. in that regard we might just give up on fixing this issue on 10.11 (lost cause) and concentrate on 10.12 when it's released. i will test everything again when 10.12 comes out. also on a side note, since native-fs is the most promising, the current branch still needs some fixes before it could be merged.

@Vestaia Vestaia closed this as completed Jul 24, 2016
@Vestaia
Copy link
Author

Vestaia commented Jul 25, 2016

Just did some testing on MacOS Sierra 10.12 beta 3.
It appears the problem is still there. I can't know for sure until Apple updates xCode instruments to support 10.12 if it's the same problem but judging from temperature and current to the GPU, it appears to still be present.
MPV versions compiled on 7/25 01:00-03:00 EST

GPU power flow

Lockfocus branch + native-fs:
Fullscreen - 6.3W
Windowed (same resolution) - 0.8W
Fullscreen (with menu bar showing - same effect as patch) - 1.2W

Lockfocus branch:
Fullscreen - 7.1W
Windowed (same res) - 0.9W
Fullscreen (with menu bar) - 1.3W

Native-fs branch:
Fullscreen - 6.7W
Windowed (same res) - 0.9W
Fullscreen (with menu bar) - 1.2W

Master + native-fs:
Fullscreen - 7.0W
Windowed (same res) - 0.8W
Fullscreen (with menu bar) - 1.5W

Master:
Fullscreen - 5.8W
Windowed (same res) - 0.8W
Fullscreen (with menu bar) - 1.5W

Cause:

I believe the cause is still this: https://developer.apple.com/library/mac/documentation/GraphicsImaging/Conceptual/OpenGL-MacProgGuide/opengl_fullscreen/opengl_cgl.html
"OS X v10.6 and later automatically optimize the performance of screen-sized windows, allowing your application to take complete advantage of the window server environment on OS X."

"OS X automatically attempts to optimize this context’s performance. For example, when your application calls flushBuffer on the NSOpenGLContext object, the system may swap the buffers rather than copying the contents of the back buffer to the front buffer"
"Because the system may choose to swap the buffers rather than copy them, your application must completely redraw the scene after every call to flushBuffer."
This or a similar optimization may be causing issues with mpv.
I'll try to ask Apple about this but I highly doubt if they'll respond with any useful information.

Possible workaround/fix

"These performance optimizations are not applied when your application adds the NSOpenGLPFABackingStore attribute to the context."
Can anyone confirm if this fixes things?

@Vestaia Vestaia reopened this Jul 25, 2016
@Akemi
Copy link
Member

Akemi commented Jul 25, 2016

i am on vacation atm so i can't test it. though if you want you can test my cocoa_pixelformat branch. it adds kCGLPFABackingStore which should be the equivalent to NSOpenGLPFABackingStore.

@pigoz
Copy link
Member

pigoz commented Jul 25, 2016

"These performance optimizations are not applied when your application adds the NSOpenGLPFABackingStore attribute to the context."
Can anyone confirm if this fixes things?

Looks like it's true: https://developer.blender.org/T46993

EDIT: nope, they removed the flag, while we want to add it?

@Vestaia
Copy link
Author

Vestaia commented Jul 26, 2016

Progress (Temporary workaround):

I was able to get fullscreen GPU usage down to similar numbers to windowed by removing this attribute from context_cocoa.c of mpv-master on 10.12 Sierra beta 3:
kCGLPFADoubleBuffer

I will be doing more tests around this but so far, it appears to fix performance on both Native and MPV fullscreen.

CGLPixelFormatAttribute attrs[] = {
    kCGLPFAOpenGLProfile,
    (CGLPixelFormatAttribute) version,
    kCGLPFAAccelerated,
    #if MAC_OS_X_VERSION_MIN_REQUIRED >= MAC_OS_X_VERSION_10_8
    // leave this as the last entry of the array to not break the fallback
    // code
    kCGLPFASupportsAutomaticGraphicsSwitching,
    #endif
    0
};

This is the modified context_cocoa file (with kCGLPFAAccelerated also removed for testing):
context_cocoa.c.zip

Tests done so far on a 720p video:
Master (fullscreen - native and MPV):
Peak GPU power - 12.0W
Average GPU power - 6.3W
Total power usage average - 32.3W

Master (windowed):
Peak GPU power - 3.2W
Average GPU power - 1.4W
Total power usage average - 22.3W

Master (patched + fullscreen - native and MPV):
Peak GPU power - 2.9W
Average GPU power - 1.6W
Total power usage average - 23.1W

(This would be so much easier if xCode instruments worked on 10.12 and I can have actual numbers instead of measure power usage... Currently, xCode crashes with this: Crashing on exception: -[NSNib _initWithNibNamed:bundle:options:] could not load the nibName: XRStrategiesToolbarViewController in bundle (null).)

Bugs&fixes:

  • After patching, it appears the menu bar occasionally doesn't show or has a delay to show when mousing over the top of the screen. I'm not sure if this is a problem with the doublebuffer>singlebuffer patch or if it's just a problem with the beta OS. Again, really hard to test without xCode instruments.
  • This patch also appeared to fix the crash on exit bug that MPV has been having whenever I use my vapoursynth filter with custom shaders.
  • After patching, it appears launching Youtube streams on MPV is slightly slower, no tests has been done, just an observation.
  • This also fixes background GPU usage (when MPV is in native-fs but is not on an active display). Before patch, GPU usages maxed out at 100% if MPV is open in background.
  • This may breaks display sync. (I haven't noticed any tearing yet in high frame-rate videos).

Note: Tbh I feel like I have way to much time to spend testing this just to get fullscreen working when I have actual work I'm supposed to be doing lol. If the patch breaks anything else please let me know. I'm not certain of this but I think this should also work for OS X 10.11

EDIT:

  • Increased launch times for Youtube streams was caused by my ISP's throttling (Gotta love getting 5mb/s connection to youtube with 300mb/s bandwidth), seems like patch didn't affect stream launch times.
  • Occasional random bugs after patch: failure to minimize, failure to respond to click, ect... (rare).

@JCount
Copy link
Contributor

JCount commented Aug 26, 2016

I think I may have found something pertinent to this issue. When I examined an OpenGL trace of mpv master, executed with --no-config --vo=opengl on my mid 2015 15' Macbook Pro, every time the buffers were swapped, the CGLFlushDrawable() call was always preceded by a glFlush() call. e.g.:

33632:     2.07 µs glUseProgram(4);
33633:     0.42 µs glViewport(0, 0, 2880, 1800);
33634:     3.75 µs glUseProgram(0);
33635:     3.54 µs glBindFramebuffer(GL_FRAMEBUFFER, 0);
33636:    59.62 µs glFlush();
33637:   124.55 µs CGLFlushDrawable();

This could potentially be causing a large performance hit, especially with interpolation, because CGLFlushDrawable() performs an implicit glFlush() before it returns. Apple's documentation for the command specifically warns:

For optimal performance, an application should not call glFlush immediately before calling CGLFlushDrawable.

See https://developer.apple.com/library/mac/documentation/GraphicsImaging/Reference/CGL_OpenGL/index.html#//apple_ref/c/func/CGLFlushDrawable for more detail.

Using breakpoints, the call stacks for the two commands are:

glFlush() 
    Context: 0x7f89b3825000
    Virtual Screen:  0/3
    kCGLCPCurrentRendererID:  16925954 (0x01024502)
    GL_RENDERER:  Intel Iris Pro OpenGL Engine
    GL_VENDOR:  Intel Inc.
    GL_VERSION:  4.1 INTEL-10.14.73
    kCGLCPGPUFragmentProcessing:  GL_TRUE
    kCGLCPGPUVertexProcessing:  GL_TRUE
Function call stack: 
    0: 0x109801827 in gl_video_render_frame in <mpv> 
    1: 0x10981254a in draw_frame in <mpv> 
    2: 0x1098102bd in vo_thread in <mpv> 
    3: 0x7fff8fef399d in pthread_body in <libsystem_pthread.dylib> 
    4: 0x7fff8fef391a in pthread_body in <libsystem_pthread.dylib> 
    5: 0x7fff8fef1351 in thread_start in <libsystem_pthread.dylib>

and

CGLFlushDrawable() 
    Context: 0x7f89b3825000
    Virtual Screen:  0/3
    kCGLCPCurrentRendererID:  16925954 (0x01024502)
    GL_RENDERER:  Intel Iris Pro OpenGL Engine
    GL_VENDOR:  Intel Inc.
    GL_VERSION:  4.1 INTEL-10.14.73
    kCGLCPGPUFragmentProcessing:  GL_TRUE
    kCGLCPGPUVertexProcessing:  GL_TRUE
Function call stack: 
    0: 0x7fff9184effe in CGLFlushDrawable in <OpenGL> 
    1: 0x1097f4ecf in vo_cocoa_swap_buffers in <mpv> 
    2: 0x109812593 in flip_page in <mpv> 
    3: 0x10980f4d3 in vo_thread in <mpv> 
    4: 0x7fff8fef399d in pthread_body in <libsystem_pthread.dylib> 
    5: 0x7fff8fef391a in pthread_body in <libsystem_pthread.dylib> 
    6: 0x7fff8fef1351 in thread_start in <libsystem_pthread.dylib>

That fact that the removal of the kCGLPFADoubleBuffer attribute currently appears to improve performance seems to support this since:

If the receiver is not a double-buffered context, this call does nothing.

@ghost
Copy link

ghost commented Aug 26, 2016

So does removing the glFlush fix it?

The glFlush was once added because it actually improved rendering performance on some Linux GL drivers, but I suppose we can drop it or disable it by default if it really makes such a difference. That said, a glFlush shouldn't normally have a bad influence on performance if it happens only once per frame.

By the way, didn't someone claim that the newest Sierra betas fix this performance issue anyway?

@Hrxn
Copy link
Contributor

Hrxn commented Aug 26, 2016

By the way, didn't someone claim that the newest Sierra betas fix this performance issue anyway?

Yes, I've read that here as well, a while ago. Not sure where this comment is now, though.

@Akemi
Copy link
Member

Akemi commented Aug 26, 2016

the comment was here #2392 (comment). i briefly tested it on Beta 5 too and some of the issues disappeared though i will only thoroughly test it again when it's out of beta.

@JCount
Copy link
Contributor

JCount commented Aug 26, 2016

Doing this:

diff --git a/video/out/opengl/video.c b/video/out/opengl/video.c
index 901e208..e9ed8ff 100644
--- a/video/out/opengl/video.c
+++ b/video/out/opengl/video.c
@@ -2781,11 +2781,6 @@ void gl_video_render_frame(struct gl_video *p, struct vo_frame *frame, int fbo)

     gl->BindFramebuffer(GL_FRAMEBUFFER, 0);

-    // The playloop calls this last before waiting some time until it decides
-    // to call flip_page(). Tell OpenGL to start execution of the GPU commands
-    // while we sleep (this happens asynchronously).
-    gl->Flush();
-
     p->frames_rendered++;

     // Report performance metrics

seems to help with some of the issues that seem more directly tied to some of the weird behavior that the OSD sometimes exhibits with the cocoa backend. A good example would be the stalls and user perceived lag that can occur with a resume from a paused state or with a resize of the window of some sort. Unfortunately, it doesn't seem to really do much to improve the actual performance of the opengl vo. However, that element hopefully will be much improved with macOS Sierra, as @Akemi has seen so far. Still, it would be interesting if someone with the beta installed tried something like this, just to see if it perhaps even improves things further by working more 'correctly' with Apple's os specific opengl implementation. (and therefor the Cocoa frameworks)

@wm4 From the way I understand how CGLFlushDrawable works, calling a glFlush right before it would essentially be quite similar to calling glFlush itself twice on the frame. I think CGLFlushDrawable may have some mechanisms to try and mitigate that cost a bit, though. However, one cannot look at the source to see what the case really is unfortunately. That said you understand all of this much better than I do, so if you think I am mistaken or chasing a red herring, pay me no heed.

EDIT:
On closer inspection this actually seems to be breaking something. The CGLFlushDrawable call superficially seems to be swapping the buffers correctly, but now the OpenGL queue spends many orders of magnitude longer executing glTexSubImage2D in full screen. The end result is a reducuction in performance that seems to scale rapidly when more demanding shaders are used. My initial test case of --no-config --vo=opengl:interpolation --video-sync=display-resample glossed this over quite well. (This could of course mean that the buffers are in fact not being properly handled if the glFlush call is removed.)

This issue was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants