Super lag in HLE mode, at least with r600/radeonsi Mesa drivers #1561
Can you post the full output of glxinfo? Maybe put it on pastebin or upload a txt file or something so it's not a huge post.
I've been doing more testing, and apparently VBO just makes the problem even more evident, so the problem resides somewhere else. Just as a note, GPU usage is almost nil during the lag periods and CPU usage does not exceed 33%. Any of the mesa-utils demos uses more GPU... (tested with radeontop).
Do you have anything non-default set in the config? If so, can you post your mupen64plus.cfg?
I modify Edit:
I found the origin! 313741d In my tests, that commit causes heavy lag in: And partial lag (only in certain scenes) in: Later commits increase the lag in other games, but a625225 generalizes the lag to all games; "it was the straw that broke the camel's back". At least on my hardware, I deduce.
It seems like your machine has an issue with GL_ARB_buffer_storage. Using the latest master, can you try forcing this variable to false: https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_GLInfo.cpp#L58-L59 Just get rid of that whole statement and replace it with
Nope, that's not it. With that change the lag is slightly worse (between 0.1 and 1.2 FPS less in SM64).
I think I found something interesting. See this first: https://imgur.com/a/rEMEu In some parts of the Carrington Institute, the lag disappears completely (No. 1, No. 6 and No. 7). Usually, places with poor lighting, or very close to any wall, lit up or not. In the case of images No. 7 and No. 8, both are exactly the same spot. The only difference is that No. 8 was taken after activating Hi-Res. Save state: https://0x0.st/RpV.zip P.S.: The gliden64.log is only generated when GLideN64 is compiled with Clang...
I've been "playing a lot", updating and recompiling many dependencies on my PC... The first one is to use cxd4-sse2 in "HLE mode" instead of the HLE plugin. Examples: https://imgur.com/a/HAqJZ Edit:
I've been testing angrylion-plus... and I noticed that in my previous post I completely confused the HLE and LLE configurations of cxd4-sse2 (thanks to this). So the image descriptions are inverted... cxd4-sse2 (HLE) is in reality cxd4-sse2 (LLE) and vice versa. My lag problem in HLE persists... only now I know that cxd4 is also affected. Even angrylion-plus works faster for me than GLideN64 in HLE mode.
So the one that made things slow for you is this? Can you try disabling Copy color buffer to RDRAM and check if performance improves? It sounds like AMD Mesa drivers don't like buffer storage either. Edit: While you are at it, can you test this branch? I'm curious how threaded GLideN64 performs with AMD hardware.
I had him try disabling buffer storage (#1561 (comment)); it didn't seem to make a difference.
Let's double check by turning off "Copy color buffer to RDRAM".
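For mupen64plus builds, "Copy color buffer to RDRAM" corresponds to the EnableCopyColorToRDRAM option in mupen64plus.cfg. A minimal fragment (section name and value meanings are taken from GLideN64's usual mupen64plus config layout; verify against your own file):

```ini
[Video-GLideN64]
# 0 = do not copy, 1 = copy synchronously, 2 = copy asynchronously
EnableCopyColorToRDRAM = 0
```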
Disabling those two things is the only thing that helped me a bit. But I don't remember trying to disable that with the modification suggested by loganmc10 (#1561 (comment)); I'll try again.
Okay, I don't have much to do anyway. I'll try it when I'm at home.
No. I don't see much difference between disabling @fzurita, the same story with the
So this is very odd. The commit that made things slow for you is 313741d, but that code is only invoked when... I'm not sure what is going on. Could it be that https://github.com/gonetz/GLideN64/blob/master/ini/GLideN64.custom.ini is overwriting your setting?
I've been testing in my free time (which is very little) and I made some discoveries. Unfortunately, nothing directly related to my problem... apparently.
I really doubt it. I don't use any GLideN64*.ini files. Or, more precisely, I don't know where to put them to make them work with mupen64plus. And I can see the effects after editing my configuration file. Regarding 313741d, see this first: #1561 (comment) I know very little about programming (Turbo C, Turbo Pascal, Visual FoxPro, Delphi, etc.) and I'm more used to writing maintenance scripts (Batch, VBScript, AHK, etc.) for Windows.
https://www.libretro.com/index.php/introducing-vulkan-psx-renderer-for-beetlemednafen-psx/ In my last test I saw lag and freezes in bc00985 when GLideN64 tries to fill more than 542 MB of VRAM. After that, VRAM usage drops a few MB and starts to fill again. Edit:
@fzurita, I get this using your 'further_reduce_shader_logic' branch: https://0x0.st/sX2R.log
I have found the true origins (I believe): 313741d and 3aa365d. As I mentioned before, 313741d causes lag (and glitches) for me at boot time and in the first scenes of Perfect Dark and other games (#1561 (comment)), but not during gameplay. On the other hand, 3aa365d fixes the glitches caused by 313741d, but the lag becomes permanent in Perfect Dark, in HLE with Hi-Res enabled. With Hi-Res disabled, the FPS only becomes unstable. And I quote myself:
Do you have color buffer to RDRAM enabled? If you don't have it enabled, the first commit shouldn't make a difference.
Well, before those commits, EnableCopyColorToRDRAM was always disabled for GLES 2.0 devices. Edit: whoops, you have an AMD device. Too many issues and I got things confused. It seems like buffer storage causes slowdowns with AMD, at least with that specific driver.
Neither patch appears to affect my ~GCN 1.0 card. I mentioned earlier that I was interested in running this in a profiler. Caveat: I have no idea what I'm doing. Using an apitrace profile, it looks like GL calls are using no more than ~2 ms on the GPU. On the CPU, however, the graph is skewed because some calls to glTexSubImage2D occasionally take around 100 ms or more (!!). However, this isn't happening often enough for it to be the problem on a frame-by-frame basis. I am seeing calls to glDrawElementsBaseVertex taking ~15 ms. When I zoom in, the graph looks like this. The pattern here looks like several calls to glDrawElementsBaseVertex taking around 10-15 ms, followed by a call to glFlushMappedBufferRange also taking about 10 ms. Is there anything anyone would be interested in me looking at?
@loganmc10, my tests are without buffer storage and VBOs disabled; take them all with a grain of salt. Test with EnableCopyColorToRDRAM = 0 Personally, I did not notice any significant changes, much less by disabling buffer storage and VBOs. @BPaden, try to run using Can you post the results of the following commands?
@BPaden, that trace is very helpful. My next test will be replacing glDrawElementsBaseVertex with glDrawArrays. I assumed this might be an issue since @Jj0YzL5nvJ mentioned that LLE works better; I believe LLE always uses glDrawArrays (I could be remembering wrong though). Disabling VBOs is good for testing, but it can't be a long-term solution. Core OpenGL requires the use of VBOs (that's why you need the environment variable to get it to work). In a future version of Mesa, they could remove support for non-VBO rendering altogether if they wanted, so we can't count on that. @BPaden, the long glTexSubImage2D is unfortunate but not unexpected. That is when the emulator is uploading texture data to the GPU. In a normal game you would do that at the beginning, not during rendering, but the emulator doesn't know about the texture data until right before it's needed, so we have to upload it like that.
Ok @BPaden @Jj0YzL5nvJ, can you try this commit: I tested this on my Nvidia laptop and saw no difference in performance, but it may make a difference for you. I'm also curious if this makes any difference on Adreno devices with buffer storage @fzurita
libGL: FPS = 60.9 😁
Sure, I'll try that. It will actually probably help performance with slower Android devices. I do know that VBOs and EBOs are slower with them. I remember in the past they were about 10% slower.
Yeah, well, it definitely looks like we found the bottleneck in the Mesa driver. I'm going to hop on their IRC channel and ask about this. It's a little counterintuitive: the whole point of elements (glDrawElements) is that you can reduce the amount of bandwidth used in uploading the vertex data. But maybe when used in conjunction with VBO streaming the benefits are negated. I'll be curious to hear if there is any difference on a mobile device.
This time the improvement is huge: I don't consider it necessary to test with buffer storage and VBOs disabled. (The truth is that I'm short of time =P)
Copy color to RDRAM is still slow even with that change, it seems.
Yeah, I suspect that disabling buffer storage for the copies might fix that; we can test that in a bit. Sorry to do this to you, but @Jj0YzL5nvJ and @BPaden, can you do one more test for me? I've been looking at Dolphin's code, and it looks like they don't use buffer storage or any buffers for the element arrays. I suspect that may be the actual problem.
Also, can one or both of you add:
Here: https://github.com/gonetz/GLideN64/blob/master/src/Graphics/OpenGLContext/opengl_GLInfo.cpp#L29 And post the output. We need to see how your card reports its name so we can disable the buffer storage extension for Copy color to RDRAM.
Another update: someone from Mesa responded and indicated that it is our use of GL_UNSIGNED_BYTE that is the issue. Please also test this: We are really looking for whatever solution offers the best FPS (although the use of glDrawArrays is probably out). So I'm curious between these 2 solutions: which is faster?
@fzurita, I wonder if that is the reason the Copy Color to RDRAM is slow as well. I believe glReadPixels is using GL_UNSIGNED_BYTE, right?
No problem. :)
This is about 5-10 FPS faster than master, but still slow. This is running at full speed for me.
Oh, also, I have EnableCopyColorToRDRAM=2 for this test.
@loganmc10 That could be the reason for your specific device. All the devices I have in my possession are pretty fast when using glReadPixels in an async way, except for the Adreno 530, for which I had to use floats for the pixel buffer. GLES 2.0 devices can't do glReadPixels in an async way due to the lack of PBOs, but a lot of them can use the EGL image extension, which is REALLY fast. Anyway, the devices with fast glReadPixels in my possession have PowerVR, Adreno, and Mali GPUs. Which GPU of yours has slow glReadPixels?
Sorry, I meant why @Jj0YzL5nvJ might be experiencing slow glReadPixels; you've looked into the format stuff a lot, so I thought you might have some insight on that. Maybe I'll try using the FLOAT type for the Radeon cards, like the Adreno 530 does, to see if it works faster.
Any idea if Intel IGPs could benefit from any of these changes?
According to the person on the Mesa issue tracker (https://bugs.freedesktop.org/show_bug.cgi?id=105256#c4):
But this is talking about the Mesa (Linux) drivers. This bug didn't affect the AMD Windows drivers. Basically, there is no way to know what it might affect. The PR I submitted (#1735) isn't GPU specific, so this fix will be applied to all GPUs, since it doesn't harm anything. This was a pretty crippling bug on the AMD/Linux driver, so I suspect if Intel was affected, you would have noticed it already.
Crippled GL_ARB_buffer_storage? Is there any nouveau user out there?...
Adding the info: https://bugs.freedesktop.org/show_bug.cgi?id=102204#c7
You should make a pull request, but based on the wine link, it looks like we only want to use that flag on Mesa.
Since the implementation of a625225, some kind of delay is generated in the radeon Mesa driver (r600g).
Xubuntu 16.04.3 LTS
glxinfo | grep OpenGL
The CPU consumption is not higher when compared to previous versions. And compiling with
-DCRC_OPT=On
changes almost nothing when it comes to FPS. Examples: Running SM64 until the moment the white star appears:
dddb3ae
cmake -DMUPENPLUSAPI=On ../../src/
LIBGL_SHOW_FPS=1 MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=420
a625225
cmake -DMUPENPLUSAPI=On ../../src/
LIBGL_SHOW_FPS=1 MESA_GL_VERSION_OVERRIDE=3.3COMPAT MESA_GLSL_VERSION_OVERRIDE=420
bbc7131
cmake -DCRC_OPT=On -DMUPENPLUSAPI=On ../../src/
LIBGL_SHOW_FPS=1
If anyone knows how to put spoilers, let me know how.