Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

More efficient dirtying of the UV scale uniform #17479

Closed
wants to merge 3 commits into from

Conversation

hrydgard
Copy link
Owner

@hrydgard hrydgard commented May 17, 2023

This allows more draw calls to be merged in some games like GTA that wildly mix different but output-compatible vertex formats for related geometry. The result is about a 10% reduction in drawcalls in GTA games, and some reduction in uniform value updates. Many other games are barely affected at all by this.

This one is backend-neutral.

@hrydgard hrydgard added the GE emulation Backend-independent GPU issues label May 17, 2023
@hrydgard hrydgard added this to the v1.16.0 milestone May 17, 2023
Copy link
Collaborator

@unknownbrackets unknownbrackets left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this seems good. A few comments, I think we could clean up a bit and also ignore morphcount the same way.

-[Unknown]

GPU/Common/DrawEngineCommon.cpp Show resolved Hide resolved
@@ -822,7 +822,7 @@ void GPUCommonHW::FastRunLoop(DisplayList &list) {
void GPUCommonHW::Execute_VertexType(u32 op, u32 diff) {
if (diff)
gstate_c.Dirty(DIRTY_VERTEXSHADER_STATE);
if (diff & (GE_VTYPE_TC_MASK | GE_VTYPE_THROUGH_MASK)) {
if (diff & (GE_VTYPE_THROUGH_MASK)) {
gstate_c.Dirty(DIRTY_UVSCALEOFFSET);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can remove this, and only have the inner if.

-[Unknown]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, we should also make this function smarter and avoid flushing in many cases, just like Execute_VertexTypeSkinning. We still can use these improvements even if software skinning is off.

gstate_c.deferredVertTypeDirty = 0;
// If the vertex format changed, but the component setup didn't (same components, different input format),
// we don't actually need to flush. Assuming the output vertex format stays the same. This is quite common in games with highly optimized mesh data.
if ((diff & ~(GE_VTYPE_TC_MASK | GE_VTYPE_COL_MASK | GE_VTYPE_POS_MASK)) == 0 &&
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't we even ignore weightcount here? I feel like this logic would be easier to read in a static func, i.e.:

static bool IsVTypeCompatible(u32 prev, u32 diff) {
	// Did anything outside the component types and weightcount change?
	if ((diff & ~(GE_VTYPE_WEIGHTCOUNT_MASK | GE_VTYPE_TC_MASK | GE_VTYPE_COL_MASK | GE_VTYPE_POS_MASK)) != 0)
		return false;

	u32 cur = prev ^ diff;
	if (((prev & GE_VTYPE_TC_MASK) != 0) != ((cur & GE_VTYPE_TC_MASK) != 0))
		return false;
	if (((prev & GE_VTYPE_COL_MASK) != 0) != ((cur & GE_VTYPE_COL_MASK) != 0))
		return false;

	return true;
}

Generally, I hate ifs that span multiple lines. They're never clean, readable code.

Also, can't we actually ignore morphcount too? It only affects vertex decode. Might help LBP.

-[Unknown]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah we can definitely ignore morphcount here too, since changing it doesn't change the output format. Just didn't think of it. And breaking the function out like that makes sense too.

Comment on lines +851 to +856
// In this case, we may be doing weights and morphs.
// Update any bone matrix uniforms so it uses them correctly.
if ((op & GE_VTYPE_MORPHCOUNT_MASK) != 0) {
gstate_c.Dirty(gstate_c.deferredVertTypeDirty);
gstate_c.deferredVertTypeDirty = 0;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should move this down outside the if, so you don't need to duplicate it.

-[Unknown]

}
if (diff & GE_VTYPE_THROUGH_MASK)
gstate_c.Dirty(DIRTY_RASTER_STATE | DIRTY_VIEWPORTSCISSOR_STATE | DIRTY_FRAGMENTSHADER_STATE | DIRTY_GEOMETRYSHADER_STATE | DIRTY_CULLRANGE);
gstate_c.Dirty(DIRTY_RASTER_STATE | DIRTY_VIEWPORTSCISSOR_STATE | DIRTY_FRAGMENTSHADER_STATE | DIRTY_GEOMETRYSHADER_STATE | DIRTY_CULLRANGE | DIRTY_UVSCALEOFFSET);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does through actually care about uvscaleoffset since we're doing the framebuffer thing now?

-[Unknown]

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess not, indeed. Removing.

@hrydgard
Copy link
Owner Author

hrydgard commented May 25, 2023

I think I'm gonna split this into two changes, first we fix the excessive uniform updates that are fixed by the new framebuffer flag, and only then do we add merging of drawcalls with compatible output formats. Less room for mistakes and easier review and bisection.

@hrydgard hrydgard closed this May 25, 2023
hrydgard added a commit that referenced this pull request May 25, 2023
Broken out from #17479

With OpenGL, greatly reduces the amount of glUniform4fv calls in many games (and
similar in the other backends).
hrydgard added a commit that referenced this pull request May 25, 2023
@unknownbrackets unknownbrackets removed this from the v1.16.0 milestone May 26, 2023
hrydgard added a commit that referenced this pull request Aug 21, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
GE emulation Backend-independent GPU issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants