
demo: Initial hierarchical clustered simplification demo #760

Merged: zeux merged 22 commits into master from nanite on Sep 6, 2024

Conversation

zeux (Owner) commented Sep 5, 2024

This PR adds a new demo file, nanite.cpp, that implements Nanite-style hierarchical simplification.
Working off the SIGGRAPH presentation by Brian Karis, it builds a DAG of clusters; for each cluster,
it tracks the bounds and error that can be used to do LOD selection in parallel / independently at
runtime.
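As a rough illustration of the per-cluster data involved (the struct names here are illustrative, not nanite.cpp's actual definitions), each cluster carries its own bounds/error plus the bounds/error of the group it was simplified into:

```cpp
#include <vector>

struct LODBoundsSketch
{
	float center[3]; // bounding sphere center
	float radius;    // bounding sphere radius
	float error;     // simplification error, made monotonic up the DAG
};

struct ClusterSketch
{
	std::vector<unsigned int> indices; // triangle index buffer for this cluster
	LODBoundsSketch self;   // bounds/error of this cluster
	LODBoundsSketch parent; // bounds/error of the coarser group it was merged into
};
```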

The goal of this code is not to be a production-quality pipeline (yet!); it is to serve as a testing
ground for future improvements in meshoptimizer's algorithms as well as for development of new
algorithms. In the future this may become a good starting point, but for now the main purpose
is to be a test harness; as such, there's a bunch of statistics/visualization support code here (and
more to come in the future).

Also, to have a reasonable baseline, this code integrates with METIS, optionally using it to build
clusters as well as to partition clusters into groups. Without METIS we can still build clusters using
meshopt, but for partitioning we resort to merging clusters sequentially (which is obviously suboptimal).

METIS triangle partitioning is generally better in terms of maintaining contiguous clusters, but
worse in terms of generating a good cluster fill. There might be a way to coerce METIS to generate
better clusters with more tweaks; currently what often happens is that it decides to split a 256-size
partition into 127 and 129 triangles, and then has no choice but to split the 129-triangle partition
into two smaller ones. This may be fixable by repeatedly asking METIS to generate partitions with
different triangle counts, but that becomes even more expensive... since METIS is only used here to
set a baseline, this is probably fine.

METIS cluster partitioning has surprising failure cases where it decides that a good partition should
consist of 4 clusters that barely connect; this was unexpected because it doesn't seem to happen during
triangle builds. Maybe with this in mind, a reasonable (unordered) greedy cluster merge could work
better, but this will need to be investigated separately in the future anyway.

The actual simplification currently uses automatic border detection; it would be better to switch to
using external vertex tagging via vertex_lock, but for now, the models I use for testing that have a lot of
room for improvement are not bottlenecked by this.

The actual DAG build is also very sensitive to all of the details above in ways that seem somewhat
difficult to control. If a group gets stuck at any point earlier in the pipeline (e.g. it can't be simplified
much further), it could be because the group has internally bad topology (faceting, UV seams), but it
could also be because the group is formed of disjoint clusters, or the clusters themselves are formed
of disjoint triangles. This is bad because a) the group itself is kept at a high resolution, and b) removing
this group from future simplification permanently locks the boundary edges for other clusters,
which exacerbates the problem.

To try to get around this, the DAG flow has very lax simplification limits (we keep any result where we
can remove at least 15% of the triangles without breaking borders), and also keeps anything that gets
stuck in the queue and retries it later; in some cases this makes things worse, but it often helps.
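A minimal sketch of that acceptance test (names are illustrative, not nanite.cpp's):

```cpp
// A simplified group is kept only if it sheds at least 15% of its triangles;
// groups that fail are left in the queue and retried on a later pass.
bool acceptSimplified(size_t simplified_index_count, size_t original_index_count)
{
	return simplified_index_count <= size_t(double(original_index_count) * 0.85);
}
```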

The error propagation uses a simpler variant of the full error metric described in the Nanite paper;
theirs looks at projected bounds whereas we just approximate it, but that doesn't matter as much.
What does matter for correctness is enforcing that the error is monotonic, which we do by ensuring
spherical group bounds are merged conservatively.
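A hedged sketch of the conservative merge (names are illustrative): grow a running sphere until it contains every child sphere; the parent error is then set to at least the maximum child error, which together keep the projected error monotonic. This is not the minimal enclosing sphere, but containment is what monotonicity needs.

```cpp
#include <math.h>
#include <stddef.h>

struct SphereSketch { float x, y, z, r; };

SphereSketch mergeSpheres(const SphereSketch* spheres, size_t count)
{
	SphereSketch result = spheres[0];

	for (size_t i = 1; i < count; ++i)
	{
		const SphereSketch& s = spheres[i];
		float dx = s.x - result.x, dy = s.y - result.y, dz = s.z - result.z;
		float d = sqrtf(dx * dx + dy * dy + dz * dz);

		if (d + s.r <= result.r)
			continue; // child sphere already contained

		if (d == 0.f)
		{
			result.r = s.r; // concentric and larger: take the bigger radius
			continue;
		}

		// smallest sphere containing both the current result and s
		float nr = (d + result.r + s.r) * 0.5f;
		float k = (nr - result.r) / d;
		result.x += dx * k;
		result.y += dy * k;
		result.z += dz * k;
		result.r = nr;
	}

	return result;
}
```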

Finally, the METIS support code is written with zero regard for efficiency; it uses bad containers
and inefficient algorithms in some places for expediency. Hopefully in a year or so all of the METIS
code can be removed once the meshopt replacements are superior. 🤞

The demo can be coerced to dump parts of the mesh as .obj via DUMP env var, which can be pretty
conveniently visualized in Blender:

[image: mesh clusters dumped by the demo, visualized in Blender]

Future work here includes more statistics around meshlet quality and maybe fixes around METIS usage
to improve the baseline a bit; more importantly, this will be used to improve the library in the future :)

This contribution is sponsored by Valve.

This is a stub that will be expanded to provide a demo of Nanite-like
(hierarchical clustered level of detail) processing. This is necessary
both to serve as an example of how to implement it using all best
practices as well as a testing harness; right now meshoptimizer has
enough functionality to implement *a* pipeline, but for best performance
currently some algorithms need to be swapped out. Long term the goal is
for this to be close to optimal while using just meshopt_ functions.
For initial Nanite implementation to work well we will need METIS for
graph partitioning. Eventually we will implement enough algorithms in
meshoptimizer itself to not need this.
This should be a more-or-less complete, if basic, merge-simplify-split
pipeline for cluster DAG build, with the exception of tracking the
actual DAG data (errors and parent links).

The caveat is that to merge clusters, we simply partition them
sequentially; this is suboptimal and is where METIS can help. We also use
meshopt_buildMeshlets, which can sometimes leave unwanted gaps, but we
don't have a great way to quantify the issues yet without collecting
stats.
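A minimal sketch of that sequential fallback (names are illustrative): clusters are grouped purely in order of appearance, ignoring adjacency entirely, hence "obviously suboptimal".

```cpp
#include <vector>

std::vector<std::vector<int> > partitionSequential(int cluster_count, int group_size)
{
	std::vector<std::vector<int> > groups;

	for (int i = 0; i < cluster_count; i += group_size)
	{
		std::vector<int> group;
		for (int j = i; j < cluster_count && j < i + group_size; ++j)
			group.push_back(j);
		groups.push_back(group);
	}

	return groups;
}
```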
We can now compare things like total simplified triangle count assuming
every cluster goes all the way down to lowest LOD to judge the quality
of simplification, as well as number of stuck clusters at every level.

config=trace also now prints just the decisions about stuck clusters
which is the key factor wrt efficiency at the bottom level. It's likely
that we can adjust the process to be more graceful about this in certain
cases.
Nanite uses 128, so for now let's stick to that; max vertices will need
to be refined later, as will the various thresholds associated with
merge/simplify.
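A hedged sketch of clustering with these limits via meshopt_buildMeshlets; the max_vertices value of 192 is an assumption (per the note above, the vertex limit still needs tuning), and `positions` is tightly packed xyz data.

```cpp
#include <vector>
#include "meshoptimizer.h"

std::vector<meshopt_Meshlet> buildClusters(const unsigned int* indices, size_t index_count,
    const float* positions, size_t vertex_count)
{
	const size_t max_vertices = 192;  // assumption; needs refinement
	const size_t max_triangles = 128; // matches Nanite's cluster size

	size_t max_meshlets = meshopt_buildMeshletsBound(index_count, max_vertices, max_triangles);

	std::vector<meshopt_Meshlet> meshlets(max_meshlets);
	std::vector<unsigned int> meshlet_vertices(max_meshlets * max_vertices);
	std::vector<unsigned char> meshlet_triangles(max_meshlets * max_triangles * 3);

	size_t count = meshopt_buildMeshlets(meshlets.data(), meshlet_vertices.data(),
	    meshlet_triangles.data(), indices, index_count,
	    positions, vertex_count, sizeof(float) * 3,
	    max_vertices, max_triangles, /* cone_weight= */ 0.f);

	meshlets.resize(count);
	return meshlets;
}
```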
This is required to be able to properly compute and update LOD
information; additionally, this reduces the amount of data copying we
have to do for indices and will make it easier to reintroduce meshlet
local indexing in the future.

As part of this, simplify() also returns indices instead of a cluster,
as it would not make sense to compute meshlet local indices here.
We now compute LOD bounds for each cluster as well as the parent
information that propagates through DAG and should make it possible to
select a DAG cut just based on bounds alone.

This also makes it possible to compute the lowest LOD (as well as any
other LOD!) from cluster data alone, without relying on the
stuck_triangles diagnostics, which have been reworked to just analyze
the current LOD level.
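A hedged sketch of the bounds-based cut test this enables (the helper name is illustrative): given the projected error of a cluster's own bounds and of its parent group's bounds, a cluster is drawn exactly when its own error passes the threshold but its parent's doesn't.

```cpp
bool selectForCut(float self_error, float parent_error, float threshold)
{
	// monotonicity (parent_error >= self_error from any viewpoint) guarantees
	// every triangle is covered by exactly one selected cluster in the cut
	return self_error <= threshold && parent_error > threshold;
}
```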
To make it easier to understand the results, in addition to numerical
stats for the DAG we can now output a simplified mesh by computing the
error for each cluster from a viewpoint and using hierarchical
information to effectively select LODs at every level.

We always do this to output a number, but also can save the .obj to
stderr for subsequent external visualization.
For correct hierarchical LOD selection we need to maintain the bounds
monotonicity: any parent cluster needs to have error >= any child
cluster from any viewpoint. This is something that we don't currently
get right because our merged sphere might not cover all child spheres;
for now we will print an error if this happens, but this code might be
removed in the future.
This falls out of the monotonicity requirement: it is not enough to make
LODBounds::error monotonic; the real requirement is that, for any
viewpoint, boundsError is monotonic through the DAG. To achieve that we
need to make sure that the bounding sphere of any parent cluster
contains the bounding spheres of the child clusters, which may not hold
if the parent sphere is computed precisely from vertex data.

Fixing this and fixing boundsError to return FLT_MAX if the viewpoint is
contained inside the sphere makes the DAG checks pass, so they can now
use assertions instead of logs.
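A minimal sketch of such a boundsError (reusing the LODBoundsSketch struct from the earlier sketch); the demo's real function approximates projected bounds, and returning FLT_MAX handles the viewpoint-inside-sphere case described above.

```cpp
#include <float.h>
#include <math.h>

struct LODBoundsSketch { float center[3]; float radius; float error; }; // as in the earlier sketch

float boundsError(const LODBoundsSketch& b, const float* viewpoint)
{
	float dx = b.center[0] - viewpoint[0];
	float dy = b.center[1] - viewpoint[1];
	float dz = b.center[2] - viewpoint[2];
	float d = sqrtf(dx * dx + dy * dy + dz * dz);

	if (d <= b.radius)
		return FLT_MAX; // viewpoint inside the sphere: error is unbounded

	return b.error / (d - b.radius); // approximate screen-space projection
}
```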
We use a k-way partitioning scheme and ask for ~c/4 partitions in hopes
that it groups clusters reasonably well; for now we use the number of
shared edges as the connection weight although it is likely that the
number of shared vertices is a reasonable proxy that is easier to
compute and more useful.

Note that the rest of the pipeline was structured to deal with a more
strict partitioner, as such in some cases the results are better and in
some they are worse. This is good because the rest of the pipeline needs
to have better heuristics anyway.
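A hedged sketch of the k-way call described above, assuming the caller has built a CSR adjacency graph over clusters (xadj/adjncy) with shared-edge counts as edge weights (adjwgt); this follows the METIS 5.x API but is not the demo's actual code.

```cpp
#include <stddef.h>
#include <vector>
#include <metis.h>

std::vector<idx_t> partitionClusters(std::vector<idx_t>& xadj, std::vector<idx_t>& adjncy,
    std::vector<idx_t>& adjwgt, idx_t cluster_count)
{
	idx_t nparts = cluster_count > 4 ? cluster_count / 4 : 1; // ask for ~c/4 partitions
	idx_t ncon = 1;    // one balance constraint
	idx_t edgecut = 0; // filled in by METIS
	std::vector<idx_t> part(cluster_count);

	METIS_PartGraphKway(&cluster_count, &ncon, xadj.data(), adjncy.data(),
	    /* vwgt= */ NULL, /* vsize= */ NULL, adjwgt.data(), &nparts,
	    /* tpwgts= */ NULL, /* ubvec= */ NULL, /* options= */ NULL,
	    &edgecut, part.data());

	return part; // part[i] is the group assigned to cluster i
}
```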
Using environment variable DUMP, we can now control the contents of the
output .obj: -1 means "output DAG cut", 0-n means "output grouping at a
given LOD level".

When we output the cut, we also output individual clusters as separate
objects for ease of debugging.
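A tiny sketch of the DUMP parsing described above (the INT_MIN sentinel is illustrative): -1 selects the DAG cut, 0..n selects the grouping at that LOD level.

```cpp
#include <limits.h>
#include <stdlib.h>

int parseDumpLevel()
{
	const char* dump = getenv("DUMP");
	return dump ? atoi(dump) : INT_MIN; // INT_MIN: no dump requested
}
```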
We might need to split this into a separate option because, while this
seems to work, it also results in disjoint clusters and in addition
doesn't fill the clusters very well; at minimum this will need further
tweaks.
This means that clusters that end up too small to merge, or edge-locked
enough that simplification is not effective, may get another chance
later in the process to be merged with other clusters. This is generally
beneficial for quality, although it occasionally makes things worse, and
it definitely slows processing down as the stuck triangles keep being
reevaluated on every pass.
Also use remapped vertices to identify adjacency for weighting; this
results in better spatial clustering for disconnected components.

Finally, for now triangle clustering requires METIS=2, because it has
complex tradeoffs and does not appear to be universally better...
Especially when METIS triangle clustering is used, we often get much
smaller clusters in the initial split, because it gets to 129 triangles
and splits that into 64+65. Then partitioning may either take three
clusters with sizes a little under 128, or four clusters, two of which
are 64/65, and the resulting cluster will be too small to merge.

Instead we remove the merge criterion outright if we have two clusters
to merge, and replace it with a percentage reduction. Right now the
percentage is very lenient; it would not be ideal to have every single
level just be an 85% reduction. However, this can be controlled at a
macro level, and ideally we want to prevent the processing from getting
stuck if possible.
Remap generation expects count % 3 == 0 right now, and we're just
feeding it our vertices as if they were a triangle buffer. Instead we
can use a shadow index buffer for edge matching, which is a little
simpler anyway.
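A hedged sketch of position-only edge matching via meshopt_generateShadowIndexBuffer (names are illustrative): indices of vertices with identical positions map to one representative, so cluster borders match even across attribute seams. The first 12 bytes of each vertex are assumed to be the position.

```cpp
#include <vector>
#include "meshoptimizer.h"

std::vector<unsigned int> makeShadowIndices(const unsigned int* indices, size_t index_count,
    const void* vertices, size_t vertex_count, size_t vertex_stride)
{
	std::vector<unsigned int> shadow(index_count);
	meshopt_generateShadowIndexBuffer(shadow.data(), indices, index_count,
	    vertices, vertex_count, /* vertex_size= */ sizeof(float) * 3, vertex_stride);
	return shadow;
}
```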
Since this is a separate branch now that happens before the merge, we
should visualize it separately; using a different label helps tell these
apart.
In general this code needs a lot more work but it is in a state where it
can be useful for testing and development so it's probably a reasonable
place to stop for now.
JMS55 commented Sep 5, 2024

Btw, I highly recommend downloading a really large asset like https://quixel.com/megascans/free?category=3D%20asset&assetId=siEoZ (make sure to check the download source quality asset option) and testing on that.

JMS55 commented Sep 5, 2024

> METIS cluster partitioning has surprising failure cases where it decides that a good partition should
> consist of 4 clusters that barely connect; this was unexpected because it doesn't seem to happen during
> triangle builds. Maybe with this in mind, a reasonable (unordered) greedy cluster merge could work
> better, but this will need to be investigated separately in the future anyway.

I've heard that adding weights based on cluster center distances can help.

> The actual simplification currently uses automatic border detection; it would be better to switch to
> using external vertex tagging via vertex_lock, but for now, the models I use for testing that have a lot of
> room for improvement are not bottlenecked by this.

Why so? Don't you want to lock the vertices of the meshlet group border? Which is what SimplifyOptions::LockBorder should do afaik?

zeux (Owner, Author) commented Sep 5, 2024

> Don't you want to lock the vertices of the meshlet group border? Which is what SimplifyOptions::LockBorder should do afaik?

This only matters when the source asset has geometric borders. If it does, then they will be locked just as meshlet borders are, when they could in theory be simplified instead.

For example, the asset you linked has geometric borders 😅 highlighted here:

[image: the linked asset with its geometric borders highlighted]

As a result, using LockBorder in this case will lock that border in all clusters, and you will never get below a fairly large triangle count as a result.

This is easy to change; similarly to cluster boundary edge detection, every vertex that is shared between any two clusters can be tagged as locked before processing all clusters in the queue at a given level or a group. But since I'm testing on assets that don't have this, I'm cutting this particular corner for now.
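A hedged sketch of that tagging (names and types are illustrative, not the demo's code): mark any vertex referenced by two or more clusters as locked, then pass the flags via the vertex_lock parameter of meshopt_simplifyWithAttributes instead of using LockBorder. For correctness across attribute seams, the indices here should be position-remapped (e.g. via a shadow index buffer).

```cpp
#include <stddef.h>
#include <vector>

std::vector<unsigned char> computeLocks(
    const std::vector<std::vector<unsigned int> >& clusters, size_t vertex_count)
{
	std::vector<unsigned char> locks(vertex_count, 0);
	std::vector<int> owner(vertex_count, -1); // first cluster to reference each vertex

	for (size_t c = 0; c < clusters.size(); ++c)
		for (size_t i = 0; i < clusters[c].size(); ++i)
		{
			unsigned int v = clusters[c][i];
			if (owner[v] == -1)
				owner[v] = int(c);
			else if (owner[v] != int(c))
				locks[v] = 1; // shared between clusters: must stay put
		}

	return locks; // feed to meshopt_simplifyWithAttributes' vertex_lock
}
```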

JMS55 commented Sep 5, 2024

Are you saying that meshopt's locked vertices are based on the vertex data, and not the index data I provide? E.g. using SimplifyOptions::Sparse | SimplifyOptions::LockBorder is locking the edges of the entire mesh whenever simplifying? If so, I should definitely switch to manual vertex locks... 😅

zeux (Owner, Author) commented Sep 5, 2024

It is based on index data. But the edge that is a border edge in the source geometry will always remain a border edge for any subset of the mesh.

The "full" counter is a little less helpful for METIS which very often
has almost-full clusters, so add an average as well to gauge the
distribution.

We also now count the number of singleton clusters; these are rare but
it's important that they stay that way for the quality of the partition.
Since whether a meshlet is connected or not is a very important
criterion for how well it will do in a hierarchical clusterization
process, we now compute the number of connected components in the
meshlet demo and count the number of meshlets with more than one.

Notably, for now we do this analysis using indices; this means we do
connectivity analysis on the original, non-positional topology, and on
some geometry that looks visually connected we will naturally have many
disconnected meshlets.
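A hedged sketch of counting connected components inside one meshlet via union-find over its triangles (two triangles connect if they share a vertex index); as noted above, this works on raw indices, so positional seams split components.

```cpp
#include <stddef.h>
#include <vector>

static int findRoot(std::vector<int>& parent, int v)
{
	while (parent[v] != v)
		v = parent[v] = parent[parent[v]]; // path halving
	return v;
}

int countComponents(const unsigned int* indices, size_t triangle_count, size_t vertex_count)
{
	std::vector<int> parent(vertex_count);
	for (size_t i = 0; i < vertex_count; ++i)
		parent[i] = int(i);

	// union the three vertices of every triangle
	for (size_t t = 0; t < triangle_count; ++t)
	{
		int a = findRoot(parent, int(indices[t * 3 + 0]));
		int b = findRoot(parent, int(indices[t * 3 + 1]));
		int c = findRoot(parent, int(indices[t * 3 + 2]));
		parent[b] = a;
		parent[c] = findRoot(parent, b);
	}

	// count distinct roots among vertices the meshlet actually references
	std::vector<char> used(vertex_count, 0);
	for (size_t i = 0; i < triangle_count * 3; ++i)
		used[indices[i]] = 1;

	int components = 0;
	for (size_t v = 0; v < vertex_count; ++v)
		components += used[v] && findRoot(parent, int(v)) == int(v);
	return components;
}
```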
zeux merged commit 27299d0 into master on Sep 6, 2024 (12 checks passed).
zeux deleted the nanite branch on September 6, 2024.
zeux changed the title from "demo: Initial hierarchical clustered simplification (Nanite) demo" to "demo: Initial hierarchical clustered simplification demo" on Sep 6, 2024.