
demo: Initial hierarchical clustered simplification demo #760

Merged: zeux merged 22 commits into master from nanite on Sep 6, 2024

Conversation

zeux (Owner) commented Sep 5, 2024

This PR adds a new demo file, nanite.cpp, that implements Nanite-style hierarchical simplification.
Working off the SIGGRAPH presentation by Brian Karis, it builds a DAG of clusters; for each cluster,
it tracks the bounds and error that can be used to do LOD selection in parallel / independently at
runtime.
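As a rough illustration of the per-cluster data involved (the struct names here are illustrative, not nanite.cpp's actual definitions), each cluster carries its own bounds/error plus the bounds/error of the group it was simplified into:

```cpp
#include <vector>

struct LODBoundsSketch
{
	float center[3]; // bounding sphere center
	float radius;    // bounding sphere radius
	float error;     // simplification error, made monotonic up the DAG
};

struct ClusterSketch
{
	std::vector<unsigned int> indices; // triangle index buffer for this cluster
	LODBoundsSketch self;   // bounds/error of this cluster
	LODBoundsSketch parent; // bounds/error of the coarser group it was merged into
};
```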

The goal of this code is not to be a production-quality pipeline (yet!); it is to serve as a testing
ground for future improvements in meshoptimizer's algorithms as well as for development of new
algorithms. In the future this may become a good starting point, but for now the main purpose
is to be a test harness; as such, there's a bunch of statistics/visualization support code here (and
more to come in the future).

Also, to have a reasonable baseline, this code integrates with METIS, optionally using it to build
clusters as well as to partition clusters into groups. Without METIS we can still build clusters using
meshopt, but for partitioning we resort to merging clusters sequentially (which is obviously suboptimal).

METIS triangle partitioning is generally better in terms of maintaining contiguous clusters, but
worse in terms of generating a good cluster fill. There might be a way to coerce METIS to generate
better clusters with more tweaks; currently what often happens is that it decides to split a 256-size
partition into 127 and 129 triangles, and then has no choice but to split the 129-triangle partition
into two smaller ones. This may be fixable by repeatedly asking METIS to generate partitions with
different triangle counts, but that becomes even more expensive... since METIS is only used here to
set a baseline, this is probably fine.

METIS cluster partitioning has surprising failure cases where it decides that a good partition should
consist of 4 clusters that barely connect; this was unexpected because it doesn't seem to happen during
triangle builds. Maybe with this in mind, a reasonable (unordered) greedy cluster merge could work
better, but this will need to be investigated separately in the future anyway.

The actual simplification currently uses automatic border detection; it would be better to switch to
using external vertex tagging via vertex_lock, but for now, the models I use for testing that have a lot of
room for improvement are not bottlenecked by this.

The actual DAG build is also very sensitive to all of the details above in ways that seem somewhat
difficult to control. If a group gets stuck at any point earlier in the pipeline (e.g. it can't be simplified
much further), it could be because the group has internally bad topology (faceting, UV seams), but it
could also be because the group is formed of disjoint clusters, or the clusters themselves are formed
of disjoint triangles. This is bad because a) the group itself is kept at a high resolution, and b) removing
this group from future simplification permanently locks the boundary edges for other clusters,
which exacerbates the problem.

To try to get around this, the DAG flow has very lax simplification limits (we keep any result where we
can remove at least 15% of the triangles without breaking borders), and also keeps anything that gets
stuck in the queue and retries it later; in some cases this makes things worse, but it often helps.
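A minimal sketch of that acceptance test (names are illustrative, not nanite.cpp's):

```cpp
// A simplified group is kept only if it sheds at least 15% of its triangles;
// groups that fail are left in the queue and retried on a later pass.
bool acceptSimplified(size_t simplified_index_count, size_t original_index_count)
{
	return simplified_index_count <= size_t(double(original_index_count) * 0.85);
}
```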

The error propagation uses a simpler variant of the full error metric described in the Nanite paper;
theirs looks at projected bounds whereas we just approximate it, but that doesn't matter as much.
What does matter for correctness is enforcing that the error is monotonic, which we do by ensuring
spherical group bounds are merged conservatively.
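A hedged sketch of the conservative merge (names are illustrative): grow a running sphere until it contains every child sphere; the parent error is then set to at least the maximum child error, which together keep the projected error monotonic. This is not the minimal enclosing sphere, but containment is what monotonicity needs.

```cpp
#include <math.h>
#include <stddef.h>

struct SphereSketch { float x, y, z, r; };

SphereSketch mergeSpheres(const SphereSketch* spheres, size_t count)
{
	SphereSketch result = spheres[0];

	for (size_t i = 1; i < count; ++i)
	{
		const SphereSketch& s = spheres[i];
		float dx = s.x - result.x, dy = s.y - result.y, dz = s.z - result.z;
		float d = sqrtf(dx * dx + dy * dy + dz * dz);

		if (d + s.r <= result.r)
			continue; // child sphere already contained

		if (d == 0.f)
		{
			result.r = s.r; // concentric and larger: take the bigger radius
			continue;
		}

		// smallest sphere containing both the current result and s
		float nr = (d + result.r + s.r) * 0.5f;
		float k = (nr - result.r) / d;
		result.x += dx * k;
		result.y += dy * k;
		result.z += dz * k;
		result.r = nr;
	}

	return result;
}
```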

Finally, the METIS support code is written with zero regard for efficiency; it uses bad containers
and inefficient algorithms in some places for expediency. Hopefully in a year or so all of the METIS
code can be removed once the meshopt replacements are superior. 🤞

The demo can be coerced to dump parts of the mesh as .obj via DUMP env var, which can be pretty
conveniently visualized in Blender:

[image: mesh clusters dumped by the demo, visualized in Blender]

Future work here includes more statistics around meshlet quality and maybe fixes around METIS usage
to improve the baseline a bit; more importantly, this will be used to improve the library in the future :)

This contribution is sponsored by Valve.

This is a stub that will be expanded to provide a demo of Nanite-like
(hierarchical clustered level of detail) processing. This is necessary
both to serve as an example of how to implement it using all best
practices as well as a testing harness; right now meshoptimizer has
enough functionality to implement *a* pipeline, but for best performance
currently some algorithms need to be swapped out. Long term the goal is
for this to be close to optimal while using just meshopt_ functions.
For initial Nanite implementation to work well we will need METIS for
graph partitioning. Eventually we will implement enough algorithms in
meshoptimizer itself to not need this.
This should be a more-or-less complete, if basic, merge-simplify-split
pipeline for cluster DAG build, with the exception of tracking the
actual DAG data (errors and parent links).

The caveat is that to merge clusters, we simply partition them
sequentially; this is suboptimal and is where METIS can help. We also use
meshopt_buildMeshlets, which can sometimes leave unwanted gaps, but we
don't have a great way to quantify the issues yet without collecting
stats.
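A minimal sketch of that sequential fallback (names are illustrative): clusters are grouped purely in order of appearance, ignoring adjacency entirely, hence "obviously suboptimal".

```cpp
#include <vector>

std::vector<std::vector<int> > partitionSequential(int cluster_count, int group_size)
{
	std::vector<std::vector<int> > groups;

	for (int i = 0; i < cluster_count; i += group_size)
	{
		std::vector<int> group;
		for (int j = i; j < cluster_count && j < i + group_size; ++j)
			group.push_back(j);
		groups.push_back(group);
	}

	return groups;
}
```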
We can now compare things like total simplified triangle count assuming
every cluster goes all the way down to lowest LOD to judge the quality
of simplification, as well as number of stuck clusters at every level.

config=trace also now prints just the decisions about stuck clusters
which is the key factor wrt efficiency at the bottom level. It's likely
that we can adjust the process to be more graceful about this in certain
cases.
Nanite uses 128, so for now let's stick to that; max vertices will need
to be refined later, as will the various thresholds associated with
merge/simplify.
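A hedged sketch of clustering with these limits via meshopt_buildMeshlets; the max_vertices value of 192 is an assumption (per the note above, the vertex limit still needs tuning), and `positions` is tightly packed xyz data.

```cpp
#include <vector>
#include "meshoptimizer.h"

std::vector<meshopt_Meshlet> buildClusters(const unsigned int* indices, size_t index_count,
    const float* positions, size_t vertex_count)
{
	const size_t max_vertices = 192;  // assumption; needs refinement
	const size_t max_triangles = 128; // matches Nanite's cluster size

	size_t max_meshlets = meshopt_buildMeshletsBound(index_count, max_vertices, max_triangles);

	std::vector<meshopt_Meshlet> meshlets(max_meshlets);
	std::vector<unsigned int> meshlet_vertices(max_meshlets * max_vertices);
	std::vector<unsigned char> meshlet_triangles(max_meshlets * max_triangles * 3);

	size_t count = meshopt_buildMeshlets(meshlets.data(), meshlet_vertices.data(),
	    meshlet_triangles.data(), indices, index_count,
	    positions, vertex_count, sizeof(float) * 3,
	    max_vertices, max_triangles, /* cone_weight= */ 0.f);

	meshlets.resize(count);
	return meshlets;
}
```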
This is required to be able to properly compute and update LOD
information; additionally, this reduces the amount of data copying we
have to do for indices and will make it easier to reintroduce meshlet
local indexing in the future.

As part of this, simplify() also returns indices instead of a cluster,
as it would not make sense to compute meshlet local indices here.
We now compute LOD bounds for each cluster as well as the parent
information that propagates through DAG and should make it possible to
select a DAG cut just based on bounds alone.

This also makes it possible to compute the lowest LOD (as well as any
other LOD!) from cluster data alone, without relying on the
stuck_triangles diagnostics, which have been reworked to just analyze
the current LOD level.
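A hedged sketch of the bounds-based cut test this enables (the helper name is illustrative): given the projected error of a cluster's own bounds and of its parent group's bounds, a cluster is drawn exactly when its own error passes the threshold but its parent's doesn't.

```cpp
bool selectForCut(float self_error, float parent_error, float threshold)
{
	// monotonicity (parent_error >= self_error from any viewpoint) guarantees
	// every triangle is covered by exactly one selected cluster in the cut
	return self_error <= threshold && parent_error > threshold;
}
```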
To make it easier to understand the results, in addition to numerical
stats for the DAG we can now output a simplified mesh by computing the
error for each cluster from a viewpoint and using hierarchical
information to effectively select LODs at every level.

We always do this to output a number, but also can save the .obj to
stderr for subsequent external visualization.
For correct hierarchical LOD selection we need to maintain the bounds
monotonicity: any parent cluster needs to have error >= any child
cluster from any viewpoint. This is something that we don't currently
get right because our merged sphere might not cover all child spheres;
for now we will print an error if this happens, but this code might be
removed in the future.
This falls out of the monotonicity requirement: it is not enough to make
LODBounds::error monotonic; the real requirement is that, for any
viewpoint, boundsError is monotonic through the DAG. To achieve that we
need to make sure that the bounding sphere of any parent cluster
contains the bounding spheres of the child clusters, which may not hold
if the parent sphere is computed precisely from vertex data.

Fixing this and fixing boundsError to return FLT_MAX if the viewpoint is
contained inside the sphere makes the DAG checks pass, so they can now
use assertions instead of logs.
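A minimal sketch of such a boundsError (reusing the LODBoundsSketch struct from the earlier sketch); the demo's real function approximates projected bounds, and returning FLT_MAX handles the viewpoint-inside-sphere case described above.

```cpp
#include <float.h>
#include <math.h>

struct LODBoundsSketch { float center[3]; float radius; float error; }; // as in the earlier sketch

float boundsError(const LODBoundsSketch& b, const float* viewpoint)
{
	float dx = b.center[0] - viewpoint[0];
	float dy = b.center[1] - viewpoint[1];
	float dz = b.center[2] - viewpoint[2];
	float d = sqrtf(dx * dx + dy * dy + dz * dz);

	if (d <= b.radius)
		return FLT_MAX; // viewpoint inside the sphere: error is unbounded

	return b.error / (d - b.radius); // approximate screen-space projection
}
```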
We use a k-way partitioning scheme and ask for ~c/4 partitions in hopes
that it groups clusters reasonably well; for now we use the number of
shared edges as the connection weight although it is likely that the
number of shared vertices is a reasonable proxy that is easier to
compute and more useful.

Note that the rest of the pipeline was structured to deal with a more
strict partitioner, as such in some cases the results are better and in
some they are worse. This is good because the rest of the pipeline needs
to have better heuristics anyway.
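A hedged sketch of the k-way call described above, assuming the caller has built a CSR adjacency graph over clusters (xadj/adjncy) with shared-edge counts as edge weights (adjwgt); this follows the METIS 5.x API but is not the demo's actual code.

```cpp
#include <stddef.h>
#include <vector>
#include <metis.h>

std::vector<idx_t> partitionClusters(std::vector<idx_t>& xadj, std::vector<idx_t>& adjncy,
    std::vector<idx_t>& adjwgt, idx_t cluster_count)
{
	idx_t nparts = cluster_count > 4 ? cluster_count / 4 : 1; // ask for ~c/4 partitions
	idx_t ncon = 1;    // one balance constraint
	idx_t edgecut = 0; // filled in by METIS
	std::vector<idx_t> part(cluster_count);

	METIS_PartGraphKway(&cluster_count, &ncon, xadj.data(), adjncy.data(),
	    /* vwgt= */ NULL, /* vsize= */ NULL, adjwgt.data(), &nparts,
	    /* tpwgts= */ NULL, /* ubvec= */ NULL, /* options= */ NULL,
	    &edgecut, part.data());

	return part; // part[i] is the group assigned to cluster i
}
```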
Using environment variable DUMP, we can now control the contents of the
output .obj: -1 means "output DAG cut", 0-n means "output grouping at a
given LOD level".

When we output the cut, we also output individual clusters as separate
objects for ease of debugging.
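A tiny sketch of the DUMP parsing described above (the INT_MIN sentinel is illustrative): -1 selects the DAG cut, 0..n selects the grouping at that LOD level.

```cpp
#include <limits.h>
#include <stdlib.h>

int parseDumpLevel()
{
	const char* dump = getenv("DUMP");
	return dump ? atoi(dump) : INT_MIN; // INT_MIN: no dump requested
}
```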
We might need to split this into a separate option because, while this
seems to work, it also results in disjoint clusters and in addition
doesn't fill the clusters very well; at minimum this will need further
tweaks.
This means that clusters that end up too small to merge, or edge-locked
enough that simplification is not effective, may get another chance
later in the process to be merged with other clusters. This is generally
beneficial for quality, although it occasionally makes things worse, and
it definitely slows processing down as the stuck triangles keep being
reevaluated on every pass.
Also use remapped vertices to identify adjacency for weighting; this
results in better spatial clustering for disconnected components.

Finally, for now triangle clustering requires METIS=2, because it has
complex tradeoffs and does not appear to be universally better...
Especially when METIS triangle clustering is used, we often get much
smaller clusters in the initial split, because it gets to 129 triangles
and splits that into 64+65. Then partitioning may either take three
clusters with sizes a little under 128, or four clusters, two of which
are 64/65, and the resulting cluster will be too small to merge.

Instead we remove the merge criterion outright if we have two clusters
to merge, and replace it with a percentage reduction. Right now the
percentage is very lenient; it would not be ideal to have every single
level just be an 85% reduction. However, this can be controlled at a
macro level, and ideally we want to prevent the processing from getting
stuck if possible.
Remap generation expects count % 3 == 0 right now, and we're just
feeding it our vertices as if they were a triangle buffer. Instead we
can use a shadow index buffer for edge matching, which is a little
simpler anyway.
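A hedged sketch of position-only edge matching via meshopt_generateShadowIndexBuffer (names are illustrative): indices of vertices with identical positions map to one representative, so cluster borders match even across attribute seams. The first 12 bytes of each vertex are assumed to be the position.

```cpp
#include <vector>
#include "meshoptimizer.h"

std::vector<unsigned int> makeShadowIndices(const unsigned int* indices, size_t index_count,
    const void* vertices, size_t vertex_count, size_t vertex_stride)
{
	std::vector<unsigned int> shadow(index_count);
	meshopt_generateShadowIndexBuffer(shadow.data(), indices, index_count,
	    vertices, vertex_count, /* vertex_size= */ sizeof(float) * 3, vertex_stride);
	return shadow;
}
```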
Since this is a separate branch now that happens before the merge, we
should visualize it separately; using a different label helps tell these
apart.
In general this code needs a lot more work but it is in a state where it
can be useful for testing and development so it's probably a reasonable
place to stop for now.
JMS55 commented Sep 5, 2024

Btw, I highly recommend downloading a really large asset like https://quixel.com/megascans/free?category=3D%20asset&assetId=siEoZ (make sure to check the download source quality asset option) and testing on that.

JMS55 commented Sep 5, 2024

> METIS cluster partitioning has surprising failure cases where it decides that a good partition should
> consist of 4 clusters that barely connect; this was unexpected because it doesn't seem to happen during
> triangle builds. Maybe with this in mind, a reasonable (unordered) greedy cluster merge could work
> better, but this will need to be investigated separately in the future anyway.

I've heard that adding weights based on cluster center distances can help.

> The actual simplification currently uses automatic border detection; it would be better to switch to
> using external vertex tagging via vertex_lock, but for now, the models I use for testing that have a lot of
> room for improvement are not bottlenecked by this.

Why so? Don't you want to lock the vertices of the meshlet group border? Which is what SimplifyOptions::LockBorder should do afaik?

zeux (Owner, Author) commented Sep 5, 2024

> Don't you want to lock the vertices of the meshlet group border? Which is what SimplifyOptions::LockBorder should do afaik?

This only matters when the source asset has geometric borders. If it does, then they will be locked just as meshlet borders are, when they could in theory be simplified instead.

For example, the asset you linked has geometric borders 😅 highlighted here:

[image: the linked asset with its geometric borders highlighted]

As a result, using LockBorder in this case will lock that border in all clusters, and you will never get below a fairly large triangle count as a result.

This is easy to change; similarly to cluster boundary edge detection, every vertex that is shared between any two clusters can be tagged as locked before processing all clusters in the queue at a given level or a group. But since I'm testing on assets that don't have this, I'm cutting this particular corner for now.
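A hedged sketch of that tagging (names and types are illustrative, not the demo's code): mark any vertex referenced by two or more clusters as locked, then pass the flags via the vertex_lock parameter of meshopt_simplifyWithAttributes instead of using LockBorder. For correctness across attribute seams, the indices here should be position-remapped (e.g. via a shadow index buffer).

```cpp
#include <stddef.h>
#include <vector>

std::vector<unsigned char> computeLocks(
    const std::vector<std::vector<unsigned int> >& clusters, size_t vertex_count)
{
	std::vector<unsigned char> locks(vertex_count, 0);
	std::vector<int> owner(vertex_count, -1); // first cluster to reference each vertex

	for (size_t c = 0; c < clusters.size(); ++c)
		for (size_t i = 0; i < clusters[c].size(); ++i)
		{
			unsigned int v = clusters[c][i];
			if (owner[v] == -1)
				owner[v] = int(c);
			else if (owner[v] != int(c))
				locks[v] = 1; // shared between clusters: must stay put
		}

	return locks; // feed to meshopt_simplifyWithAttributes' vertex_lock
}
```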

JMS55 commented Sep 5, 2024

Are you saying that meshopt's locked vertices are based on the vertex data, and not the index data I provide? E.g. using SimplifyOptions::Sparse | SimplifyOptions::LockBorder is locking the edges of the entire mesh whenever simplifying? If so, I should definitely switch to manual vertex locks... 😅

zeux (Owner, Author) commented Sep 5, 2024

It is based on index data. But the edge that is a border edge in the source geometry will always remain a border edge for any subset of the mesh.

The "full" counter is a little less helpful for METIS which very often
has almost-full clusters, so add an average as well to gauge the
distribution.

We also now count the number of singleton clusters; these are rare but
it's important that they stay that way for the quality of the partition.
Since whether a meshlet is connected or not is a very important
criterion for how well it will do in a hierarchical clusterization
process, we now compute the number of connected components in the
meshlet demo and count the number of meshlets with more than one.

Notably, for now we do this analysis using indices; this means we do
connectivity analysis on the original, non-positional topology, and on
some geometry that looks visually connected we will naturally have many
disconnected meshlets.
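A hedged sketch of counting connected components inside one meshlet via union-find over its triangles (two triangles connect if they share a vertex index); as noted above, this works on raw indices, so positional seams split components.

```cpp
#include <stddef.h>
#include <vector>

static int findRoot(std::vector<int>& parent, int v)
{
	while (parent[v] != v)
		v = parent[v] = parent[parent[v]]; // path halving
	return v;
}

int countComponents(const unsigned int* indices, size_t triangle_count, size_t vertex_count)
{
	std::vector<int> parent(vertex_count);
	for (size_t i = 0; i < vertex_count; ++i)
		parent[i] = int(i);

	// union the three vertices of every triangle
	for (size_t t = 0; t < triangle_count; ++t)
	{
		int a = findRoot(parent, int(indices[t * 3 + 0]));
		int b = findRoot(parent, int(indices[t * 3 + 1]));
		int c = findRoot(parent, int(indices[t * 3 + 2]));
		parent[b] = a;
		parent[c] = findRoot(parent, b);
	}

	// count distinct roots among vertices the meshlet actually references
	std::vector<char> used(vertex_count, 0);
	for (size_t i = 0; i < triangle_count * 3; ++i)
		used[indices[i]] = 1;

	int components = 0;
	for (size_t v = 0; v < vertex_count; ++v)
		components += used[v] && findRoot(parent, int(v)) == int(v);
	return components;
}
```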
zeux merged commit 27299d0 into master on Sep 6, 2024 (12 checks passed).
zeux deleted the nanite branch on September 6, 2024.
zeux changed the title from "demo: Initial hierarchical clustered simplification (Nanite) demo" to "demo: Initial hierarchical clustered simplification demo" on Sep 6, 2024.