Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add single exit node (TerminalNode) to FlowGraph #114

Merged
merged 11 commits into from
Jun 22, 2021

Conversation

zbanks
Copy link
Collaborator

@zbanks zbanks commented Jun 16, 2021

This PR is (mostly? entirely?) orthogonal to my other PR, so I don't think there's any nontrivial merge conflicts between the two and it doesn't matter which order they get merged in.

As discussed on Discord, any ReturnNode could be the exit of the FlowGraph, which complicated some of the interval/structural analysis. Adding a single common exit node helps reduce edge cases.

While making this change, I also made several other refactors. I tried to only change things when I could delete chunks of code (e.g. immediate_postdominator() in if_statements.py) or if the change helped preserve previous behavior/test output (e.g. the indenting changes).

I don't love how big this PR got -- but I think even though I technically added lines, the code is a bit more straightforward? I also don't think my changes are incompatible with the direction that #76 is headed.

Other refactors:

  • Postdominators & loops can now be calculated on the FlowGraph at construction. Previously it was "postdominator within this interval" because there was not the common exit. This moved a lot of code out of if_statements.py and into flow_graph.py.

  • Added Node.children() - it's very easy for a node to say what it's children are, rather than manually doing the .add_parent() bookkeeping. (The .add_parent() stuff could be removed in a future PR, and parents could be calculated in compute_relations, but the ordering change would introduce output changes, so I kept that out of this PR for clarity)

  • Node.emit_goto is now only used for very special cases (certain asm instructions, or the user manually annotates the source). Rather than pre-compute which nodes need goto, we just use a goto if the node was already emitted while building the Body.

  • The emit_node() function now contains the logic for checking if a node has been already been emitted and a goto should be used instead.

  • Added SwitchStatement to if_statements.py.

  • Indenting is now done at formatting time, instead of being determined while constructing Bodys. This makes it possible to shift Body segments around after construction without recalculating the indents.

  • Similiarly, empty return elision is done after building the Bodys, instead of during the build_* functions. Like indenting, this is to accomidate re-writes.

Notes on test diffs:

  • I added a "terminal andor" test case inspired by OOT's func_809E80D8. The 429dbfb commit has the output of the test on master, where you can see it failed to combine the conditions.
  • I'd consider the output of the switch test to be improved: it largely just moves the case blocks out of the ends of if statements.
  • The multi-switch test is a doozy. afaict it's semantically equivalent output, but it's still not great looking. (There is slightly less nesting though?).
  • loop-selfassign-phi has an infinite loop, so the flow graph is not reducible, and now mips2c automatically will fall back to the "no ifs"/"goto" mode. (There are only 12 functions across OOT/MM/Papermario that are non-reducible, and they also appear to be infinite loops.)
  • There were 10 tests that changed in total (counting g/o2 pairs separate) -- I think the rest are relatively inconsequential.

Previously, any ReturnNode could be the exit of the FlowGraph, which
complicated some of the interval/structural analysis. Adding a single
common exit node helps reduce edge cases.

Also refactored:

Postdominators & loops can now be calculated on the FlowGraph at
construction. Previously it was "postdominator within this interval"
because there was not the common exit. This moved a lot of code out of
`if_statements.py` and into `flow_graph.py`.

`Node.emit_goto` is now only used for very special cases (certain asm
instructions, or the user manually annotates the source). Rather than
pre-compute which nodes need goto, we just use a goto if the node was
already emitted while building the Body.

The `emit_node()` function now contains the logic for checking if a node
has been already been emitted and a goto should be used instead.

Added `SwitchStatement` to `if_statements.py`.

Indenting is now done at formatting time, instead of being determined
while constructing `Body`s. This makes it possible to shift `Body`
segments around after construction without recalculating the indents.

Similiarly, empty return elision is done after building the `Body`s,
instead of during the `build_*` functions. Like indenting, this is to
accomidate re-writes.
@ethteck
Copy link
Contributor

ethteck commented Jun 16, 2021

Not sure if this is appropriate, but I tested this out on a kind of nasty function in paper mario (func_802B62A4_E25174)

the results seem to be an improvement, but there are some new sections that seem not great:

            case 19:
                if ((s32) temp_s0_3 < 2) {
                    goto block_30;
                }
                goto block_31;
                goto block_32;
                goto block_32;
                goto block_32;
                goto block_32;
                goto block_32;
                goto block_32;
                goto block_32;
                goto block_32;
                goto block_32;
                goto block_32;
                goto block_32;

Copy link
Collaborator

@simonlindholm simonlindholm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Random comments. Can't say I have a good understanding of the big picture, still...

src/options.py Outdated Show resolved Hide resolved
tests/end_to_end/andor_return/irix-o2-out.c Show resolved Hide resolved
src/flow_graph.py Outdated Show resolved Hide resolved
src/flow_graph.py Outdated Show resolved Hide resolved
src/flow_graph.py Show resolved Hide resolved
src/if_statements.py Outdated Show resolved Hide resolved
src/if_statements.py Show resolved Hide resolved
src/if_statements.py Show resolved Hide resolved
src/if_statements.py Outdated Show resolved Hide resolved
src/if_statements.py Outdated Show resolved Hide resolved
- Add documentation/comments
- Remove stray debugging/comments
- Make `BaseNode` an ABC, `children()` abstract & return a list
- Make `indent` arg to `Formatter.indent()` optional
- Add more documentation, TODOs, and ext links
@zbanks
Copy link
Collaborator Author

zbanks commented Jun 21, 2021

@matt-kempster I'll wait to merge this until you have a chance to re-review it (if you want to)

Copy link
Owner

@matt-kempster matt-kempster left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks absolutely great, thanks for working on this.

Comment on lines 1214 to 1218
for n in nodes:
if n == entry:
continue
nset = dominators(n)
for p in parents(n):
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit (just noticed this was in the original code): this would be difficult to debug in pdb due to the use of n and p. Maybe add a TODO to change these variable names to node and parent respectively?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nbd to just fix it now - I'l rename them to node & parent (and c -> child for consistency)

@@ -167,6 +234,25 @@ def get_lone_if_statement(self) -> Optional[IfElseStatement]:
ret = statement
return ret

def elide_empty_returns(self) -> None:
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we expand this to also elide duplicate returns and/or warn about unreachable post-return statements?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I think this is more appropriate for the emit_node discussion

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this refactor addressed those issues we were having before where we would sometimes output return; return; return; (ex func_80B8E84C in MM) -- that stopped happening even before I added elide_empty_returns.

if_body=if_body,
else_body=None,
)
)
emit_successor(node.fallthrough_edge, i)
else:
assert isinstance(node, TerminalNode)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed

- {context.flow_graph.terminal_node()}
)
for node in unemitted_nodes:
body.add_comment(f"bug: did not emit code for node #{node.name()}")
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we should expand this to also include the contents of said block?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure - not too hard.

Comment on lines +286 to +287
# TODO: Treating ReturnNode as a special case and emitting it repeatedly
# hides the fact that we failed to fold the control flow. Maybe remove?
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly we should emit a comment at the very least indicating duplicate emission?

- Change locals from n/p/c to node/parent/child
- Emit comment about duplicated return
- Emit node content when skipped
@zbanks zbanks merged commit e3c0638 into matt-kempster:master Jun 22, 2021
@zbanks zbanks deleted the flowgraph-terminalnode branch June 22, 2021 00:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants