Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adopt Microsoft/tolerant-php-parser #357

Merged
merged 151 commits into from
Jun 9, 2017

Conversation

mousetraps
Copy link
Contributor

@mousetraps mousetraps commented Apr 21, 2017

fix #323

As discussed in microsoft/tolerant-php-parser#36 (comment), key benefits include:

  • graceful error handling
  • performance + memory usage
  • fully-representative tree

This is a pretty big PR, so in order to make it easier to review - a little explanation of the current status and deep dive into the changes is in order. Additionally, this braindump was a good way to wrap my head around the remaining issues 😃. Happy to answer any additional Qs, and stay tuned for further updates!

What's ready for review?

  • The files that are ready for code review right now are TolerantDefinitionResolver.php and TolerantSymbolInformation.php. There are still a few TODOS, Formatting and doc comments need some clean up, but otherwise pretty reasonable.
  • Additionally, it would be helpful to get feedback on general approach. As described below, the code itself still needs a lot of cleanup, and can be misleading. I plan to work through these remaining issues in the coming days.

I'll walk through this in two stages:

  1. Some general status
  2. Some context around the code changes themselves (including TODOs on each file, which I'll make my way through tomorrow). Note that I'm still working on cleaning this up, so sorry that it's a currently quite unpolished and/or potentially misleading.

General Status

Working branch:

Currently implemented Features

  • error diagnostics
  • find all symbols
  • find all symbols in workspace
  • [not impacted] format code
  • go to definition
  • find all references
  • hover
  • completion

Testing and Validation

In order to validate the changes, I thoroughly compared the language server results (fqns, definitions, references) of the two parsers on a number of frameworks (~15,000 files in total), and worked to resolve any discrepencies.

  • CakePHP
  • Drupal
  • math-php
  • php-language-server
  • tolerant-php-parser
  • symfony
  • Wordpress

Notable Improvements

  • Errors are now gracefully handled
  • Significantly improved performance
  • fully representatative trees, which means it'll be suitable for formatting / refactoring operations
  • proper of Qualified Names in more places (e.g. catch expression types now resolve propery)
  • function return types are properly resolved to a type
  • use Trait references are now properly resolved
  • group function use declarations are now properly resolved to a type
  • improved namespace reference part references

Notable Regressions

  • Error diagnostics produced by the parser are not currently "strict" because we parse things more loosely in order to provide error tolerance, so we won't get diagnostics for everything until we add relevant post-parse checks (this will happen on the parser side).
  • The new parser does not have pretty-printing yet, so the definition lines produced are exactly what the user has typed in (including whitespace), rather than a pretty-printed version.
  • Magic constants are treated like normal constants and will therefore appear in "find all references" results. They will also not be included in type resolution. Their fqsen will be resolved to their namespace value if one is defined (even though this is not actually legal).
  • variables used within template strings are not currently included in find all references results [requires parser fix]
  • yield-expressions support (currently produces squigglies on "yield", but rest of tree is OK) [requires parser fix]

Other differences

  • The new parser has higher granularity when it comes to Node position info, which results in slightly different (but still correct) position locations in some cases (for instance for a function call hello(/*cursor*/), go-to-definition at the denoted cursor position will not yield a result (but will yield a result if the function call name is selected)

Performance Comparison

At this point in time, the performance is not nearly as good as it could be (e.g. every call to getResolvedName does a tree walk rather than performing any caching, does not include the impacts of incremental parsing, some of the API or API calls are innefficient and need to be improved).

That said, even without these changes - some preliminary results of the impact from the most recent run - this includes all lang. server operations, not just are below (see Performance.php for details - only ran this on my machine - PHP 7.1 x64 on Windows, kept an eye on CPU to ensure it remained constant throughout, units are in seconds) - still need to run more benchmarks and analayze results (unfortunately XDebug profiler has been seriously failing me for this use case):

Framework old parser new parser % difference notes
drupal 35.1 25.7 -27%
wordpress 3.2 2.1 -34%
php-language-server 12.6 9.6 24%
tolerant-php-parser 3.2 5.0 +56% results incomplete - old parser produced null ASTs for 146 files and significantly fewer definitions overall
math-php 1.5 0.9 -40%
symfony 17.3 12.1 -30%
CodeIgniter 1.5 1.0 -33%
cakephp 4.5 3.4 -24%

The memory improvements are not nearly as significant as they could be because at this point in time, only the ASTs for the open files are held in memory, and definitions/references representation is the primary concern. However as demonstrated earlier - the ASTs themselves are significantly lighter, so in the future, we could hold many more in memory rather than throwing them all away.

Next steps

  • Finish cleaning up existing code
  • Merge updates from master
  • Implement completions support
  • Merge parser changes into master (see microsoft/tolerant-php-parser@master...mousetraps:lang-server)
  • Complete remaining parser changes
  • Update extension and distribute early version for people to play with
  • Fuzz-testing

File Changes

Changes described below, with corresponding TODOs

GENERAL TODO

  • add declare(strict_types = 1); to all files
  • clean up doc comments, add comments for new method definitions
  • clean up formatting
  • revert all files not mentioned below

composer.json [modified]

Modified to add a reference Microsoft/tolerant-php-parser. Note that the referenced version is currently being pulled from a private branch until these changes make their way to master

TODO

  • modify to pull from lang-server rather than dev-lang-server branch
  • update to official branch when available, consider distributing through package manager instead.

LanguageServer.php [modified]

The language server entrypoint is at LanguageServer::initialize(). The initialize method is responsible for setting up the indexes, server capabilities, etc.

The changes here are pretty minimal. The first change is that the $definitionResolver property is now of type DefinitionResolverInterface, rather than DefinitionResolver.

This enables us to hotswap parsers - DefinitionResolver (the definition resolver corresponding to the old parser) and TolerantDefinitionResolver (the definition resolver corresponding to the new parser) both implement DefinitionResolverInterface

DefinitionResolverInterface.php [added], DefinitionResolver.php [modified], TolerantDefinitionResolver.php [modified]

In order to make it as easy as possible to transition from the old parser to the new parser and validate these changes, they both retain the same core interface as before:

  • getDeclarationLineFromNode($node)
  • getDocumentationFromNode($node)
  • createDefinitionFromNode($node, string $fqn = null) : Definition
  • resolverReferenceNodeToDefinition($node)
  • resolveExpressionNodeToType($expr) : Type
  • getTypeFromNode($node)
  • getDefinedFqn($node)

The signatures no longer include type hints for PhpParser\Node - this is because the new Node objects of are type Microsoft\PhpParser\Node (note that Microsoft\PhpParser is aliased as Tolerant everywhere)

The TolerantDefinitionResolver logic has been completely rewritten otherwise to support the new AST structure and node / token properties.

Initially, I considered creating some more granular abstractions in the hopes of creating a shared interface between the two parsers, but the differences were too significant to be worth the effort (e.g. method calls are significantly more expensive than property accesses, code became more difficult to debug, etc.)

TODO

  • finish reviewing type resolution logic for expressions
  • revert functional changes to DefinitionResolver

ParserResourceFactory.php [added], ParserKind.php [added]

The ParserResourceFactory class enables us to manage the resources between the two parser kinds (defined in ParserKind). This includes instantiating the parser, definition resolver, and tree analyzers.

The parser kind is set with a static property, and there are two primary parser kinds

  • PHP_PARSER
  • TOLERANT_PHP_PARSER

The current 'default' is TOLERANT_PHP PARSER.

In addition, there are diagnostic modes which can be accessed by specifying DIAGNOSTIC_PHP_PARSER, or DIAGNOSTIC_TOLERANT_PHP_PARSER. The main difference is that when these values are specified, they will instantiate a LoggedDefinitionResolver or LoggedTolerantDefinitionResolver instead of DefinitionResolver or TolerantDefinitionResolver

TODO

  • static property rather than global (and update all other scripts / files)

LoggedDefinitionResolverTrait.php [added], LoggedDefinitionResolver.php [added], LoggedTolerantDefinitionResolver.php [added]

LoggedDefinitionResolver and LoggedTolerantDefinitionResolver are basic wrappers around DefinitionResolver and TolerantDefinitionResolver to make it easier to debug the application without explicitly attaching a debugger, and the actual logging logic is defined in LoggedDefinitionResolverTrait. Currently, logs are simply printed out to the console, but in the future we could implement some more advanced logging.

The logs themselves are pretty basic - they intercept the primary method calls, and log their corresponding signatures, input parameters, and returned results. In addition, they are indented according to their current level of recursion.

In the future, it might be helpful to include more granular logging, but this has been sufficient for now (note that there would also be a performance impact to keeping this logging on all the time).

There are also a few logging options worth mentioning (the way they are implemented is currently pretty hacky and specific to a particular problem I was investigating, but this can be revisited). maxRecursion keeps track of the max recursion level (an interesting statistic for diagnosing issues - could also consider using a formal cap on this to prevent CPU spinning out of control). repeat number of times a method should be repeated - I used this to diagnose some issues when I realized that the XDebug profiler results were absolutely useless for this application (by repeating a method call many times, I was able to get more accurate time stamps when analyzing perf. issues). times keeps track of the timing profiles

TODO

  • rename Logged -> Diagnostic
  • some object types are being rendered as UNKNOWN - figure out if there's a consistent way to render these
  • consider removing or changing representation of maxRecursion, repeat, and times properties
  • update or remove printLogs

ReferencesCollector.php [modified], DefinitionCollector.php [modified]

These files have been updated to correspond with the new DefinitionResolver interfaces

TODO

  • refer to definition resolver rather than FqnUtilities

FqnUtilities.php [added]

For holding some common logic between the two definition resolvers.

TODO

  • remove getDefinedFqn (and update references to refer to definition resolver instead)
  • revisit definition resolver logic, and determine additional candidates to pull into this shared set of utilities
  • rename to DefinitionResolutionUtilities

Location.php [modified], Range.php [modified]

Modified to support creating locations and ranges from both node types.

TODO

  • Extract Range::fromNode and Range::fromLocation DefinitionResolutionUtilities file
  • consider returning better representation from parser

TolerantSymbolInformation.php [added]

TODO

  • replace getFirstAncestor call with call to TolerantParserHelpers::tryGetPropertydeclaration

TolerantParserHelpers.php [added]

A temporary class to assist in answering simple questions about the AST. Eventually these operations will be rendered redundant by the parser.

TODO

  • update parser APIs to better represent these operations instead
  • better naming form methods (e.g. tryGetPropertyDeclarationFromVariable)

TreeAnalyzerInterface.php [added], TreeAnalyzer.php [added], TolerantTreeAnlayzer.php [added]

Central interface for performing definition / reference collection. One thing that's worth mentioning is that - for performance reasons, I didn't end up taking advantage of the built-in Node API of the tolerant parser, but will revisit this decision.

TODO

  • better naming
  • review and clean up logic
  • revist parser API around this

TextDocument.php [modified]

Responsible for handling TextDocument requests from client and server (e.g. definition, references). Updated logic to properly handle new node types and definition resolver interface

TODO

  • review and clean up logic
  • refactor to more clearly reflect old/new parser code paths

Tests/* [modified]

Many of the tests have been updated to account for updated node locations. Unfortunatley this breaks tests for the first parser.

TODO

  • consider a way to not break tests for the other parser
  • review modfied tests for correctness
  • consider clearer testing strategy

ValidationTest.php [added], broken/*.php [added]

Validation tests used to validate the parser against the old parser. broken/*.php are cases that were at some point an issue during the validation run (but not during the main run)

TODO

  • majorly update validation test logic (currently super hacky)
  • revisit and reorganize broken/*.php tests, better naming all around
  • consider converting validation tests that compare one parser against another to tests that compare the results of the parser against the old results of the parser on the same set of files to better detect regressions
  • add more submodules
  • update travis run to include tests
  • update phpunit.xml to include multiple test groups

- add diagnostics for old parser
- include maxRecursion levels
- include option to run functions multiple times to help profile
@felixfbecker
Copy link
Owner

Cool, I assumed there were some extra steps needed to make the submodules work in Travis, but looks fine!

@roblourens
Copy link
Contributor

At the moment, the submodules would only be used for the Performance test. ValidationTest only runs on the individual testcases in the cases/ folder (previously, it ran on the submodules comparing the result to the old parser).

@roblourens
Copy link
Contributor

I will file a whole bunch of issues shortly. Are you waiting for that before merging? Or also the latest comment #357 (comment)?

@felixfbecker
Copy link
Owner

I was going to fix #357 (comment) before merging, but if you want to do it quick feel free to do so

@roblourens
Copy link
Contributor

Those conditions are a little gnarly... it's not pretty but it will work the same now

@felixfbecker
Copy link
Owner

Simplified the condition further :)

felixfbecker
felixfbecker previously approved these changes Jun 9, 2017
Copy link
Owner

@felixfbecker felixfbecker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LETS DO THIS

@felixfbecker felixfbecker changed the title Adopt Microsoft/tolerant-php-parser [Work in Progress] Adopt Microsoft/tolerant-php-parser Jun 9, 2017
Performance.php Outdated
}


$parserKinds = [ParserKind::PHP_PARSER, ParserKind::TOLERANT_PHP_PARSER];
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does ParserKind still exist?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Damnit, one sec :)

@felixfbecker
Copy link
Owner

IT'S DONE 🎉
Thanks to everyone who contributed to this! This is a milestone

@jens1o
Copy link
Contributor

jens1o commented Jun 9, 2017

Thanks!... is that even a word? :/

@roblourens
Copy link
Contributor

@felixfbecker It might be useful to have the history of this PR, since so much happened, instead of squashing? I understand wanting a clean history, but git blame will be more useful if we can see all the commit messages. Up to you.

@jens1o
Copy link
Contributor

jens1o commented Jun 10, 2017

I agree with @roblourens

@felixfbecker
Copy link
Owner

@roblourens I didn't want to have all the cleanup commits in there, but doing a manual rebase sounded like a lot of work... You are right but I don't think it's worth force pushing master now

@felixfbecker
Copy link
Owner

I usually only rebase+merge or merge with merge commit if every commit passes tests on its own and could be reverted on its own, which would have been hard to ensure for all the commits in this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Microsoft/tolerant-php-parser adoption
5 participants