WIP - Dependency Tree Matcher (Validations + Semgrex operators) #2836

skrcode · 2018-10-09T23:15:15Z

Added validation and all major Semgrex operators

Description

Added checks to validate input.
Added all major Semgrex operator support. Includes support for'<','>','>>','<<','.','$+','$-','$++','$--'.
Details regarding the operators can be found here : https://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/semgraph/semgrex/SemgrexPattern.html
Note: The subtree matching code needs to be re-looked at. I guess there are some caveats in the Semgrex pattern matching algorithm that need to be implemented. Unordered subtree matching takes exponential time and the heuristics implemented in Semgrex can be of help.

Types of change

Enhancement

Checklist

I have submitted the spaCy Contributor Agreement.
I ran the tests, and all new and existing tests passed.
My changes don't require a change to the documentation, or if they do, I've added all required information.

skrcode · 2018-10-30T07:33:41Z

@honnibal There's definitely something better than can be done on the pattern matching algorithm. I've hit a roadblock in trying to find a combinatorial approach for calculating all matches correctly and quickly. I've gone through a lot of the literature on this as well as existing code modules and could not find anything suitable that we could extend on to. Any thoughts on this?

syllog1sm · 2018-10-30T10:26:38Z

I'll spend some time on it. I wanted to merge anyway because the patch doesn't interfere with others so there's no reason to let it diverge.

Thanks a lot for your work on this, i'm really grateful for the research. I'm hoping that once we have a few real patterns we can optimise the empirical runtime, which might look different from the worst case complexity.

skrcode · 2018-11-02T10:49:06Z

@honnibal Yeah, I agree ! I think even http://corenlp.run/ does not handle those exponential cases.
While trying out this example {} > {} > {} > {} > {} > {} > {} > {} > {} > {} > {} > {} > {} > {} > {} > {}
for the text The quick brown fox jumped over the lazy dog., though the length of the input is small,
a large number of matches could be possibly found leading to a timeout or too much memory consumption.

skrcode · 2019-03-23T12:47:25Z

@honnibal I've rewritten some of the code here: #3465
I've thought about the above performance problem in depth and figured out that for atleast some input specification, there are going to be a large number(exponential) of output subtrees. For example, if the input pattern is something like "give me all subtree matchings with any node having any two children", then the result would obviously have to contain all subtrees(without ordering too) from the dependency parse containing a node with two children. This result in itself would be exponential and so it has to run in atleast exponential time and memory footprint. I feel that there isn't any significant performance improvement that can be theoretically done, although I've added some constant time optimizations that could perform fairly well in practical use-cases, considering that an average sentence length would be some number hovering around 10.

Added more constucts for dependency tree matcher

0421367

skrcode force-pushed the feat/deptree branch from d3d99fb to 0421367 Compare October 9, 2018 23:16

ines added enhancement Feature requests and improvements ⚠️ wip Work in progress feat / matcher Feature: Token, phrase and dependency matcher labels Oct 9, 2018

ines mentioned this pull request Oct 21, 2018

💫 Better, faster and more customisable matcher #1971

Closed

5 tasks

honnibal merged commit 0bf1408 into explosion:develop Oct 29, 2018

svlandeg mentioned this pull request Oct 11, 2019

DependencyMatcher documentation #4433

Closed

skrcode mentioned this pull request Jan 9, 2021

DependencyMatcher has exponential time complexity #6678

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

WIP - Dependency Tree Matcher (Validations + Semgrex operators) #2836

WIP - Dependency Tree Matcher (Validations + Semgrex operators) #2836

skrcode commented Oct 9, 2018 •

edited

Loading

skrcode commented Oct 30, 2018

syllog1sm commented Oct 30, 2018

skrcode commented Nov 2, 2018 •

edited

Loading

skrcode commented Mar 23, 2019

WIP - Dependency Tree Matcher (Validations + Semgrex operators) #2836

WIP - Dependency Tree Matcher (Validations + Semgrex operators) #2836

Conversation

skrcode commented Oct 9, 2018 • edited Loading

Description

Types of change

Checklist

skrcode commented Oct 30, 2018

syllog1sm commented Oct 30, 2018

skrcode commented Nov 2, 2018 • edited Loading

skrcode commented Mar 23, 2019

skrcode commented Oct 9, 2018 •

edited

Loading

skrcode commented Nov 2, 2018 •

edited

Loading