DependencyMatcher documentation #4433

Closed
fabio-reale opened this issue Oct 11, 2019 · 11 comments
Labels
docs Documentation and website feat / matcher Feature: Token, phrase and dependency matcher help wanted Contributions welcome!

Comments

@fabio-reale

I would like to use the DependencyMatcher, which I learned exists from reading this issue. There, I also learned that there is no documentation for it.

I figured I would learn it from the code and testing, which I might end up doing, but I also figured I could try and ask for some help. I'm having some difficulty understanding what all of the operators are supposed to do (these ones for example: ">>", ".", "$+").

Once I get it, I fully intend to help with this documentation as best I can. Any help or advice about either the DependencyMatcher or how to contribute to the documentation is welcome.

Which page or section is this issue related to?

I assume the correct place for this documentation would be the rule-based matching page.

@svlandeg svlandeg added docs Documentation and website feat / matcher Feature: Token, phrase and dependency matcher labels Oct 11, 2019
@svlandeg
Member

Hi @fabio-reale: it would be awesome to get some help getting this properly documented. I haven't used the DependencyMatcher myself yet either, but I'd be happy to dig through this code together with you.

As I understand it, the different operators refer to the kinds of grammatical relations that can exist between tokens. For example, > (gov) selects all the children of a node, while >> (gov_chain) selects the full subtree of a node. You can find the definition of e.g. subtree in token.pyx, which is what doc[node] refers to (a doc is a sequence of tokens).
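
To make the difference between > and >> concrete, here is a minimal pure-Python sketch on a toy dependency tree. It does not use spaCy at all: the `WORDS` and `HEADS` lists and the `children` and `descendants` helpers are hypothetical names made up for illustration, and the sketch assumes that >> covers the descendants of a node without the node itself (spaCy's own subtree also yields the token).

```python
from typing import List

# Toy parse of "Smith founded a healthcare company".
# HEADS[i] is the index of the head of token i; the root points to itself.
WORDS = ["Smith", "founded", "a", "healthcare", "company"]
HEADS = [1, 1, 4, 4, 1]


def children(heads: List[int], node: int) -> List[int]:
    """Tokens directly governed by `node` (the idea behind `>`)."""
    return [i for i, h in enumerate(heads) if h == node and i != node]


def descendants(heads: List[int], node: int) -> List[int]:
    """Tokens reachable from `node` via head -> dependent steps (the idea behind `>>`)."""
    found = []
    stack = children(heads, node)
    while stack:
        current = stack.pop()
        found.append(current)
        stack.extend(children(heads, current))
    return sorted(found)


# "founded" directly governs "Smith" and "company" ...
print([WORDS[i] for i in children(HEADS, 1)])
# ... and transitively governs "a" and "healthcare" as well.
print([WORDS[i] for i in descendants(HEADS, 1)])
```

In this toy tree, > from "founded" stops at its direct dependents, while >> also reaches the tokens attached below "company".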

@svlandeg
Member

svlandeg commented Oct 11, 2019

I also saw that there's a dependency_matcher pytest fixture defined here, which could be useful to look into as a first example. I agree with you that the patterns are a little hard to read with those various operators. Maybe there is a way to simplify those or make it more intuitive.

We should probably also look into the reason why test_dependency_matcher (in that same file) has been commented out.

A little more background is here: #2836 and #3465

@skrcode
Contributor

skrcode commented Oct 12, 2019

@fabio-reale You could have a look at these pull requests and go over the referenced links within them.

  1. WIP - Dependency Tree Pattern Matcher #2732
  2. WIP - Dependency Tree Matcher (Validations + Semgrex operators) #2836
  3. Dependency tree pattern matcher #3465

These should clarify almost all of the questions you may have about the DependencyMatcher functionality. For the associated theory, have a look at https://nlp.stanford.edu/software/tregex.html.

@svlandeg Yes, we would need to get this documented. We would be requiring a few real world examples to use and help concretize everything.
test_dependency_matcher was commented out because of inconsistencies I had run into with the spaCy Matcher that is used internally. I haven't tried using it recently, so we could probably uncomment it and see whether everything works now.

@svlandeg
Member

Hi @skrcode! Do you have some real-life examples yourself? I think it would be great to get this documented because I'm sure a lot of people could use this functionality. We should think of some example cases to include in the docs, and include the same cases in the test suite.

Over the past few months, we've been fixing quite a few issues with the Matcher, so hopefully the inconsistencies you mentioned are resolved now.

@skrcode
Contributor

skrcode commented Oct 12, 2019

@svlandeg Unfortunately, I do not have any real-life examples for this, although samples from Semgrex could probably work just as well for the purposes of documentation. Some analysis of run time, memory usage and correctness would also be required, and real-life examples would help a great deal there. I think @cyclecycle was using this functionality and could probably give a much better idea of how it has been faring so far.

@fabio-reale fabio-reale changed the title DepencyMatcher documentation DependencyMatcher documentation Oct 14, 2019
@fabio-reale
Author

Hi @svlandeg and @skrcode,

Thanks for both of your responses, they have been useful in understanding the DependencyMatcher. I'll write some tests to make sure I understand it well enough before trying to write any documentation for it.

About real-life examples, I might come up with a few, but they would all be uses for the Portuguese language.

@p-sodmann
Contributor

I believe I can actually use the DependencyMatcher in a project I am currently working on.
In my case the texts are German medical notes, but it should be possible to find some English examples as well.

@ines ines mentioned this issue Nov 7, 2019
3 tasks
@svlandeg svlandeg added the help wanted Contributions welcome! label Mar 24, 2020
@svlandeg
Member

svlandeg commented Mar 24, 2020

@DeNeutoy has recently created this whole blog post on the dependency matcher: http://markneumann.xyz/blog/dependency-matcher/

It would be great if we can get this distilled into a version for the spaCy docs. PRs welcome!

@skrcode
Contributor

skrcode commented Mar 25, 2020

@svlandeg @DeNeutoy That looks awesome!

@svlandeg
Member

The upcoming spaCy v3, currently available as spacy-nightly, finally officially supports the DependencyMatcher! @adrianeboyd has redesigned the pattern specifications and fixed & extended operator implementations (PR #6018).

The documentation is here: https://nightly.spacy.io/usage/rule-based-matching#dependencymatcher, and will be moved to the main docs once v3.0 is out officially!
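
For the two other operators asked about at the top of this thread, "." and "$+", here is a rough pure-Python sketch of how I read their v3 meaning: "A . B" holds when A immediately precedes B within the same dependency tree, and "A $+ B" holds when B is the right immediate sibling of A (same parent, adjacent positions). This is not spaCy code; `HEADS`, `root_of`, `immediately_precedes` and `right_immediate_sibling` are hypothetical names used only for illustration, so please check the linked docs for the authoritative definitions.

```python
from typing import List

# Toy parse of "Smith founded a healthcare company".
# HEADS[i] is the index of the head of token i; the root points to itself.
WORDS = ["Smith", "founded", "a", "healthcare", "company"]
HEADS = [1, 1, 4, 4, 1]


def root_of(heads: List[int], node: int) -> int:
    """Follow head links upwards until reaching the root of the tree."""
    while heads[node] != node:
        node = heads[node]
    return node


def immediately_precedes(heads: List[int], a: int, b: int) -> bool:
    """Sketch of `A . B`: A is directly before B, in the same tree."""
    return a == b - 1 and root_of(heads, a) == root_of(heads, b)


def right_immediate_sibling(heads: List[int], a: int, b: int) -> bool:
    """Sketch of `A $+ B`: A and B share a parent and A is directly before B."""
    is_root = heads[a] == a or heads[b] == b  # a root has no parent, so no siblings
    return a == b - 1 and heads[a] == heads[b] and not is_root


# "a" and "healthcare" are adjacent and both attach to "company".
print(right_immediate_sibling(HEADS, 2, 3))
# "healthcare" directly precedes "company" but is its child, not its sibling.
print(immediately_precedes(HEADS, 3, 4), right_immediate_sibling(HEADS, 3, 4))
```

The point of the sketch is that "." only looks at linear order, while "$+" additionally requires the two tokens to hang off the same head.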

@github-actions
Contributor

This thread has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 28, 2021

4 participants