Support custom token/lexeme attribute for vectors #12625

Merged: 6 commits, Jun 28, 2023

Conversation

adrianeboyd
Contributor

Description

Support custom token/lexeme attribute for vectors.

The main goal is to support LOWER or NORM for vector lookups instead of only ORTH. Because this attribute should be determined by the vectors themselves (primarily how the data was normalized for training) rather than the model config, the attribute for model vector lookup is moved from StaticVectors to a property of the Vectors.
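To illustrate why the lookup attribute belongs to the vectors rather than to the model config, here is a minimal sketch in plain Python. It is a toy model, not spaCy's implementation: `ToyVectors`, `key_for`, and `get` are hypothetical names, and only `ORTH` and `LOWER` are modelled.

```python
from dataclasses import dataclass, field


@dataclass
class ToyVectors:
    """Toy stand-in for a vectors table that records its own lookup attribute."""

    attr: str = "ORTH"  # hypothetical: "ORTH" (exact form) or "LOWER" (lowercased)
    data: dict = field(default_factory=dict)

    def key_for(self, text: str) -> str:
        # Normalize the surface form the same way the table's keys were
        # normalized when the vectors were trained.
        if self.attr == "LOWER":
            return text.lower()
        return text  # "ORTH": use the exact surface form

    def get(self, text: str):
        # Returns None when the normalized key is not in the table.
        return self.data.get(self.key_for(text))


# Both tables hold lowercased keys; only one of them knows that.
orth_table = ToyVectors(attr="ORTH", data={"apple": [1.0, 0.0]})
lower_table = ToyVectors(attr="LOWER", data={"apple": [1.0, 0.0]})

print(orth_table.get("Apple"))   # None
print(lower_table.get("Apple"))  # [1.0, 0.0]
```

Because the table carries `attr` itself, any component that consumes it can look up keys consistently without repeating the setting in its own config.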

Because a Token can have a custom NORM that overrides the Lexeme NORM, Token.get_struct_attr is used to retrieve the attr value in contexts where a Token is available.
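The token-versus-lexeme distinction can be sketched as follows. This is a simplified plain-Python model under stated assumptions, not the actual `Token.get_struct_attr` code: `ToyLexeme`, `ToyToken`, and `token_attr` are hypothetical names, and a token-level norm of `None` stands for "no override".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyLexeme:
    """Context-independent entry shared by all tokens with the same form."""

    orth: str
    norm: str


@dataclass
class ToyToken:
    """A token in context; it may override the lexeme's norm."""

    lex: ToyLexeme
    norm: Optional[str] = None  # None means "no token-level override"


def token_attr(token: ToyToken, attr: str) -> str:
    # Prefer the token-level NORM when one is set, else fall back to the lexeme.
    if attr == "NORM":
        return token.norm if token.norm is not None else token.lex.norm
    if attr == "LOWER":
        return token.lex.orth.lower()
    return token.lex.orth  # "ORTH"


lexeme = ToyLexeme(orth="gonna", norm="gonna")
plain = ToyToken(lex=lexeme)
overridden = ToyToken(lex=lexeme, norm="going to")

print(token_attr(plain, "NORM"))       # gonna
print(token_attr(overridden, "NORM"))  # going to
```

A lookup that read only the lexeme would return `gonna` in both cases, which is why the token-aware accessor matters wherever a token is available.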

Types of change

Enhancement.

Checklist

  • I confirm that I have the right to submit this contribution under the project's MIT license.
  • I ran the tests, and all new and existing tests passed.
  • My changes don't require a change to the documentation, or if they do, I've added all required information.

@adrianeboyd added the labels enhancement (Feature requests and improvements), feat / vectors (Feature: Word vectors and similarity), and v3.6 (Related to v3.6) on May 12, 2023
@adrianeboyd marked this pull request as draft May 12, 2023 06:07
@adrianeboyd marked this pull request as ready for review May 22, 2023 08:38
@honnibal
Member

All makes sense and looks good 👍

@adrianeboyd mentioned this pull request Jun 27, 2023
3 tasks
@svlandeg merged commit fb0da3e into explosion:master Jun 28, 2023