Enabling Tapex in table question answering pipeline (#16663)

Conversation
self.type = "tapas" if hasattr(self.model.config, "aggregation_labels") else None
Wondering if we can leverage model.config.is_encoder_decoder here (which is True for TAPEX but False for TAPAS).
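A minimal sketch of the two detection strategies being discussed. `DummyConfig` and both helper functions are hypothetical stand-ins (the real `model.config` objects carry many more attributes), kept just to contrast keying off a TAPAS-specific attribute versus keying off `is_encoder_decoder`:

```python
# Hypothetical stand-in for `model.config`; attribute names mirror the discussion above.
class DummyConfig:
    def __init__(self, is_encoder_decoder=False, aggregation_labels=None):
        self.is_encoder_decoder = is_encoder_decoder
        if aggregation_labels is not None:
            # Only TAPAS-style configs carry this attribute.
            self.aggregation_labels = aggregation_labels

def detect_type(config):
    # The approach taken in the diff: key off a TAPAS-specific attribute.
    return "tapas" if hasattr(config, "aggregation_labels") else None

def detect_type_alt(config):
    # The suggested alternative: TAPEX is encoder-decoder, TAPAS is not.
    # Note this treats every non-encoder-decoder model as "tapas",
    # which is exactly the generalization the reply below pushes back on.
    return None if getattr(config, "is_encoder_decoder", False) else "tapas"

tapas_cfg = DummyConfig(is_encoder_decoder=False, aggregation_labels={0: "NONE"})
tapex_cfg = DummyConfig(is_encoder_decoder=True)
```

Both detectors agree on these two configs; they diverge only for models that are neither TAPAS nor encoder-decoder.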
Usually I would agree that reusing more "general" concepts is better. The thing I want to avoid is treating everything that is not encoder-decoder as tapas. The Tapas code is currently highly specific, so I'd rather isolate tapas than isolate tapex, if you prefer to think of it that way. Ultimately both are OK, tbh.
if truncation is None:
    if self.type == "tapas":
        truncation = "drop_rows_to_fit"
    else:
        truncation = "do_not_truncate"
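The default selection above can be sketched as a small standalone helper (`resolve_truncation` is a hypothetical name standing in for the pipeline's inline logic):

```python
def resolve_truncation(model_type, truncation=None):
    # Mirrors the diff: TAPAS tokenizers can drop table rows to fit the
    # model's length budget; other models default to no truncation.
    if truncation is None:
        truncation = "drop_rows_to_fit" if model_type == "tapas" else "do_not_truncate"
    return truncation
```

An explicitly passed truncation strategy is left untouched; only the `None` default is resolved per model type.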
TAPEX also supports the "drop_rows_to_fit" truncation strategy.
The tokenizer said it didn't and gave me the regular list. Is the tokenizer that gets resolved a BartTokenizer, maybe? (I have seen BartTokenizer in some random error log; I discarded it, but maybe it's a true bug.)
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("microsoft/tapex-large-finetuned-wtq")
tokenizer.encode(table, truncation="drop_rows_to_fit")
ValueError: drop_rows_to_fit is not a valid TruncationStrategy, please select one of ['only_first', 'only_second', 'longest_first', 'do_not_truncate']
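A self-contained re-creation of the validation behind that error (the strategy list is copied from the traceback; `check_truncation` is a hypothetical helper, not the tokenizer's actual method):

```python
GENERIC_STRATEGIES = ["only_first", "only_second", "longest_first", "do_not_truncate"]

def check_truncation(strategy):
    # Plain (non-TAPAS) tokenizers only accept the four generic strategies,
    # so a model-specific one like "drop_rows_to_fit" is rejected.
    if strategy not in GENERIC_STRATEGIES:
        raise ValueError(
            f"{strategy} is not a valid TruncationStrategy, "
            f"please select one of {GENERIC_STRATEGIES}"
        )
    return strategy
```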
Thanks for reporting, investigating.
Update: you can continue with this; TAPEX doesn't support drop_rows_to_fit in the same manner as TAPAS does.
The documentation is not available anymore as the PR was closed or merged.
Gentle ping @NielsRogge if we want to go live next week.
Thanks for working on it, looks good to me!
Merged as: Enabling `Tapex` in table question answering pipeline (#16663)
* Enabling `Tapex` in table question answering pipeline.
* Questions are independent for Tapex, making the test respect that.
* Missing extra space.
What does this PR do?
This PR enables `Tapex` models in the table question answering pipeline. It adds a `self.type` flag, since `Tapas` (the previously used models) have some non-trivial defaults which do not apply to `Tapex`. (Ideally those `type`s would not be model specific like `tapas`, but `encoder` vs `encoder-decoder` when applicable.)
model yet: https://huggingface.co/hf-internal-testing
@NielsRogge
@LysandreJik
Fixes # (issue)
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.