When evaluating MS-MARCO or other datasets, I think it is better to score all samples, whether or not a sample has a label. A sample can lack labels if, for instance, the first round of ranking failed to retrieve a relevant passage. Setting zero_division for recall to 0 instead of 1 should score these samples correctly (and is consistent with msmarco_eval.py).
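To illustrate the scoring change, here is a minimal sketch of recall@k with a configurable zero-division value. The function names (`recall_at_k`, `mean_recall_at_k`) and the sample layout are hypothetical and are not taken from this repository or from msmarco_eval.py; they only show how an unlabeled sample contributes 0 rather than 1 to the average.

```python
from typing import Iterable, Sequence, Set, Tuple


def recall_at_k(
    ranked_ids: Sequence[str],
    relevant_ids: Set[str],
    k: int,
    zero_division: float = 0.0,
) -> float:
    """Recall@k for one sample.

    If the sample has no relevant ids (no labels), return `zero_division`
    instead of dividing by zero. This helper is a hypothetical illustration.
    """
    if not relevant_ids:
        return zero_division
    # Use a set so that duplicate ids in the ranking are not counted twice.
    hits = len(set(ranked_ids[:k]) & relevant_ids)
    return hits / len(relevant_ids)


def mean_recall_at_k(
    samples: Iterable[Tuple[Sequence[str], Set[str]]],
    k: int,
    zero_division: float = 0.0,
) -> float:
    """Average recall@k over all samples, including unlabeled ones."""
    scores = [recall_at_k(r, rel, k, zero_division) for r, rel in samples]
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: the second sample has no labels, e.g. because first-stage
# retrieval returned no relevant passage.
samples = [
    (["a", "b", "c"], {"a", "d"}),
    (["x", "y"], set()),
]
print(mean_recall_at_k(samples, k=3, zero_division=0.0))
print(mean_recall_at_k(samples, k=3, zero_division=1.0))
```

With `zero_division=1.0` the unlabeled sample is counted as a perfect score and inflates the average; with `0.0` it is counted as a miss.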
This change also fixes a bug with the T5 decoder: we want the output to be over the T/F label distribution of the first decoder output, not the second (which is what the code was previously doing). Surprisingly, the previous behavior was not all that bad in terms of performance. As a result of this fix, T5 outperforms all other models in all regards on CovidQA.
natural question:
precision@1 0.27419354838709675
recall@3 0.43502304147465437
recall@50 0.9305683563748081
recall@1000 1.0
mrr 0.4224002621206025
mrr@10 0.4097638248847927
keyword question:
precision@1 0.24193548387096775
recall@3 0.36378648233486943
recall@50 0.9230414746543779
recall@1000 1.0
mrr 0.38249784501639117
mrr@10 0.3701228878648234
Better performance than the old version but, as I said earlier, surprisingly close!
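The decoder fix described above can be sketched without any model code. The example assumes the decoder logits are available as a list of per-step vocabulary scores; the helper names and the toy token ids for "true" and "false" are hypothetical and do not come from this repository. The point is only that the relevance score is read from decoder step 0, normalized over the two label tokens.

```python
import math
from typing import Sequence, Tuple


def true_false_log_probs(
    step_logits: Sequence[float], true_id: int, false_id: int
) -> Tuple[float, float]:
    """Log-softmax restricted to the 'true' and 'false' token logits."""
    t, f = step_logits[true_id], step_logits[false_id]
    m = max(t, f)  # subtract the max for numerical stability
    log_z = m + math.log(math.exp(t - m) + math.exp(f - m))
    return t - log_z, f - log_z


def relevance_score(
    decoder_logits: Sequence[Sequence[float]],
    true_id: int,
    false_id: int,
    step: int = 0,
) -> float:
    """Log-probability of 'true' at the given decoder step.

    step=0 is the first decoder output, which is where the T/F label
    distribution should be read from.
    """
    log_p_true, _ = true_false_log_probs(decoder_logits[step], true_id, false_id)
    return log_p_true


# Toy logits over a 4-token vocabulary for two decoder steps.
# Token ids 1 ("true") and 3 ("false") are made up for illustration.
decoder_logits = [
    [0.0, 2.0, 0.0, 1.0],  # step 0: the label is decided here
    [0.0, 0.0, 0.0, 5.0],  # step 1: unrelated to the label
]
print(relevance_score(decoder_logits, true_id=1, false_id=3, step=0))
print(relevance_score(decoder_logits, true_id=1, false_id=3, step=1))
```

Reading the score at `step=1` instead of `step=0` gives a different value for the same input, which is the kind of discrepancy the fix addresses.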