Fix Zero div and T5 decoding #16

Merged: 1 commit merged into master on May 2, 2020

Conversation

@ronakice (Member) commented on May 1, 2020

When evaluating MS MARCO and other datasets, I think it is better to score all samples regardless of whether or not a sample has any relevant labels. A sample can lack labels, for instance, when the first round of ranking fails to retrieve a relevant passage. Setting recall's zero_division to 0 instead of 1 scores this correctly (and is also consistent with msmarco_eval.py).

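For context, here is a minimal sketch (made-up arrays, not repo code) of how zero_division changes the recall for a query with no relevant labels:

```python
import numpy as np
from sklearn.metrics import recall_score

# Hypothetical query with no relevant (positive) gold labels, e.g. because
# first-stage retrieval returned nothing relevant.
gold_rels = np.zeros(5, dtype=int)
score_rels = np.array([1, 1, 0, 0, 0])  # the reranker still flags its top hits

# recall = TP / (TP + FN) is 0/0 here, so zero_division picks the fallback.
recall_score(gold_rels, score_rels, zero_division=1)  # 1.0 (old behavior)
recall_score(gold_rels, score_rels, zero_division=0)  # 0.0 (this PR)
```
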
This PR also fixes a bug in the T5 decoding: we want the score to come from the true/false token distribution of the first decoder output, not the second, which is what the code was previously doing. Surprisingly, the old way was not all that bad in terms of performance. As a result of this fix, T5 outperforms all other models in all regards on CovidQA (see the scoring sketch below, after the results).

natural question:
precision@1 0.27419354838709675
recall@3 0.43502304147465437
recall@50 0.9305683563748081
recall@1000 1.0
mrr 0.4224002621206025
mrr@10 0.4097638248847927

keyword question:
precision@1 0.24193548387096775
recall@3 0.36378648233486943
recall@50 0.9230414746543779
recall@1000 1.0
mrr 0.38249784501639117
mrr@10 0.3701228878648234

Better performance than the old version, but surprisingly close, as I said earlier!

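To make the decoding change below concrete, here is a hedged sketch of the intended scoring, using Hugging Face transformers directly rather than the repo's greedy_decode; the prompt format and token choices are assumptions for illustration:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base").eval()

# monoT5-style input; the exact prompt text is an assumption.
text = "Query: what are covid symptoms Document: ... Relevant:"
input_ids = tokenizer(text, return_tensors="pt").input_ids
decoder_input_ids = torch.full((1, 1), model.config.decoder_start_token_id,
                               dtype=torch.long)

with torch.no_grad():
    # Logits of the *first* decoder step only (what length=1 gives us).
    logits = model(input_ids=input_ids,
                   decoder_input_ids=decoder_input_ids).logits[0, 0]

# Score = probability mass on "true" vs. "false" at that first step.
true_id = tokenizer.encode("true")[0]
false_id = tokenizer.encode("false")[0]
log_probs = torch.log_softmax(logits[[false_id, true_id]], dim=0)
score = log_probs[1]  # log P("true" | query, document)
```
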
@@ -30,7 +30,7 @@ def rerank(self, query: Query, texts: List[Text]) -> List[Text]:
             attn_mask = batch.output['attention_mask']
             _, batch_scores = greedy_decode(self.model,
                                             input_ids.to(self.device),
-                                            length=2,
+                                            length=1,
Member:
I can't remember exactly why, but @daemon had a good reason to use length=2, so I will let him comment here.

Member:
The reason was to reproduce what the TensorFlow implementation had been doing. If using a single step of decoding improves quality, then that's all the better.

Member:
Adding a comment would be a good idea...

Member:
Sure, that belongs in covidex.

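For readers following the length discussion above, a hypothetical sketch of a greedy-decode helper (not pygaggle's actual greedy_decode) that returns per-step logits; with length=1 only the first decoder step is run, which is the step that carries the true/false label:

```python
import torch

def greedy_decode_sketch(model, input_ids, length=1):
    """Greedily decode `length` steps; return generated ids and per-step logits."""
    batch_size = input_ids.size(0)
    decoder_ids = torch.full((batch_size, 1),
                             model.config.decoder_start_token_id,
                             dtype=torch.long, device=input_ids.device)
    step_scores = []
    for _ in range(length):
        out = model(input_ids=input_ids, decoder_input_ids=decoder_ids)
        next_logits = out.logits[:, -1, :]           # logits for this step
        step_scores.append(next_logits)
        next_token = next_logits.argmax(dim=-1, keepdim=True)
        decoder_ids = torch.cat([decoder_ids, next_token], dim=-1)
    # (batch, length) generated ids, (batch, length, vocab) logits
    return decoder_ids[:, 1:], torch.stack(step_scores, dim=1)
```
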
@@ -75,7 +75,7 @@ def accumulate(self, scores: List[float], gold: RelevanceExample):
         score_rels = self.truncated_rels(scores)
         score_rels[score_rels != 0] = 1
         gold_rels = np.array(gold.labels, dtype=int)
-        score = recall_score(gold_rels, score_rels, zero_division=1)
+        score = recall_score(gold_rels, score_rels, zero_division=0)
Member:
Yeah, that makes sense. @daemon, before we merge this, I just want to double-check with you whether there was a reason for zero_division=1.

Member:
No objection.

@ronakice merged commit 3469584 into master on May 2, 2020