This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Fix RoBERTa SST (#110)
* Only make tokens when we don't already have them

* Changelog
dirkgr authored Aug 17, 2020
1 parent 0491690 commit 4fa5fc1
Showing 2 changed files with 16 additions and 3 deletions.
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -13,6 +13,8 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 - Fixed `GraphParser.get_metrics` so that it expects a dict from `F1Measure.get_metric`.
 - `CopyNet` and `SimpleSeq2Seq` models now work with AMP.
+- Made the SST reader a little more strict in the kinds of input it accepts.
+

 ## [v1.1.0rc2](https://github.com/allenai/allennlp-models/releases/tag/v1.1.0rc2) - 2020-07-31

@@ -1,4 +1,4 @@
-from typing import Dict, List, Optional
+from typing import Dict, List, Optional, Union
 import logging

 from allennlp.data import Tokenizer
@@ -111,9 +111,20 @@ def text_to_instance(self, tokens: List[str], sentiment: str = None) -> Optional
         label : `LabelField`
             The sentiment label of the sentence or phrase.
         """
-
+        assert isinstance(
+            tokens, list
+        )  # If tokens is a str, nothing breaks but the results are garbage, so we check.
         if self._tokenizer is None:
-            tokens = [Token(x) for x in tokens]
+
+            def make_token(t: Union[str, Token]):
+                if isinstance(t, str):
+                    return Token(t)
+                elif isinstance(t, Token):
+                    return t
+                else:
+                    raise ValueError("Tokens must be either str or Token.")
+
+            tokens = [make_token(x) for x in tokens]
         else:
             tokens = self._tokenizer.tokenize(" ".join(tokens))
         text_field = TextField(tokens, token_indexers=self._token_indexers)
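
The behavior of the tokenizer-less branch in this hunk can be tried outside the reader with a minimal sketch. It is not the library's code: the `Token` dataclass below is a hypothetical stand-in for AllenNLP's `Token` (text only), and `normalize_tokens` is an illustrative helper name that does not exist in the repository. It shows that existing `Token` objects pass through unchanged, and why a bare `str` is rejected up front.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Token:
    """Hypothetical stand-in for AllenNLP's Token; only the text is kept."""

    text: str


def make_token(t: Union[str, Token]) -> Token:
    # Wrap plain strings; return existing Token objects unchanged.
    if isinstance(t, str):
        return Token(t)
    elif isinstance(t, Token):
        return t
    else:
        raise ValueError("Tokens must be either str or Token.")


def normalize_tokens(tokens: List[Union[str, Token]]) -> List[Token]:
    # Illustrative helper (assumed name) mirroring the branch in the diff.
    # A bare str is iterable, so without this check it would be silently
    # split into single characters instead of failing.
    assert isinstance(tokens, list)
    return [make_token(x) for x in tokens]


mixed = normalize_tokens(["a", Token("fine"), "film"])
print([t.text for t in mixed])  # ['a', 'fine', 'film']

# What the assert guards against: iterating a str yields characters.
print([Token(c).text for c in "a fine film"][:3])  # ['a', ' ', 'f']
```

Under these assumptions, a list that mixes strings and tokens normalizes cleanly, while `normalize_tokens("a fine film")` fails the assertion instead of producing character-level tokens.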