This repository has been archived by the owner on Dec 16, 2022. It is now read-only.

Commit 48bceca: "updates"

epwalsh committed Sep 11, 2020 (merge commit, 2 parents: 455ac1f + 2d7e1f6)
Showing 35 changed files with 428 additions and 120 deletions.
3 changes: 1 addition & 2 deletions .github/workflows/main.yml
@@ -111,8 +111,7 @@ jobs:
configs:
name: Training Configs
# Don't run for forks, and only run for master pushes and on schedule.
if: github.repository == 'allenai/allennlp-models' && github.event_name != 'pull_request'
if: github.repository == 'allenai/allennlp-models'
runs-on: [self-hosted]

steps:
24 changes: 24 additions & 0 deletions .github/workflows/release.yml
@@ -239,6 +239,30 @@ jobs:
run: |
make docker-test-run DOCKER_TAG=$DOCKER_TAG ARGS='gpu-test'
configs:
name: Training Configs
if: github.repository == 'allenai/allennlp-models'
runs-on: [self-hosted]

steps:
- uses: actions/checkout@v2

- name: Set Docker tag
run: |
if [[ $GITHUB_EVENT_NAME == 'release' ]]; then
echo "::set-env name=DOCKER_TAG::${GITHUB_REF#refs/tags/}";
else
echo "::set-env name=DOCKER_TAG::$GITHUB_SHA";
fi
- name: Build test image
run: |
make docker-test-image DOCKER_TAG=$DOCKER_TAG
- name: Validate training configs
run: |
make docker-test-run DOCKER_TAG=$DOCKER_TAG ARGS='test-configs'
# Builds the API documentation and pushes it to the appropriate folder in the
# allennlp-docs repo.
docs:
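As an editor's aside on the "Set Docker tag" step above: the shell snippet uses the git tag for release events and falls back to the commit SHA otherwise, with `${GITHUB_REF#refs/tags/}` stripping the `refs/tags/` prefix. A minimal Python sketch of that selection rule (a hypothetical helper, not part of this commit):

```python
def docker_tag(event_name: str, github_ref: str, github_sha: str) -> str:
    """Mirror the workflow step: tag name for releases, commit SHA otherwise."""
    if event_name == "release":
        # Equivalent to the shell expansion ${GITHUB_REF#refs/tags/}: drop the
        # leading "refs/tags/" prefix if it is present.
        if github_ref.startswith("refs/tags/"):
            return github_ref[len("refs/tags/"):]
        return github_ref
    return github_sha


print(docker_tag("release", "refs/tags/v1.1.0", "48bceca"))  # v1.1.0
print(docker_tag("push", "refs/heads/master", "48bceca"))    # 48bceca
```

(GitHub later deprecated the `::set-env` workflow command in favour of appending to the `$GITHUB_ENV` file, but that change postdates this commit.)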
15 changes: 13 additions & 2 deletions CHANGELOG.md
@@ -7,14 +7,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

## Unreleased

### Changed
### Fixed

- Fixed BART for latest `transformers` version.

## [v1.1.0](https://github.com/allenai/allennlp-models/releases/tag/v1.1.0) - 2020-09-08

### Fixed

- Updated `LanguageModelTokenEmbedder` to allow multiple token embedders, but only use the first one with a non-empty type.
- Fixed evaluation of metrics when using distributed setting.
- Fixed a bug introduced in 1.0 where the SRL model did not reproduce the original result.

- Updated dataset readers for new API: https://github.com/allenai/allennlp/pull/4497.
## [v1.1.0rc4](https://github.com/allenai/allennlp-models/releases/tag/v1.1.0rc4) - 2020-08-21

### Added

- Added regression tests for training configs that run on a scheduled workflow.
- Added a test for the pretrained sentiment analysis model.
- Added a way for questions from the Quora dataset to be concatenated like the sequences in the SNLI dataset.

## [v1.1.0rc3](https://github.com/allenai/allennlp-models/releases/tag/v1.1.0rc3) - 2020-08-12

5 changes: 2 additions & 3 deletions Makefile
@@ -58,7 +58,7 @@ typecheck :

.PHONY : test
test :
pytest --color=yes -rf --durations=40 -m "not pretrained_model_test" -m "not pretrained_config_test"
pytest --color=yes -rf --durations=40 -m "not pretrained_model_test and not pretrained_config_test"

.PHONY : gpu-test
gpu-test :
@@ -67,8 +67,7 @@ gpu-test :
.PHONY : test-with-cov
test-with-cov :
pytest --color=yes -rf --durations=40 \
-m "not pretrained_model_test" \
-m "not pretrained_config_test" \
-m "not pretrained_model_test and not pretrained_config_test" \
--cov-config=.coveragerc \
--cov=allennlp_models/ \
--cov-report=xml
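A note on the Makefile change above: pytest treats `-m` as a single marker expression, and a repeated `-m` flag replaces the earlier value rather than combining with it, so the two markers have to be joined with `and` inside one expression. A small self-contained sketch (hypothetical test names, not from this repository) showing what the combined expression selects:

```python
# Running: pytest -m "not pretrained_model_test and not pretrained_config_test"
# deselects both marked tests below; two separate -m flags would not.
import pytest


@pytest.mark.pretrained_model_test
def test_loads_pretrained_model():
    ...


@pytest.mark.pretrained_config_test
def test_validates_pretrained_config():
    ...


def test_plain_unit():
    # The only test selected by the combined "not ... and not ..." expression.
    assert 2 + 2 == 4
```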
(changed file, path not shown)
@@ -223,7 +223,7 @@ def forward(
A tensor of shape `(batch_size, num_classes)` representing a
distribution over the label classes for each instance.
- `loss` (`torch.FloatTensor`, optional) :
A scalar loss to be optimised. """
A scalar loss to be optimised."""
text_mask = util.get_text_field_mask(tokens)
# Pop elmo tokens, since elmo embedder should not be present.
elmo_tokens = tokens.pop("elmo", None)
5 changes: 4 additions & 1 deletion allennlp_models/coref/dataset_readers/winobias.py
@@ -58,7 +58,10 @@ class WinobiasReader(DatasetReader):
"""

def __init__(
self, max_span_width: int, token_indexers: Dict[str, TokenIndexer] = None, **kwargs,
self,
max_span_width: int,
token_indexers: Dict[str, TokenIndexer] = None,
**kwargs,
) -> None:
super().__init__(**kwargs)
self._max_span_width = max_span_width
22 changes: 20 additions & 2 deletions allennlp_models/coref/metrics/mention_recall.py
@@ -2,6 +2,9 @@
from overrides import overrides

import torch
import torch.distributed as dist

from allennlp.common.util import is_distributed

from allennlp.training.metrics.metric import Metric

@@ -18,14 +21,29 @@ def __call__(
batched_top_spans: torch.Tensor,
batched_metadata: List[Dict[str, Any]],
):
num_gold_mentions = 0
num_recalled_mentions = 0
for top_spans, metadata in zip(batched_top_spans.tolist(), batched_metadata):

gold_mentions: Set[Tuple[int, int]] = {
mention for cluster in metadata["clusters"] for mention in cluster
}
predicted_spans: Set[Tuple[int, int]] = {(span[0], span[1]) for span in top_spans}
self._num_gold_mentions += len(gold_mentions)
self._num_recalled_mentions += len(gold_mentions & predicted_spans)

num_gold_mentions += len(gold_mentions)
num_recalled_mentions += len(gold_mentions & predicted_spans)

if is_distributed():
device = batched_top_spans.device
_num_gold_mentions = torch.tensor(num_gold_mentions).to(device)
_num_recalled_mentions = torch.tensor(num_recalled_mentions).to(device)
dist.all_reduce(_num_gold_mentions, op=dist.ReduceOp.SUM)
dist.all_reduce(_num_recalled_mentions, op=dist.ReduceOp.SUM)
num_gold_mentions = _num_gold_mentions.item()
num_recalled_mentions = _num_recalled_mentions.item()

self._num_gold_mentions += num_gold_mentions
self._num_recalled_mentions += num_recalled_mentions

@overrides
def get_metric(self, reset: bool = False) -> float:
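For context on the `mention_recall.py` change above: the metric now accumulates counts locally and, when running distributed, sums them across workers with `all_reduce` before folding them into its running totals. A minimal standalone sketch of that aggregation pattern (it assumes a process group has already been initialised, e.g. via `torch.distributed.init_process_group`):

```python
import torch
import torch.distributed as dist


def aggregate_counts(num_gold: int, num_recalled: int, device: torch.device):
    """Sum per-worker counts across processes; a no-op when not distributed."""
    gold = torch.tensor(num_gold, device=device)
    recalled = torch.tensor(num_recalled, device=device)
    if dist.is_available() and dist.is_initialized():
        dist.all_reduce(gold, op=dist.ReduceOp.SUM)
        dist.all_reduce(recalled, op=dist.ReduceOp.SUM)
    return gold.item(), recalled.item()
```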
14 changes: 10 additions & 4 deletions allennlp_models/generation/models/bart.py
@@ -20,7 +20,11 @@
@Seq2SeqEncoder.register("bart_encoder")
class BartEncoder(Seq2SeqEncoder):
"""
The BART encoder without the token and position embeddings.
The BART encoder, implemented as a `Seq2SeqEncoder`, which assumes it operates on
already embedded inputs. This means that we remove the token and position embeddings
from BART in this module. For the typical use case of using BART to encode inputs to your
model (where we include the token and position embeddings from BART), you should use
`PretrainedTransformerEmbedder(bart_model_name, sub_module="encoder")` instead of this.
# Parameters
@@ -93,7 +97,9 @@ def __init__(

@overrides
def forward(
self, input_ids, attention_mask=None,
self,
input_ids,
attention_mask=None,
):
x = self.embed_tokens(input_ids) + self.embed_positions(input_ids)
encoder_states = self.encoder(x, attention_mask)
@@ -338,7 +344,7 @@ def take_step(
attention_mask=state["input_mask"],
encoder_outputs=encoder_outputs,
decoder_input_ids=last_predictions[:, : i + 1],
decoder_cached_states=decoder_cache,
past_key_values=decoder_cache,
generation_mode=True,
use_cache=True,
)
@@ -353,7 +359,7 @@ def take_step(
dim=-1, index=idx
)

decoder_cache = outputs[1][1]
decoder_cache = outputs[1]

state["encoder_states"] = outputs[2]

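The expanded `BartEncoder` docstring earlier in this hunk points users at `PretrainedTransformerEmbedder` for the common case where BART's own token and position embeddings should be kept. A hedged usage sketch (the model name is illustrative, not taken from this commit):

```python
from allennlp.modules.token_embedders import PretrainedTransformerEmbedder

# Wraps only BART's encoder sub-module, keeping its token and position
# embeddings, so it can embed raw token ids directly.
bart_encoder = PretrainedTransformerEmbedder(
    "facebook/bart-large",  # illustrative model name
    sub_module="encoder",
)
```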
15 changes: 11 additions & 4 deletions allennlp_models/generation/models/simple_seq2seq.py
@@ -98,7 +98,8 @@ def __init__(
self.vocab._padding_token, self._target_namespace
)
self._bleu = BLEU(
bleu_ngram_weights, exclude_indices={pad_index, self._end_index, self._start_index},
bleu_ngram_weights,
exclude_indices={pad_index, self._end_index, self._start_index},
)
else:
self._bleu = None
@@ -154,7 +155,9 @@ def __init__(
# TODO (pradeep): Do not hardcode decoder cell type.
if self._target_decoder_layers > 1:
self._decoder_cell = LSTM(
self._decoder_input_dim, self._decoder_output_dim, self._target_decoder_layers,
self._decoder_input_dim,
self._decoder_output_dim,
self._target_decoder_layers,
)
else:
self._decoder_cell = LSTMCell(self._decoder_input_dim, self._decoder_output_dim)
@@ -295,7 +298,9 @@ def _init_decoder_state(self, state: Dict[str, torch.Tensor]) -> Dict[str, torch
batch_size = state["source_mask"].size(0)
# shape: (batch_size, encoder_output_dim)
final_encoder_output = util.get_final_encoder_states(
state["encoder_outputs"], state["source_mask"], self._encoder.is_bidirectional(),
state["encoder_outputs"],
state["source_mask"],
self._encoder.is_bidirectional(),
)
# Initialize the decoder hidden state with the final output of the encoder.
# shape: (batch_size, decoder_output_dim)
@@ -504,7 +509,9 @@ def _prepare_attended_input(

@staticmethod
def _get_loss(
logits: torch.LongTensor, targets: torch.LongTensor, target_mask: torch.BoolTensor,
logits: torch.LongTensor,
targets: torch.LongTensor,
target_mask: torch.BoolTensor,
) -> torch.Tensor:
"""
Compute loss.
(changed file, path not shown)
@@ -94,5 +94,5 @@ def forward(
Tuple[Dict[str, torch.Tensor], torch.Tensor]
Tuple of new decoder state and decoder output. Output should be used to generate out sequence elements
"""
"""
raise NotImplementedError()
(changed file, path not shown)
@@ -401,7 +401,9 @@ def get_metrics(self, reset: bool = False) -> Dict[str, float]:

@overrides
def forward(
self, encoder_out: Dict[str, torch.LongTensor], target_tokens: TextFieldTensors = None,
self,
encoder_out: Dict[str, torch.LongTensor],
target_tokens: TextFieldTensors = None,
) -> Dict[str, torch.Tensor]:
state = encoder_out
decoder_init_state = self._decoder_net.init_decoder_state(state)
@@ -427,16 +429,15 @@ def forward(
# shape: (batch_size, max_predicted_sequence_length)
best_predictions = top_k_predictions[:, 0, :]

self._tensor_based_metric( # type: ignore
best_predictions, targets
)
self._tensor_based_metric(best_predictions, targets) # type: ignore

if self._token_based_metric is not None:
output_dict = self.post_process(output_dict)
predicted_tokens = output_dict["predicted_tokens"]

self._token_based_metric( # type: ignore
predicted_tokens, self.indices_to_tokens(targets[:, 1:]),
predicted_tokens,
self.indices_to_tokens(targets[:, 1:]),
)

return output_dict
(changed file, path not shown)
@@ -63,14 +63,14 @@ def forward(
target_tokens : `Dict[str, torch.LongTensor]`, optional
The output of `TextField.as_array()` applied on the target `TextField`.
"""
"""

raise NotImplementedError()

def post_process(self, output_dict: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
"""
Post processing for converting raw outputs to prediction during inference.
The composing models such `allennlp.models.encoder_decoders.composed_seq2seq.ComposedSeq2Seq`
can call this method when `decode` is called.
Post processing for converting raw outputs to prediction during inference.
The composing models such `allennlp.models.encoder_decoders.composed_seq2seq.ComposedSeq2Seq`
can call this method when `decode` is called.
"""
raise NotImplementedError()
5 changes: 4 additions & 1 deletion allennlp_models/lm/dataset_readers/masked_language_model.py
@@ -45,7 +45,10 @@ class MaskedLanguageModelingReader(DatasetReader):
"""

def __init__(
self, tokenizer: Tokenizer = None, token_indexers: Dict[str, TokenIndexer] = None, **kwargs,
self,
tokenizer: Tokenizer = None,
token_indexers: Dict[str, TokenIndexer] = None,
**kwargs,
) -> None:
super().__init__(**kwargs)
self._tokenizer = tokenizer or WhitespaceTokenizer()
5 changes: 4 additions & 1 deletion allennlp_models/lm/dataset_readers/next_token_lm.py
@@ -41,7 +41,10 @@ class NextTokenLMReader(DatasetReader):
"""

def __init__(
self, tokenizer: Tokenizer = None, token_indexers: Dict[str, TokenIndexer] = None, **kwargs,
self,
tokenizer: Tokenizer = None,
token_indexers: Dict[str, TokenIndexer] = None,
**kwargs,
) -> None:
super().__init__(**kwargs)
self._tokenizer = tokenizer or WhitespaceTokenizer()
6 changes: 2 additions & 4 deletions allennlp_models/lm/models/language_model.py
@@ -21,7 +21,7 @@ class LanguageModel(Model):
`Seq2SeqEncoder` to uncontextualized embeddings, using a `SoftmaxLoss`
module (defined above) to compute the language modeling loss.
If bidirectional is True, the language model is trained to predict the next and
If bidirectional is True, the language model is trained to predict the next and
previous tokens for each token in the input. In this case, the contextualizer must
be bidirectional. If bidirectional is False, the language model is trained to only
predict the next token for each token in the input; the contextualizer should also
@@ -213,9 +213,7 @@ def num_layers(self) -> int:
+ "does not report how many layers it has."
)

def forward( # type: ignore
self, source: TextFieldTensors
) -> Dict[str, torch.Tensor]:
def forward(self, source: TextFieldTensors) -> Dict[str, torch.Tensor]: # type: ignore
"""
Computes the averaged forward (and backward, if language model is bidirectional)
LM loss from the batch.
38 changes: 28 additions & 10 deletions allennlp_models/lm/modules/token_embedders/language_model.py
@@ -2,10 +2,14 @@
from typing import Dict, Tuple, TYPE_CHECKING

import torch
from allennlp.common import Params

from allennlp.common.checks import ConfigurationError
from allennlp.data import TokenIndexer, Token
from allennlp.modules import TextFieldEmbedder
from allennlp.modules.scalar_mix import ScalarMix
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.modules.token_embedders import EmptyEmbedder
from allennlp.modules.token_embedders.token_embedder import TokenEmbedder
from allennlp.nn.util import (
remove_sentence_boundaries,
@@ -15,7 +19,7 @@

# Importing at runtime results in a circular dependency.
if TYPE_CHECKING:
from allennlp.models.language_model import LanguageModel
from allennlp_models.lm.models.language_model import LanguageModel


@TokenEmbedder.register("language_model_token_embedder")
@@ -76,15 +80,29 @@ def __init__(

# Extract the name of the tokens that the LM was trained on.
text_field_embedder = dict_config["model"]["text_field_embedder"]
token_names = list(text_field_embedder["token_embedders"].keys())
if len(token_names) != 1:
# We don't currently support embedding with language models trained with multiple
# embedded indices.
#
# Note: We only care about embedded indices. This does not include "tokens" which
# is just used to compute the loss in LanguageModel.
raise ConfigurationError(f"LM from {archive_file} trained with multiple embedders!")
self._token_name = token_names[0]
text_field_embedder = TextFieldEmbedder.from_params(Params(text_field_embedder))
if not isinstance(text_field_embedder, BasicTextFieldEmbedder):
raise ConfigurationError(
f"Language model from {archive_file} uses a non-standard TextFieldEmbedder!"
)
non_empty_embedders = [
name
for name, token_embedder in text_field_embedder._token_embedders.items()
if not isinstance(token_embedder, EmptyEmbedder)
]

if len(non_empty_embedders) == 0:
# Only empty embedders were contained in the language model
# We need at least one non-empty embedder in the language model
raise ConfigurationError(
f"Language model from {archive_file} trained with only empty embedders!"
)
elif len(non_empty_embedders) > 1:
raise ConfigurationError(
f"Language model from {archive_file} trained with multiple non-empty embedders!"
)

self._token_name = non_empty_embedders[0]

# TODO(brendanr): Find a way to remove this hack. The issue fundamentally is that the
# BasicTextFieldEmbedder concatenates multiple embedded representations. When a
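To make the new `LanguageModelTokenEmbedder` logic above concrete: the language model's text field embedder may now contain `EmptyEmbedder` placeholders (for indexers that only feed the loss), and exactly one real embedder must remain once those are filtered out; its key becomes `self._token_name`. A small sketch of that rule, using hypothetical embedders and dimensions:

```python
from allennlp.modules.text_field_embedders import BasicTextFieldEmbedder
from allennlp.modules.token_embedders import Embedding, EmptyEmbedder

# Hypothetical LM embedder: "tokens" only feeds the LM loss, so it gets an
# EmptyEmbedder; "token_characters" carries the real embeddings.
text_field_embedder = BasicTextFieldEmbedder(
    {
        "tokens": EmptyEmbedder(),
        "token_characters": Embedding(embedding_dim=16, num_embeddings=262),
    }
)

non_empty = [
    name
    for name, embedder in text_field_embedder._token_embedders.items()
    if not isinstance(embedder, EmptyEmbedder)
]
assert non_empty == ["token_characters"]  # this key becomes self._token_name
```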