Flickr30k (#285)
* max instances for debugging

* b

* printing devices

* moving tensors?

* self

* p

* p

* l

* l

* fixing heap?

* stop logging and printing

* less prints

* printing devices

* p

* .

* devices

* device

* test

* testing not sampling

* testing not using model again

* test not moving tensors

* not printing

* trying image subset

* debugging model

* going back to full (model is slow?)

* right number of instances

* distributed

* more potential hard negatives

* non-distributed

* distributed + adding another seen set

* fixing evaluation method

* format

* testing fixed eval

* fixing variable

* fixing training var

* testing new eval again

* fix

* fix

* fix?

* changing k to 5

* float

* moving labels to gpu

* long

* trying hopefully fixed loss function

* fix

* testing out the whole thing

* setting max instances to debug in distributed

* debug stuff

* fixing num images

* hopefully fixing the dataset reader in dist

* full data

* testing out brand new changes

* deleting some old comments

* fixing validation bug

* testing on 1 gpu for now

* feature cache broken?

* switching to tensor fields and stuff

* fix

* print device

* trying to not move the batch?

* moving small batches to cpu

* not printing device

* deleting old tensor?

* debug

* printing memory allocation

* moving tensor to cpu immediately?

* deleting batch?

* debug

* debug

* debug

* does this work?

* switching to eval and no grad

* fix

* mask list

* backbone roll

* typo

* log

* debug

* notes

* testing no grad

* testing validation batch size of 1

* bug

* didn't have the right variable?

* don't need to softmax?

* trying flickr30k with 8 batch and dummy captions

* full flickr?

* batch size 1

* testing training batches for validation

* Testing out val stuff

* updating reader (test will fail for now)

* debug statements to figure out why val isn't working

* testing if top images always have same scores

* getting rid of caption debugging step

* using the right caption var

* updating reader to mirror vilbert training setup

* full dataset (dummy caption embeddings)

* switching to real caption embeddings

* testing caching hard negatives

* log

* limit instances to test caching

* delete faiss

* more cache tests

* one more log statement

* single epoch to calculate hard negatives

* need to import logging

* don't log misses anymore (too slow)

* using consistent hash function (test # instances)

* Flickr30k batching (#277)

merge main + caching captions

* test caching captions and hard negatives on full

* don't log cache hits

* logging training labels to debug

* switching val to 4 way mc

* can we overfit

* not 1k instances

* not logging + overfit

* not overfit

* even fewer instances

* all instances

* even more overfitting

* back to normal

* b

* back to normal

* log loss and stuff again

* reset

* don't include hard negatives in case there's a bug

* batch size of 1

* more epochs

* only correct answer and hard negatives

* Cleanup

* Fix error in caption caching

* Find hard negatives even when we don't have enough instances

* O(1) algorithm for finding a random number with one exception (see the sketch after this list)

* Make sure the wrong caption comes from a different image

* Cross entropy loss

* trying overfitting with full instances

* use full dataset without learning rate scheduler

* don't limit instances and don't log

* batch size, scheduler, wandb

* comment out wandb

* full dataset no hard negatives

* don't log loss

* giving the correct answer a cheat word

* use local feature cache

* logging cache stuff

* different local feature cache dir

* switching to cheat box

* bug

* something up with some boxes

* no cheating and no hard negatives

* seeing if a really big batch size works

* bug

* testing 64 bs

* batch size 32

* batch size 48

* full training with 32 batch size no hard negatives

* more gradient accumulation steps

* trying to train with 10% of the data

* fix

* bumping up the learning rate, don't correct bias

* gradient accumulation + hard negatives

* use local feature cache

* changing params back

* trying real validation

* no hard negatives

* hard negatives and not real validation

* no hard negatives + real validation

* calc hn

* fixing predictors

* fix

* fix

* fix

* fix

* cleaning up PR (in progress)

* cleaning things up

* more cleanup

* change warmup steps

* only validate every ~5 epochs

* printing shapes

* more logging

* fix log

* try cat instead of stack

* different logging

* test

* fix

* try batches per epoch

* bug

* get rid of log statement

* use local feature cache

* log

* logging cache miss

* switching back to old captions to use cache

* switching back to preprocessing captions

* using nfs

* Disabling hard negatives to test epoch strat

* not logging cache misses

* write to local cache (faster)

* epoch multiplier

* no hard negatives

* hard negatives

* lowering number of warmup steps

* no hard negatives

* hard negatives

* no hard negatives

* hard negatives

* Trying Jiasen's featurizer (1x epoch mult)

* null image stuff

* null image

* don't featurize captions (no hn)

* adding vilbert ir model tests

* cleanup + test distributed

* cleanup + dist

* test distributed

* don't use shard_iterable

* fix feature dir

* changelog

* reformat

* log shapes

* removing unused vars

* using old features

* style

* lint

* lint

* don't log shapes

* lint

* fixing type

* debug

* changing test files to hopefully fix test

* using cloud link for data dir

* cleanup

* delete print

* comment

* cleanup

* fixing test assert

* committing a bunch of fixes

* not distributed

* fixing metrics

* Adding test files + upping max instances

* fixes

* Switching back to nfs cache

* renaming n

* update comment

* fix

* making test deterministic?

* sorting files to hopefully achieve consistency
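
The "O(1) algorithm" bullet above deserves a concrete form. A minimal sketch, assuming the goal is a uniform draw over [0, n) that skips one excluded index, with no retry loop (the helper name is illustrative, not the commit's):

import random

def random_index_except(n: int, exception: int) -> int:
    """Uniformly sample from range(n) excluding `exception`, without rejection sampling."""
    # Draw one of the n - 1 allowed values, then shift past the excluded index.
    i = random.randrange(n - 1)
    return i + 1 if i >= exception else i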

Co-authored-by: Dirk Groeneveld <[email protected]>
jacob-morrison and dirkgr authored Jun 25, 2021
1 parent fb35b2d commit e47da99
Showing 32 changed files with 1,178 additions and 2 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -13,6 +13,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - Added `StanfordSentimentTreeBankDatasetReader.apply_token_indexers()` to add token_indexers rather than in `text_to_instance`
 - Added `AdversarialBiasMitigator` tests.
 - Added `adversarial-binary-gender-bias-mitigated-roberta-snli` model.
+- Added support for Flickr30k image retrieval, including a dataset reader, a model, and a training config.

 ### Fixed
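
For orientation, a minimal sketch of exercising the new feature end to end, assuming allennlp and allennlp-models are installed and using the fixture config added below (the serialization directory is a hypothetical path):

import allennlp_models.vision  # noqa: F401  (registers "flickr30k", "vilbert_ir", etc.)
from allennlp.commands.train import train_model_from_file

train_model_from_file(
    "test_fixtures/vision/flickr30k/experiment.jsonnet",
    "/tmp/flickr30k_ir",  # hypothetical serialization directory
)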
1 change: 1 addition & 0 deletions allennlp_models/vision/dataset_readers/__init__.py
@@ -4,3 +4,4 @@
 from allennlp_models.vision.dataset_readers.vgqa import VGQAReader
 from allennlp_models.vision.dataset_readers.vqav2 import VQAv2Reader
 from allennlp_models.vision.dataset_readers.visual_entailment import VisualEntailmentReader
+from allennlp_models.vision.dataset_readers.flickr30k import Flickr30kReader
480 changes: 480 additions & 0 deletions allennlp_models/vision/dataset_readers/flickr30k.py

Large diffs are not rendered by default.

6 changes: 4 additions & 2 deletions allennlp_models/vision/dataset_readers/vision_reader.py
@@ -96,11 +96,13 @@ def __init__(
         max_instances: Optional[int] = None,
         image_processing_batch_size: int = 8,
         write_to_cache: bool = True,
+        manual_distributed_sharding: bool = True,
+        manual_multiprocess_sharding: bool = True,
     ) -> None:
         super().__init__(
             max_instances=max_instances,
-            manual_distributed_sharding=True,
-            manual_multiprocess_sharding=True,
+            manual_distributed_sharding=manual_distributed_sharding,
+            manual_multiprocess_sharding=manual_multiprocess_sharding,
         )

         # tokenizers and indexers
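
These new flags let a subclass opt out of the base reader's manual sharding. A hedged sketch of how a reader might use them (the class is illustrative, not the new Flickr30k reader itself):

from allennlp_models.vision.dataset_readers.vision_reader import VisionReader

class MyVisionReader(VisionReader):
    def __init__(self, **kwargs) -> None:
        # Let the data loader handle distributed sharding instead of the reader.
        super().__init__(
            manual_distributed_sharding=False,
            manual_multiprocess_sharding=True,
            **kwargs,
        )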
1 change: 1 addition & 0 deletions allennlp_models/vision/models/__init__.py
@@ -1,6 +1,7 @@
 from allennlp_models.vision.models.nlvr2 import Nlvr2Model
 from allennlp_models.vision.models.vision_text_model import VisionTextModel
 from allennlp_models.vision.models.visual_entailment import VisualEntailmentModel
+from allennlp_models.vision.models.vilbert_image_retrieval import ImageRetrievalVilbert
 from allennlp_models.vision.models.vilbert_vqa import VqaVilbert
 from allennlp_models.vision.models.heads.vqa_head import VqaHead
 from allennlp_models.vision.models.heads.visual_entailment_head import VisualEntailmentHead
138 changes: 138 additions & 0 deletions allennlp_models/vision/models/vilbert_image_retrieval.py
@@ -0,0 +1,138 @@
import logging
from typing import Dict

from overrides import overrides
import torch

from allennlp.data import TextFieldTensors, Vocabulary
from allennlp.models.model import Model
from allennlp.modules.transformer import (
    TransformerEmbeddings,
    ImageFeatureEmbeddings,
    BiModalEncoder,
)
from allennlp.training.metrics import CategoricalAccuracy
from torch.nn import CrossEntropyLoss

from allennlp_models.vision.models.vision_text_model import VisionTextModel

logger = logging.getLogger(__name__)


@Model.register("vilbert_ir")
@Model.register("vilbert_ir_from_huggingface", constructor="from_huggingface_model_name")
class ImageRetrievalVilbert(VisionTextModel):
    """
    Model for the image retrieval task, based on the ViLBERT paper.

    # Parameters

    vocab : `Vocabulary`
    text_embeddings : `TransformerEmbeddings`
    image_embeddings : `ImageFeatureEmbeddings`
    encoder : `BiModalEncoder`
    pooled_output_dim : `int`
    fusion_method : `str`, optional (default = `"mul"`)
    dropout : `float`, optional (default = `0.1`)
    k : `int`, optional (default = `1`)
    """

    def __init__(
        self,
        vocab: Vocabulary,
        text_embeddings: TransformerEmbeddings,
        image_embeddings: ImageFeatureEmbeddings,
        encoder: BiModalEncoder,
        pooled_output_dim: int,
        fusion_method: str = "mul",
        dropout: float = 0.1,
        k: int = 1,
        *,
        ignore_text: bool = False,
        ignore_image: bool = False,
    ) -> None:
        super().__init__(
            vocab,
            text_embeddings,
            image_embeddings,
            encoder,
            pooled_output_dim,
            fusion_method,
            dropout,
            is_multilabel=False,
            ignore_text=ignore_text,
            ignore_image=ignore_image,
        )
        self.classifier = torch.nn.Linear(pooled_output_dim, 1)

        self.top_1_acc = CategoricalAccuracy()
        self.top_5_acc = CategoricalAccuracy(top_k=5)
        self.top_10_acc = CategoricalAccuracy(top_k=10)
        self.loss = CrossEntropyLoss()

        self.k = k

    @overrides
    def forward(
        self,  # type: ignore
        box_features: torch.Tensor,
        box_coordinates: torch.Tensor,
        box_mask: torch.Tensor,
        caption: TextFieldTensors,
        label: torch.Tensor,
    ) -> Dict[str, torch.Tensor]:
        batch_size = box_features.shape[0]

        if self.training:
            # Shape: (batch_size, num_images, pooled_output_dim)
            pooled_output = self.backbone(box_features, box_coordinates, box_mask, caption)[
                "pooled_boxes_and_text"
            ]

            # Shape: (batch_size, num_images)
            logits = self.classifier(pooled_output).squeeze(-1)
            probs = torch.softmax(logits, dim=-1)
        else:
            with torch.no_grad():
                # Shape: (batch_size, num_images, pooled_output_dim)
                pooled_output = self.backbone(box_features, box_coordinates, box_mask, caption)[
                    "pooled_boxes_and_text"
                ]

                # Shape: (batch_size, num_images)
                logits = self.classifier(pooled_output).squeeze(-1)
                probs = torch.softmax(logits, dim=-1)

        outputs = {"logits": logits, "probs": probs}
        outputs = self._compute_loss_and_metrics(batch_size, outputs, label)
        return outputs

    @overrides
    def _compute_loss_and_metrics(
        self,
        batch_size: int,
        outputs: Dict[str, torch.Tensor],
        labels: torch.Tensor,
    ):
        outputs["loss"] = self.loss(outputs["logits"], labels) / batch_size
        self.top_1_acc(outputs["logits"], labels)
        self.top_5_acc(outputs["logits"], labels)
        self.top_10_acc(outputs["logits"], labels)
        return outputs

    @overrides
    def get_metrics(self, reset: bool = False) -> Dict[str, float]:
        return {
            "top_1_acc": self.top_1_acc.get_metric(reset),
            "top_5_acc": self.top_5_acc.get_metric(reset),
            "top_10_acc": self.top_10_acc.get_metric(reset),
        }

    @overrides
    def make_output_human_readable(
        self, output_dict: Dict[str, torch.Tensor]
    ) -> Dict[str, torch.Tensor]:
        return output_dict

    default_predictor = "vilbert_ir"
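
As a side note on the metrics above, a small self-contained illustration (not part of the diff) of how `CategoricalAccuracy(top_k=5)` scores a batch of retrieval logits:

import torch
from allennlp.training.metrics import CategoricalAccuracy

top_5_acc = CategoricalAccuracy(top_k=5)
logits = torch.randn(4, 100)           # (batch_size, num_images) retrieval scores
labels = torch.randint(0, 100, (4,))   # index of the correct image per caption
top_5_acc(logits, labels)
print(top_5_acc.get_metric(reset=True))  # fraction of labels within the top 5 scores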
1 change: 1 addition & 0 deletions allennlp_models/vision/predictors/__init__.py
@@ -1,2 +1,3 @@
+from allennlp_models.vision.predictors.vilbert_ir import VilbertImageRetrievalPredictor
 from allennlp_models.vision.predictors.vilbert_vqa import VilbertVqaPredictor
 from allennlp_models.vision.predictors.visual_entailment import VisualEntailmentPredictor
40 changes: 40 additions & 0 deletions allennlp_models/vision/predictors/vilbert_ir.py
@@ -0,0 +1,40 @@
from typing import List, Dict

from overrides import overrides
import numpy

from allennlp.common.file_utils import cached_path
from allennlp.common.util import JsonDict
from allennlp.data import Instance
from allennlp.data.fields import LabelField
from allennlp.predictors.predictor import Predictor


@Predictor.register("vilbert_ir")
class VilbertImageRetrievalPredictor(Predictor):
    def predict(self, image: str, caption: str) -> JsonDict:
        image = cached_path(image)
        return self.predict_json({"caption": caption, "image": image})

    @overrides
    def _json_to_instance(self, json_dict: JsonDict) -> Instance:
        from allennlp_models.vision.dataset_readers.flickr30k import Flickr30kReader

        caption = json_dict["caption"]
        image = cached_path(json_dict["image"])
        if isinstance(self._dataset_reader, Flickr30kReader):
            return self._dataset_reader.text_to_instance(caption, image, use_cache=False)
        else:
            raise ValueError(
                f"Dataset reader is of type {self._dataset_reader.__class__.__name__}. "
                f"Expected {Flickr30kReader.__name__}."
            )

    @overrides
    def predictions_to_labeled_instances(
        self, instance: Instance, outputs: Dict[str, numpy.ndarray]
    ) -> List[Instance]:
        new_instance = instance.duplicate()
        label = numpy.argmax(outputs["probs"])
        new_instance.add_field("label", LabelField(int(label), skip_indexing=True))
        return [new_instance]
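
A hedged usage sketch for this predictor, assuming a trained archive exists (both paths below are hypothetical):

from allennlp.predictors import Predictor
import allennlp_models.vision  # noqa: F401  (registers "vilbert_ir")

predictor = Predictor.from_path("/tmp/flickr30k_ir/model.tar.gz", predictor_name="vilbert_ir")
result = predictor.predict(image="path/to/image.jpg", caption="A girl sits by the water.")
print(result["probs"])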
80 changes: 80 additions & 0 deletions test_fixtures/vision/flickr30k/experiment.jsonnet
@@ -0,0 +1,80 @@
local model_name = "epwalsh/bert-xsmall-dummy";

{
  "dataset_reader": {
    "type": "flickr30k",
    "image_dir": "test_fixtures/vision/images/flickr30k",
    "data_dir": "test_fixtures/vision/flickr30k/sentences",
    "image_loader": "torch",
    "image_featurizer": "null",
    "featurize_captions": false,
    "region_detector": {
      "type": "random",
      "seed": 322
    },
    "tokenizer": {
      "type": "pretrained_transformer",
      "model_name": model_name
    },
    "token_indexers": {
      "tokens": {
        "type": "pretrained_transformer",
        "model_name": model_name
      }
    }
  },
  "train_data_path": "test_fixtures/vision/flickr30k/tiny-dev.txt",
  "validation_data_path": "test_fixtures/vision/flickr30k/tiny-dev.txt",
  "model": {
    "type": "vilbert_ir",
    "text_embeddings": {
      "vocab_size": 250,
      "embedding_size": 20,
      "pad_token_id": 0,
      "max_position_embeddings": 512,
      "type_vocab_size": 2,
      "dropout": 0.0
    },
    "image_embeddings": {
      "feature_size": 10,
      "embedding_size": 200
    },
    "encoder": {
      # text
      "hidden_size1": 20,
      "num_hidden_layers1": 1,
      "intermediate_size1": 40,
      "num_attention_heads1": 1,
      "attention_dropout1": 0.1,
      "hidden_dropout1": 0.1,
      "biattention_id1": [0, 1],
      "fixed_layer1": 0,

      # vision
      "hidden_size2": 200,
      "num_hidden_layers2": 1,
      "intermediate_size2": 50,
      "num_attention_heads2": 1,
      "attention_dropout2": 0.0,
      "hidden_dropout2": 0.0,
      "biattention_id2": [0, 1],
      "fixed_layer2": 0,

      "combined_num_attention_heads": 2,
      "combined_hidden_size": 200,
      "activation": "gelu",
    },
    "pooled_output_dim": 100,
    "fusion_method": "sum",
  },
  "data_loader": {
    "batch_size": 4
  },
  "trainer": {
    "optimizer": {
      "type": "huggingface_adamw",
      "lr": 0.00005
    },
    "num_epochs": 1,
  }
}
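
A minimal sketch, assuming the repo's fixtures are on disk, of inspecting this config the way a test might (Params handles the Jsonnet evaluation):

from allennlp.common.params import Params

params = Params.from_file("test_fixtures/vision/flickr30k/experiment.jsonnet")
print(params["model"]["type"])          # "vilbert_ir"
print(params["dataset_reader"]["type"])  # "flickr30k"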
60 changes: 60 additions & 0 deletions test_fixtures/vision/flickr30k/experiment_from_huggingface.jsonnet
@@ -0,0 +1,60 @@
local model_name = "epwalsh/bert-xsmall-dummy";
{
  "dataset_reader": {
    "type": "flickr30k",
    "image_dir": "test_fixtures/vision/images/flickr30k",
    "data_dir": "test_fixtures/vision/flickr30k/sentences",
    "image_loader": "torch",
    "image_featurizer": "null",
    "featurize_captions": false,
    "region_detector": {
      "type": "random",
      "seed": 322
    },
    "tokenizer": {
      "type": "pretrained_transformer",
      "model_name": model_name
    },
    "token_indexers": {
      "tokens": {
        "type": "pretrained_transformer",
        "model_name": model_name
      }
    }
  },
  "train_data_path": "test_fixtures/vision/flickr30k/tiny-dev.txt",
  "validation_data_path": "test_fixtures/vision/flickr30k/tiny-dev.txt",
  "model": {
    "type": "vilbert_ir_from_huggingface",
    "model_name": model_name,
    "image_feature_dim": 10,
    "image_num_hidden_layers": 1,
    "image_hidden_size": 200,
    "image_num_attention_heads": 1,
    "image_intermediate_size": 50,
    "image_attention_dropout": 0.0,
    "image_hidden_dropout": 0.0,
    "image_biattention_id": [0, 1],
    "image_fixed_layer": 0,

    "text_biattention_id": [0, 1],
    "text_fixed_layer": 0,

    "combined_hidden_size": 200,
    "combined_num_attention_heads": 4,

    "pooled_output_dim": 100,
    "fusion_method": "sum",
    "pooled_dropout": 0.0,
  },
  "data_loader": {
    "batch_size": 32
  },
  "trainer": {
    "optimizer": {
      "type": "huggingface_adamw",
      "lr": 0.00005
    },
    "num_epochs": 1,
  }
}
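
The "vilbert_ir_from_huggingface" type in this config resolves through the second registration on the model above (constructor="from_huggingface_model_name"). A hedged sketch of building the model from these params via the registry (this fetches the tiny dummy checkpoint from HuggingFace):

from allennlp.common.params import Params
from allennlp.data import Vocabulary
from allennlp.models import Model
import allennlp_models.vision  # noqa: F401  (registers the model type)

params = Params.from_file("test_fixtures/vision/flickr30k/experiment_from_huggingface.jsonnet")
model = Model.from_params(params=params["model"], vocab=Vocabulary.empty())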
5 changes: 5 additions & 0 deletions test_fixtures/vision/flickr30k/sentences/1.txt
@@ -0,0 +1,5 @@
[/EN#221796/people A girl] with [/EN#221804/bodyparts brown hair] sits on [/EN#221799/scene the edge of a cement area] [/EN#221798/scene overlooking water] .
[/EN#221796/people A woman] in [/EN#221797/clothing black] , seen from [/EN#221800/other behind] , sits next to [/EN#221798/scene a body of water] .
[/EN#221796/people A girl] sitting outside on [/EN#221799/other concrete] near [/EN#221798/scene water] in [/EN#221797/clothing a black dress] .
[/EN#221796/people A small girl] sits on [/EN#221799/other a ledge] by [/EN#221798/scene the water] contemplating [/EN#221802/other life] .
[/EN#221796/people A dark-haired girl] is sitting on [/EN#221798/scene the waters edge] .
5 changes: 5 additions & 0 deletions test_fixtures/vision/flickr30k/sentences/2.txt
@@ -0,0 +1,5 @@
[/EN#221796/people A girl] with [/EN#221804/bodyparts brown hair] sits on [/EN#221799/scene the edge of a concrete area] [/EN#221798/scene overlooking water] .
[/EN#221796/people A woman] in [/EN#221797/clothing black] , seen from [/EN#221800/other behind] , sits by [/EN#221798/scene a body of water] .
[/EN#221796/people A girl] sitting outside on [/EN#221799/other cement] near [/EN#221798/scene water] in [/EN#221797/clothing a black dress] .
[/EN#221796/people A small girl] sits on [/EN#221799/other an edge] by [/EN#221798/scene the water] contemplating [/EN#221802/other life] .
[/EN#221796/people A dark-haired girl] is sitting next to [/EN#221798/scene the waters edge] .
5 changes: 5 additions & 0 deletions test_fixtures/vision/flickr30k/sentences/3.txt
@@ -0,0 +1,5 @@
[/EN#221796/people A girl] without [/EN#221804/bodyparts brown hair] sits on [/EN#221799/scene the edge of a cement area] [/EN#221798/scene overlooking water] .
[/EN#221796/people A woman] wearing [/EN#221797/clothing black] , seen from [/EN#221800/other behind] , sits next to [/EN#221798/scene a body of water] .
[/EN#221796/people A girl] sitting inside on [/EN#221799/other concrete] near [/EN#221798/scene water] in [/EN#221797/clothing a black dress] .
[/EN#221796/people A small girl] sits on top of [/EN#221799/other a ledge] by [/EN#221798/scene the water] contemplating [/EN#221802/other life] .
[/EN#221796/people A dark-haired girl] is sitting by [/EN#221798/scene the waters edge] .
5 changes: 5 additions & 0 deletions test_fixtures/vision/flickr30k/sentences/4945942737.txt
@@ -0,0 +1,5 @@
[/EN#221796/people A girl] with [/EN#221804/bodyparts brown hair] sits on [/EN#221799/scene the edge of a cement area] [/EN#221798/scene overlooking water] .
[/EN#221796/people A woman] in [/EN#221797/clothing black] , seen from [/EN#221800/other behind] , sits next to [/EN#221798/scene a body of water] .
[/EN#221796/people A girl] sitting outside on [/EN#221799/other concrete] near [/EN#221798/scene water] in [/EN#221797/clothing a black dress] .
[/EN#221796/people A small girl] sits on [/EN#221799/other a ledge] by [/EN#221798/scene the water] contemplating [/EN#221802/other life] .
[/EN#221796/people A dark-haired girl] is sitting on [/EN#221798/scene the waters edge] .
5 changes: 5 additions & 0 deletions test_fixtures/vision/flickr30k/sentences/6338542128.txt
@@ -0,0 +1,5 @@
On [/EN#253080/scene a sunny , dry day] , wearing [/EN#253081/other full football gear] , [/EN#253069/people a Texas A&M football player] tries to reach [/EN#253070/people an Iowa State football player] , for [/EN#253072/other the football] during [/EN#253078/other the game] .
[/EN#253070/people An offensive player] running with [/EN#253077/other a football] while [/EN#253069/people a football player] tries to stop [/EN#0/notvisual him] during [/EN#253071/other a football game] .
[/EN#253069/people A football player] from [/EN#253074/scene Iowa State blocks] [/EN#253069/people a player] from [/EN#253075/other Texas A&M] from taking [/EN#253072/other the football] from [/EN#0/notvisual him] .
[/EN#253070/scene The Iowa State football player blocks] [/EN#253068/people a Texas A&M defenseman] while running with [/EN#253072/other the ball] .
[/EN#253073/other # 8] for [/EN#253083/bodyparts Iowa State stiff arms] [/EN#253069/people a Texas AM player] attempting to tackle [/EN#0/notvisual him] .
5 changes: 5 additions & 0 deletions test_fixtures/vision/flickr30k/test.txt
@@ -0,0 +1,5 @@
6338542128
4945942737
1
2
3