TF generate refactor - Beam Search #16374
Conversation
The documentation is not available anymore as the PR was closed or merged.
```diff
@@ -259,8 +255,8 @@ def _create_score_penalties(self, input_ids, logits):
             np.put(token_penalties[i], prev_input_id, logit_penalties)
         return tf.convert_to_tensor(token_penalties, dtype=tf.float32)

-    def __call__(self, input_ids: tf.Tensor, scores: tf.Tensor) -> tf.Tensor:
-        score_penalties = self._create_score_penalties(input_ids, scores)
+    def __call__(self, input_ids: tf.Tensor, scores: tf.Tensor, cur_len: int) -> tf.Tensor:
```
XLA greedy search was probably missing this as well in the logits processors, since it has the same padded input_ids
Yes, thanks for adding it! Just note that I don't really think we can make this processor XLA-compilable anyway, as it's very complex and numpy can't be used in XLA. `cur_len` is mostly added in Flax/JAX to make the processors XLA-compilable, but it doesn't hurt to add it here!
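To illustrate the point about `cur_len`, here is a hedged sketch (in numpy, with illustrative names not taken from this PR) of the Flax-style pattern: the processor branches on an explicit scalar `cur_len` rather than on data-dependent tensor shapes, which is what lets XLA trace it with static shapes.

```python
import numpy as np

def min_length_logits_processor(scores, cur_len, min_length, eos_token_id):
    """Illustrative sketch (not code from this PR): passing `cur_len`
    explicitly lets the processor condition on a scalar instead of on
    data-dependent shapes -- the pattern Flax/JAX relies on to stay
    XLA-compilable."""
    # Mask the EOS column while the sequence is still too short.
    eos_scores = np.where(cur_len < min_length, -np.inf, scores[:, eos_token_id])
    scores = scores.copy()
    scores[:, eos_token_id] = eos_scores
    return scores
```

In real Flax code the `np.where` would be `jnp.where` so the branch stays inside the traced graph.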
`tf.unique` is not compatible with XLA because the output shape depends on the specific input data, and so cannot be inferred at compile time. However, there should be a way to make this logits processor XLA-compilable - there's probably some solution where you store counts in a sparse matrix and then use `triu()` or `tril()` followed by a matmul to see if a token has been preceded by the same token. Let me know if you want me to try that (here or in a separate PR)
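As a rough sketch of the idea (my own variant, not the reviewer's `triu`/matmul proposal and not the repo's implementation): a one-hot presence mask over the vocabulary has the static shape `(batch, vocab)`, so it avoids `tf.unique` entirely. Note the caveat that with padded `input_ids` the padding token would also be marked as "seen" and would need separate masking.

```python
import tensorflow as tf

def xla_friendly_repetition_penalty(input_ids, scores, penalty=2.0):
    """Sketch of one XLA-compatible alternative to `tf.unique` (not this
    repo's implementation): reduce a one-hot encoding over the sequence
    axis to get a static-shape (batch, vocab) mask of seen tokens.
    Caveat: padding tokens in `input_ids` are also marked as seen."""
    vocab_size = scores.shape[-1]
    one_hot = tf.one_hot(input_ids, depth=vocab_size)   # (batch, seq, vocab)
    token_seen = tf.reduce_max(one_hot, axis=1) > 0     # (batch, vocab)
    # Standard repetition penalty: shrink positive scores, grow negative ones.
    penalized = tf.where(scores > 0, scores / penalty, scores * penalty)
    return tf.where(token_seen, penalized, scores)
```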
🧠
I'd leave it to a subsequent PR; XLA-readiness is not the main priority here and this PR is already very long.
This looks very nice! Great job so far - it's really not an easy PR.
Hope we can hunt down those final differences for the other models and have identical results between this and the previous version.
IMO the most important tests that need to pass here are the slow, long batched generation tests in TFT5 and TFBART. Once those tests pass I think we can be confident that it works.
Some final minor differences in TFBart might come from things like the `length_penalty` being applied slightly differently in the old version, e.g. if the hypothesis length is a bit different here:

`score = sum_logprobs / len(hyp) ** self.length_penalty`

Note that I don't think we have exactly the same output in JAX as in TF here either, but it'd be important to match the new TF version exactly to the old TF version.
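To make the reviewer's point concrete, here is a toy check (with made-up numbers) of the scoring rule quoted above: a one-token difference in the hypothesis length shifts the normalized score, which is enough to reorder beams.

```python
# The old TF beam scoring rule quoted above:
#     score = sum_logprobs / len(hyp) ** self.length_penalty
def hyp_score(sum_logprobs, hyp_len, length_penalty):
    return sum_logprobs / hyp_len ** length_penalty

# Hypothetical numbers: same cumulative log-prob, lengths off by one.
shorter = hyp_score(-10.0, 8, length_penalty=1.0)  # -1.25
longer = hyp_score(-10.0, 9, length_penalty=1.0)   # about -1.111, now ranked higher
```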
Thanks for working on this!
Great job!
The PR more or less looks good to merge to me!
I'd advocate changing two final things:

- Apply all logits processors the same way (as we do in PyTorch), at the expense of a negligible super-edge case where the `repetition_penalty` could lead to different results. No one uses (or should use) repetition penalty in beam search, and top-k yields the same result in 99% of cases.
- Add logits processors for `forced_bos_token_id` and `forced_eos_token_id`, and maybe adapt Marian's `adapt_logits_processor` function slightly to fit the PyTorch one.
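For the second point, a minimal numpy sketch of what a forced-BOS processor does (the PR's actual TF version may differ; the class name here mirrors the PyTorch naming but is illustrative): at the first generated position, every token except `bos_token_id` gets a score of `-inf`.

```python
import numpy as np

class ForcedBOSTokenLogitsProcessor:
    """Illustrative numpy sketch, not the PR's TF implementation:
    force `bos_token_id` at the first generation step."""
    def __init__(self, bos_token_id):
        self.bos_token_id = bos_token_id

    def __call__(self, input_ids, scores, cur_len):
        if cur_len == 1:
            # All tokens except bos_token_id become impossible.
            forced = np.full_like(scores, -np.inf)
            forced[:, self.bos_token_id] = 0.0
            return forced
        return scores
```

A forced-EOS processor is symmetric, triggering at `cur_len == max_length - 1` instead.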
```python
# sets the score to 0 in the eos_token_id column
scores = tf.zeros((batch_size, 1))
# sets the score to -inf everywhere else
if self.eos_token_id > 0:
```
(nit) I think it'd be cleaner to raise a ValueError if `eos_token_id <= 0` in `__init__()`. This should never really be the case. But maybe let's leave it for a follow-up PR
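A sketch of that suggestion (class and attribute names are illustrative, not the repo's): validate the token id once at construction time instead of branching on it at every call.

```python
class ForcedEOSTokenLogitsProcessor:
    """Hypothetical sketch of the nit above: fail fast in __init__
    rather than silently skipping the forcing logic at call time."""
    def __init__(self, max_length, eos_token_id):
        if eos_token_id <= 0:
            raise ValueError(
                f"`eos_token_id` has to be a positive integer, got {eos_token_id}"
            )
        self.max_length = max_length
        self.eos_token_id = eos_token_id
```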
```python
# sets the score to 0 in the bos_token_id column
scores = tf.zeros((batch_size, 1))
# sets the score to -inf everywhere else
if self.bos_token_id > 0:
```
same comment as for EOS
```python
accepts_attention_mask = "attention_mask" in set(inspect.signature(self.call).parameters.keys())
```
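The same introspection pattern, shown self-contained with a hypothetical stand-in model (the real code inspects the model's own `call`): `inspect.signature` reads the parameter names, so no attempt-and-catch is needed.

```python
import inspect

class DummyModel:
    # Hypothetical stand-in for a TF model whose `call` takes an attention mask.
    def call(self, input_ids, attention_mask=None):
        return input_ids

class DummyModelNoMask:
    def call(self, input_ids):
        return input_ids

def accepts_attention_mask(model):
    # Check the call signature to decide whether a mask can be passed.
    return "attention_mask" in set(inspect.signature(model.call).parameters.keys())
```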
nice!
Great work, good to merge for me!
What does this PR do?
As discussed in the original TF generate refactor plan (#15562), this PR adds `beam_search`.

This Beam Search implementation was inspired by our FLAX implementation, which is XLA-friendly. However, this PR is not yet XLA-ready (😭). To pass the existing tests, a few tweaks were added on top of the FLAX adaptation -- I added some comments in the PR to explain the differences (and why they were needed), hopefully making the review process easier.
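For readers unfamiliar with the Flax-style design the PR follows, here is a heavily simplified numpy sketch of beam search with fixed-shape state (my own toy version: no EOS handling, caching, or length penalty). Keeping every tensor at a static shape across steps is the property that makes this style XLA-friendly.

```python
import numpy as np

def toy_beam_search(log_probs_fn, bos_id, num_beams, max_len, vocab_size):
    """Minimal fixed-shape beam search sketch (NOT this PR's implementation).
    Each step expands every beam by vocab_size candidates and keeps the
    top num_beams overall; all arrays keep static shapes throughout."""
    sequences = np.full((num_beams, max_len), bos_id, dtype=np.int64)
    # Only beam 0 is "live" at the start, so identical beams don't tie.
    scores = np.array([0.0] + [-np.inf] * (num_beams - 1))
    for step in range(1, max_len):
        log_probs = log_probs_fn(sequences[:, :step])    # (num_beams, vocab)
        candidates = scores[:, None] + log_probs         # (num_beams, vocab)
        flat = candidates.reshape(-1)
        top = np.argsort(flat)[::-1][:num_beams]         # best num_beams overall
        beam_idx, token_idx = top // vocab_size, top % vocab_size
        sequences = sequences[beam_idx]                  # reorder surviving beams
        sequences[:, step] = token_idx
        scores = flat[top]
    return sequences, scores
```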
Tests run (and passing):