arXiv preprint [arXiv:1904.09751] (2019).
What's Unique
This paper demonstrates an effective method, Nucleus Sampling, for generating text that is closer to human-written text in terms of probability, perplexity, diversity, and quality. It compares Nucleus Sampling with other decoding methods: Beam Search, Pure Sampling, and Top-k sampling at different temperatures.
How It Works
- Beam Search: At each step it keeps the k most probable partial sequences (the beam) and finally outputs the sequence with the highest probability.
- Pure Sampling: The next token is always sampled directly from the model's full probability distribution.
- Top-K sampling: The next token is sampled from only the k most probable tokens. The probabilities of all other tokens (outside the top k) are set to zero, and the probabilities of the top-k tokens are renormalised so they sum to one.
- Top-K sampling with temperature: The logits are divided by the temperature before the softmax is applied. A lower temperature gives a steeper (more peaked) probability distribution.
- Nucleus Sampling: The nucleus is the smallest subset of the most probable tokens whose cumulative probability is at least p. The next token is sampled from the nucleus according to its renormalised probability distribution.
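The truncation strategies above can be sketched in a few lines of plain Python. This is a minimal illustration and not the paper's implementation: the function names (`softmax`, `top_k_filter`, `nucleus_filter`, `sample`) and the toy distribution are assumptions made for this example, and a real decoder would apply these steps to a model's logits at every generation step.

```python
import math
import random


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def _renormalise(probs, keep):
    """Zero out every token not in `keep` and rescale the rest to sum to one."""
    kept = set(keep)
    total = sum(probs[i] for i in kept)
    return [probs[i] / total if i in kept else 0.0 for i in range(len(probs))]


def top_k_filter(probs, k):
    """Top-k: keep only the k most probable tokens."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return _renormalise(probs, order[:k])


def nucleus_filter(probs, p):
    """Nucleus (top-p): keep the smallest set of most probable tokens
    whose cumulative probability reaches p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    keep, cumulative = [], 0.0
    for i in order:
        keep.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break
    return _renormalise(probs, keep)


def sample(probs, rng=random):
    """Draw one token index according to the given distribution."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy example (hypothetical logits over a 4-token vocabulary).
probs = softmax([2.0, 1.5, 0.3, 0.1], temperature=0.7)
next_token = sample(nucleus_filter(probs, p=0.9))
```

Pure sampling corresponds to calling `sample(probs)` with no filter. The practical difference between the two filters is that top-k always keeps the same number of tokens, while the size of the nucleus grows or shrinks with the shape of the distribution.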
Analysis
- The paper offers extensive analysis and insights into how these decoding methods behave.
- The paper includes examples of text generated by the different sampling methods.
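- Self-BLEU was used to measure how diverse the generated texts are: each generated text is scored with BLEU against the other generated texts, so a lower Self-BLEU indicates more diversity.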
- HUSE (Human Unified with Statistical Evaluation): a classifier is trained to distinguish human-written text from model-generated text using two features, the model probability of the text and human judgements of its "typicality".
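As a rough illustration of the Self-BLEU diversity measure listed above, the sketch below scores each generated text against all the others and averages the results. It is a simplified version written for these notes, not the paper's evaluation code: the helper names (`ngrams`, `bleu`, `self_bleu`), the add-one smoothing, and the whitespace-style token lists are all assumptions, and published results are normally computed with an established BLEU implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(hypothesis, references, max_n=4):
    """Simplified sentence-level BLEU with uniform weights.
    Add-one smoothing is an assumption made here to avoid log(0)."""
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts = ngrams(hypothesis, n)
        # For each n-gram, the largest count seen in any single reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(c, max_ref[g]) for g, c in hyp_counts.items())
        total = sum(hyp_counts.values())
        log_precisions.append(math.log((clipped + 1) / (total + 1)))
    # Brevity penalty against the reference closest in length.
    hyp_len = len(hypothesis)
    ref_len = min((len(r) for r in references),
                  key=lambda length: (abs(length - hyp_len), length))
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return bp * math.exp(sum(log_precisions) / max_n)


def self_bleu(texts, max_n=4):
    """Average BLEU of each text against all the other texts."""
    scores = []
    for i, hyp in enumerate(texts):
        refs = texts[:i] + texts[i + 1:]
        scores.append(bleu(hyp, refs, max_n))
    return sum(scores) / len(scores)


# Toy usage with hypothetical generations: repeated text scores higher
# (less diverse) than texts that share no n-grams.
repetitive = [["the", "cat", "sat", "on", "the", "mat"]] * 3
varied = [list("abcdef"), list("ghijkl"), list("mnopqr")]
print(self_bleu(repetitive) > self_bleu(varied))
```

Each text needs at least one other text to serve as a reference, so `self_bleu` expects a list of two or more tokenised generations.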