SpanBERT: Improving Pre-training by Representing and Predicting Spans

Mandar Joshi, Danqi Chen, Yinhan Liu, Daniel S. Weld, Luke Zettlemoyer, Omer Levy

Transactions of the Association for Computational Linguistics, vol. 8, pp. 64–77, 2020

Major Contributions:

  • Masking Random Contiguous Spans:

    • Repeat until 15% of the tokens are masked (see the sketch after this item):
      • sample a span length from a geometric distribution (skewed toward shorter spans),
      • randomly select the starting point of the span to be masked.
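
A minimal Python sketch of this masking procedure (illustrative only, not the authors' code; the paper additionally measures span lengths in complete words and always masks whole words):

```python
import random
import numpy as np

def sample_span_masks(num_tokens, mask_budget=0.15, p=0.2, max_span_len=10):
    """Return a set of token positions to mask, chosen as contiguous spans."""
    target = int(num_tokens * mask_budget)
    masked = set()
    while len(masked) < target:
        # Span length ~ Geometric(p), clipped at max_span_len.
        # The paper uses p = 0.2 and a maximum of 10, giving a mean span
        # length of about 3.8 and skewing toward shorter spans.
        span_len = min(int(np.random.geometric(p)), max_span_len)
        # Uniformly choose where the span starts.
        start = random.randrange(0, num_tokens - span_len + 1)
        masked.update(range(start, start + span_len))
    return masked

# Example: mask roughly 15% of a 512-token sequence.
positions = sample_span_masks(512)
```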
  • Span Boundary Objective (SBO):

    • Predicts the entire masked span using only the observed tokens at its boundaries.
    • Encourages the model to store span-level information at the span boundaries, which is easily accessible during fine-tuning.
    • SBO is added alongside the MLM objective; the NSP objective is dropped entirely (see Single-Sequence BERT below).
    • Given a masked span (x_s, ..., x_e), each token x_i in the span is represented using the output encodings of the external boundary tokens x_{s-1} and x_{e+1}, plus a relative position embedding p_{i-s+1}. So,

      y_i = f(x_{s-1}, x_{e+1}, p_{i-s+1})

    • The function f is implemented as a 2-layer feed-forward network with GELU activations and layer normalization (see the sketch after this item).

    • A cross-entropy loss is computed for each predicted token in the span, exactly as in MLM.
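
A minimal PyTorch sketch of the SBO head (the class and argument names are illustrative, not the authors' code):

```python
import torch
import torch.nn as nn

class SpanBoundaryObjective(nn.Module):
    """2-layer feed-forward network with GELU activations and layer
    normalization, predicting each token x_i of a masked span from the
    boundary encodings x_{s-1}, x_{e+1} and a position embedding p_{i-s+1}."""

    def __init__(self, hidden_size, vocab_size, max_span_len=10):
        super().__init__()
        self.pos_emb = nn.Embedding(max_span_len, hidden_size)
        # Input is the concatenation [x_{s-1}; x_{e+1}; p_{i-s+1}].
        self.ffn = nn.Sequential(
            nn.Linear(3 * hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
            nn.Linear(hidden_size, hidden_size),
            nn.GELU(),
            nn.LayerNorm(hidden_size),
        )
        self.decoder = nn.Linear(hidden_size, vocab_size)

    def forward(self, left_boundary, right_boundary, rel_positions):
        # left_boundary, right_boundary: (batch, hidden) encoder outputs of
        # x_{s-1} and x_{e+1}; rel_positions: (batch,) 0-based values of i - s.
        h0 = torch.cat(
            [left_boundary, right_boundary, self.pos_emb(rel_positions)],
            dim=-1,
        )
        y = self.ffn(h0)          # y_i, the span-token representation
        return self.decoder(y)    # logits over the vocabulary, scored with
                                  # cross-entropy against the true x_i
```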

  • Single-Sequence BERT:

    • Pre-train on single full-length segments, instead of two half-length segments sampled for NSP.
    • The NSP objective is not used.
  • Objective Function:

    • SpanBERT sums the losses from both the SBO and MLM objectives, as shown below.
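
In the paper's notation, where x_i is the transformer output at position i and y_i is the SBO representation from above, the per-token loss is:

```latex
\mathcal{L}(x_i) = \mathcal{L}_{\mathrm{MLM}}(x_i) + \mathcal{L}_{\mathrm{SBO}}(x_i)
                 = -\log P(x_i \mid \mathbf{x}_i) - \log P(x_i \mid \mathbf{y}_i)
```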

  • Example

    • [Figure: an example illustrating how SpanBERT works. Source: Author]