# Knowledge-Augmented Language Model and Its Application to Unsupervised Named-Entity Recognition

Angli Liu, Jingfei Du, Veselin Stoyanov

Facebook AI

NAACL 2019

## What's New

This paper presents an augmentation method for generative language models that learns latent entity types and leverages external knowledge bases in an unsupervised fashion.

## How it works

- In the RNN setup, to predict the next word:

  - The model computes the probability of the next word by marginalizing over latent entity types:

    $$
    \begin{aligned}
    P\left(y_{t+1} \mid c_{t}\right) &= \sum_{j=0}^{K} P\left(y_{t+1}, \tau_{t+1}=j \mid c_{t}\right) \\
    &= \sum_{j=0}^{K} P\left(y_{t+1} \mid \tau_{t+1}=j, c_{t}\right) \cdot P\left(\tau_{t+1}=j \mid c_{t}\right)
    \end{aligned}
    $$

  - That is, the probability of the word given its type, multiplied by the probability of the type given the hidden state, summed as a weighted sum across all the types. The two factors are computed as:

    $$
    \begin{aligned}
    &P\left(y_{t+1}=i \mid \tau_{t+1}=j, c_{t}\right)=\frac{\exp \left(\boldsymbol{W}_{i,:}^{p, j} \cdot \boldsymbol{h}_{t}\right)}{\sum_{w=1}^{\left|V_{j}\right|} \exp \left(\boldsymbol{W}_{w,:}^{p, j} \cdot \boldsymbol{h}_{t}\right)} \\
    &P\left(\tau_{t+1}=j \mid c_{t}\right)=\frac{\exp \left(\boldsymbol{W}_{j,:}^{e} \cdot\left(\boldsymbol{W}^{h} \cdot \boldsymbol{h}_{t}\right)\right)}{\sum_{k=0}^{K} \exp \left(\boldsymbol{W}_{k,:}^{e} \cdot\left(\boldsymbol{W}^{h} \cdot \boldsymbol{h}_{t}\right)\right)}
    \end{aligned}
    $$

  - The probability-weighted type embedding vector is concatenated to the embedding of the generated word, so that the input carries type information forward, which in turn helps predict the type of the next word (see the code sketch after these equations):

    $$
    \begin{aligned}
    \boldsymbol{\nu}_{t+1} &= \sum_{j=0}^{K} P\left(\tau_{t+1}=j \mid c_{t}\right) \cdot \boldsymbol{W}_{j,:}^{e} \\
    \tilde{\boldsymbol{y}}_{t+1} &= \left[\boldsymbol{y}_{t+1} ; \boldsymbol{\nu}_{t+1}\right]
    \end{aligned}
    $$
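To make the mechanics concrete, here is a minimal PyTorch-style sketch of one decoding step. All sizes (`K`, `V`, `H`, `E`), the equal per-type vocabulary size, and the random weights are hypothetical stand-ins for the trained parameters $W^{p,j}$, $W^e$, and $W^h$; this is an illustration of the equations above, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K, V, H, E = 4, 1000, 256, 32            # hypothetical: K entity types + 1 general, vocab, hidden, type-emb sizes
h_t = torch.randn(H)                     # RNN hidden state encoding the context c_t

W_p = torch.randn(K + 1, V, H)           # per-type output projections W^{p,j} (one vocab per type, equal size here)
W_e = torch.randn(K + 1, E)              # type embeddings W^e
W_h = torch.randn(E, H)                  # projection W^h from hidden space to type-embedding space

# P(tau_{t+1} = j | c_t): softmax over types of W^e_j . (W^h h_t)
p_type = F.softmax(W_e @ (W_h @ h_t), dim=0)                     # (K+1,)

# P(y_{t+1} = i | tau_{t+1} = j, c_t): one softmax per type over that type's vocabulary
p_word_given_type = F.softmax(W_p @ h_t, dim=-1)                 # (K+1, V)

# Marginalize over types: P(y_{t+1} | c_t) = sum_j P(y | tau=j, c) * P(tau=j | c)
p_word = (p_type.unsqueeze(-1) * p_word_given_type).sum(dim=0)   # (V,)

# Probability-weighted type embedding nu_{t+1}, concatenated to the word embedding
nu = p_type @ W_e                        # (E,)
emb = torch.randn(V, 64)                 # hypothetical word-embedding table
y_next = int(torch.argmax(p_word))
y_tilde = torch.cat([emb[y_next], nu])   # [y_{t+1}; nu_{t+1}], fed to the next RNN step
```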

- The paper's architecture figure illustrates this flow well. *(Figure source: Author)*

- On top of this language model, the paper adds three more techniques:

  - For the NER task, it leverages bidirectional context (both left and right of the target word).
  - It also leverages a type prior for each entity from an external knowledge base (a toy sketch of such a prior follows this list).
  - It uses WikiText-2 data along with the CoNLL dataset.
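As a toy illustration of what a KB-derived type prior $P(\tau_i \mid y_i)$ might look like, the sketch below builds one from made-up entity lists. The entries, the type set, and the uniform-over-matches rule are assumptions for illustration, not the paper's exact construction.

```python
KB = {"PER": {"Obama", "Merkel"}, "LOC": {"Paris", "Berlin"}, "ORG": {"UN"}}  # made-up entries
TYPES = ["GEN", "PER", "LOC", "ORG"]     # type 0 ("GEN") = general vocabulary

def type_prior(word):
    """Toy P(tau | y = word): uniform over the KB types that list the word."""
    hits = [t for t in TYPES if word in KB.get(t, set())]
    if not hits:                         # words absent from the KB fall back to the general type
        return {t: float(t == "GEN") for t in TYPES}
    return {t: (1.0 / len(hits) if t in hits else 0.0) for t in TYPES}

print(type_prior("Paris"))   # {'GEN': 0.0, 'PER': 0.0, 'LOC': 1.0, 'ORG': 0.0}
print(type_prior("table"))   # all mass on 'GEN'
```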
- The loss function, shown below, combines a cross-entropy term on the predicted word with a squared KL regularizer that pulls the context-predicted type distribution toward the type prior:

  $$
  \begin{aligned}
  L &= H\left(P\left(y_{i} \mid c_{l}, c_{r}\right), P\left(\hat{y}_{i} \mid c_{l}, c_{r}\right)\right) \\
  &+ \lambda \cdot\left\|KL\left(P\left(\tau_{i} \mid c_{l}, c_{r}\right), P\left(\tau_{i} \mid y_{i}\right)\right)\right\|^{2}
  \end{aligned}
  $$
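A minimal sketch of this loss in the same PyTorch style, assuming the model already produces a word distribution `p_word` over the vocabulary, a context type distribution `p_type_ctx`, and a KB-derived prior `p_type_prior`; `lam` and `eps` are hypothetical hyperparameters.

```python
import torch

def kalm_loss(p_word, target_idx, p_type_ctx, p_type_prior, lam=0.1, eps=1e-8):
    # Cross-entropy term: -log P(y_i = gold word | c_l, c_r)
    ce = -torch.log(p_word[target_idx] + eps)
    # KL(P(tau | c_l, c_r) || P(tau | y_i)) between the two type distributions
    kl = (p_type_ctx * (torch.log(p_type_ctx + eps)
                        - torch.log(p_type_prior + eps))).sum()
    # Squared KL, weighted by lambda, matching the regularizer above
    return ce + lam * kl ** 2
```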

## Results

- Trained in an unsupervised way, it achieves an F1 score of 0.86 across the entity types LOC, MISC, ORG, and PER on CoNLL.
- As the external knowledge base gets corrupted, NER performance degrades, as shown in the figure below:

*(Figure source: Author)*