# Knowledge-Augmented Language Model and Its Application to Unsupervised Named-Entity Recognition

Angli Liu, Jingfei Du, Veselin Stoyanov

Facebook AI

NAACL 2019

## What's New

This paper presents an augmentation method for generative language models that learns latent entity types and leverages external knowledge bases in an unsupervised fashion.

## How it works

- In the RNN setup, to predict the next word:

  - The model computes the probability of the next word by marginalizing over latent entity types:

    $$
    \begin{aligned}
    P\left(y_{t+1} \mid c_{t}\right) &= \sum_{j=0}^{K} P\left(y_{t+1}, \tau_{t+1}=j \mid c_{t}\right) \\
    &= \sum_{j=0}^{K} P\left(y_{t+1} \mid \tau_{t+1}=j, c_{t}\right) \cdot P\left(\tau_{t+1}=j \mid c_{t}\right)
    \end{aligned}
    $$

  - That is, the probability of the word given its type, multiplied by the probability of the type given the hidden state, summed as a weighted sum across all the types. The two factors are computed as:

    $$
    \begin{aligned}
    &P\left(y_{t+1}=i \mid \tau_{t+1}=j, c_{t}\right)=\frac{\exp \left(\boldsymbol{W}_{i,:}^{p, j} \cdot \boldsymbol{h}_{t}\right)}{\sum_{w=1}^{\left|V_{j}\right|} \exp \left(\boldsymbol{W}_{w,:}^{p, j} \cdot \boldsymbol{h}_{t}\right)} \\
    &P\left(\tau_{t+1}=j \mid c_{t}\right)=\frac{\exp \left(\boldsymbol{W}_{j,:}^{e} \cdot\left(\boldsymbol{W}^{h} \cdot \boldsymbol{h}_{t}\right)\right)}{\sum_{k=0}^{K} \exp \left(\boldsymbol{W}_{k,:}^{e} \cdot\left(\boldsymbol{W}^{h} \cdot \boldsymbol{h}_{t}\right)\right)}
    \end{aligned}
    $$

  - The probability-weighted type embedding vector is concatenated to the embedding of the generated word, so that the input carries type information forward, which in turn helps predict the type of the next word (see the code sketch after these equations):

    $$
    \begin{aligned}
    \boldsymbol{\nu}_{t+1} &= \sum_{j=0}^{K} P\left(\tau_{t+1}=j \mid c_{t}\right) \cdot \boldsymbol{W}_{j,:}^{e} \\
    \tilde{\boldsymbol{y}}_{t+1} &= \left[\boldsymbol{y}_{t+1} ; \boldsymbol{\nu}_{t+1}\right]
    \end{aligned}
    $$
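To make the mechanics concrete, here is a minimal PyTorch-style sketch of one decoding step. All sizes (`K`, `V`, `H`, `E`), the equal per-type vocabulary size, and the random weights are hypothetical stand-ins for the trained parameters $W^{p,j}$, $W^e$, and $W^h$; this is an illustration of the equations above, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
K, V, H, E = 4, 1000, 256, 32            # hypothetical: K entity types + 1 general, vocab, hidden, type-emb sizes
h_t = torch.randn(H)                     # RNN hidden state encoding the context c_t

W_p = torch.randn(K + 1, V, H)           # per-type output projections W^{p,j} (one vocab per type, equal size here)
W_e = torch.randn(K + 1, E)              # type embeddings W^e
W_h = torch.randn(E, H)                  # projection W^h from hidden space to type-embedding space

# P(tau_{t+1} = j | c_t): softmax over types of W^e_j . (W^h h_t)
p_type = F.softmax(W_e @ (W_h @ h_t), dim=0)                     # (K+1,)

# P(y_{t+1} = i | tau_{t+1} = j, c_t): one softmax per type over that type's vocabulary
p_word_given_type = F.softmax(W_p @ h_t, dim=-1)                 # (K+1, V)

# Marginalize over types: P(y_{t+1} | c_t) = sum_j P(y | tau=j, c) * P(tau=j | c)
p_word = (p_type.unsqueeze(-1) * p_word_given_type).sum(dim=0)   # (V,)

# Probability-weighted type embedding nu_{t+1}, concatenated to the word embedding
nu = p_type @ W_e                        # (E,)
emb = torch.randn(V, 64)                 # hypothetical word-embedding table
y_next = int(torch.argmax(p_word))
y_tilde = torch.cat([emb[y_next], nu])   # [y_{t+1}; nu_{t+1}], fed to the next RNN step
```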

- The paper's architecture figure illustrates this flow well. *(Figure source: Author)*

- On top of this language model, the paper adds three more techniques:

  - For the NER task, it leverages bidirectional context (both left and right of the target word).
  - It also leverages a type prior for each entity from an external knowledge base (a toy sketch of such a prior follows this list).
  - It uses WikiText-2 data along with the CoNLL dataset.
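As a toy illustration of what a KB-derived type prior $P(\tau_i \mid y_i)$ might look like, the sketch below builds one from made-up entity lists. The entries, the type set, and the uniform-over-matches rule are assumptions for illustration, not the paper's exact construction.

```python
KB = {"PER": {"Obama", "Merkel"}, "LOC": {"Paris", "Berlin"}, "ORG": {"UN"}}  # made-up entries
TYPES = ["GEN", "PER", "LOC", "ORG"]     # type 0 ("GEN") = general vocabulary

def type_prior(word):
    """Toy P(tau | y = word): uniform over the KB types that list the word."""
    hits = [t for t in TYPES if word in KB.get(t, set())]
    if not hits:                         # words absent from the KB fall back to the general type
        return {t: float(t == "GEN") for t in TYPES}
    return {t: (1.0 / len(hits) if t in hits else 0.0) for t in TYPES}

print(type_prior("Paris"))   # {'GEN': 0.0, 'PER': 0.0, 'LOC': 1.0, 'ORG': 0.0}
print(type_prior("table"))   # all mass on 'GEN'
```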
- The loss function, shown below, combines a cross-entropy term on the predicted word with a squared KL regularizer that pulls the context-predicted type distribution toward the type prior:

  $$
  \begin{aligned}
  L &= H\left(P\left(y_{i} \mid c_{l}, c_{r}\right), P\left(\hat{y}_{i} \mid c_{l}, c_{r}\right)\right) \\
  &+ \lambda \cdot\left\|KL\left(P\left(\tau_{i} \mid c_{l}, c_{r}\right), P\left(\tau_{i} \mid y_{i}\right)\right)\right\|^{2}
  \end{aligned}
  $$
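A minimal sketch of this loss in the same PyTorch style, assuming the model already produces a word distribution `p_word` over the vocabulary, a context type distribution `p_type_ctx`, and a KB-derived prior `p_type_prior`; `lam` and `eps` are hypothetical hyperparameters.

```python
import torch

def kalm_loss(p_word, target_idx, p_type_ctx, p_type_prior, lam=0.1, eps=1e-8):
    # Cross-entropy term: -log P(y_i = gold word | c_l, c_r)
    ce = -torch.log(p_word[target_idx] + eps)
    # KL(P(tau | c_l, c_r) || P(tau | y_i)) between the two type distributions
    kl = (p_type_ctx * (torch.log(p_type_ctx + eps)
                        - torch.log(p_type_prior + eps))).sum()
    # Squared KL, weighted by lambda, matching the regularizer above
    return ce + lam * kl ** 2
```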

## Results

- Trained in an unsupervised way, it achieves an F1 score of 0.86 across the entity types LOC, MISC, ORG, and PER on CoNLL.
- As the external knowledge base gets corrupted, NER performance degrades, as shown in the figure below:

*(Figure source: Author)*