Source code for the paper "Risks of misinterpretation in the evaluation of Distant Supervision for Relation Extraction" (accepted at SEPLN). The code is based on that developed by Vashishth et al. (2018) and includes implementations of the RESIDE, PCNN, PCNN+ATT, CNN, CNN+ATT, and BGWA models.
- Compatible with TensorFlow 1.x and Python 3.6
- Dependencies can be installed using `requirements.txt`.
- We use Riedel NYT to train the models.
- We built a test partition by selecting 324 instances from Riedel NYT's test partition. Each instance has two labels: one generated automatically (the original label) and one generated by manual revision.
- We built a test partition based on Riedel NYT's test partition with all sentences whose relation is not NA. Each instance has two labels: one generated automatically (the original label) and one generated by manual revision with Amazon Mechanical Turk.
The structure of the processed input data is as follows.
```json
{
    "voc2id":   {"w1": 0, "w2": 1, ...},
    "type2id":  {"type1": 0, "type2": 1, ...},
    "rel2id":   {"NA": 0, "/location/neighborhood/neighborhood_of": 1, ...},
    "max_pos": 123,
    "train": [
        {
            "X":        [[s1_w1, s1_w2, ...], [s2_w1, s2_w2, ...], ...],
            "Y":        [bag_label],
            "Pos1":     [[s1_p1_1, s1_p1_2, ...], [s2_p1_1, s2_p1_2, ...], ...],
            "Pos2":     [[s1_p2_1, s1_p2_2, ...], [s2_p2_1, s2_p2_2, ...], ...],
            "SubPos":   [s1_sub, s2_sub, ...],
            "ObjPos":   [s1_obj, s2_obj, ...],
            "SubType":  [s1_subType, s2_subType, ...],
            "ObjType":  [s1_objType, s2_objType, ...],
            "ProbY":    [[s1_rel_alias1, s1_rel_alias2, ...], [s2_rel_alias1, ...], ...],
            "DepEdges": [[s1_dep_edges], [s2_dep_edges], ...]
        },
        {}, ...
    ],
    "test":  { same as "train" },
    "valid": { same as "train" }
}
```
- `voc2id` is the mapping of each word to its id.
- `type2id` is the mapping of each entity type to its id.
- `rel2id` is the mapping of each relation to its id.
- `max_pos` is the maximum position to consider for positional embeddings.
- Each entry of `train`, `test`, and `valid` is a bag of sentences, where:
  - `X` denotes the sentences in the bag as a list of lists of word indices.
  - `Y` is the relation expressed by the sentences in the bag.
  - `Pos1` and `Pos2` are the positions of each word in the sentences with respect to target entity 1 and target entity 2.
  - `SubPos` and `ObjPos` contain the positions of target entity 1 and target entity 2 in each sentence.
  - `SubType` and `ObjType` contain the type information of target entity 1 and target entity 2 obtained from the KG.
  - `ProbY` is the relation alias side information (refer to the paper) for the bag.
  - `DepEdges` is the edge list of the dependency parse of each sentence (required for GCN).
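As an illustration, a processed data file with the structure above can be loaded and inspected with a short Python snippet. This is a sketch, not part of the released code; the helper names `load_riedel` and `bag_summary` are hypothetical, and the path in the example is the one used by the training commands below:

```python
import pickle

def load_riedel(path):
    """Load a processed Riedel NYT pickle with the structure shown above."""
    with open(path, 'rb') as f:
        return pickle.load(f)

def bag_summary(bag):
    """Summarise one bag: number of sentences and its relation label(s)."""
    return {'num_sentences': len(bag['X']), 'labels': bag['Y']}

# Example usage (assuming the file produced by preprocessing exists):
# data = load_riedel('./data/riedel_train.pkl')
# print(len(data['rel2id']))            # number of relation classes, including NA
# print(bag_summary(data['train'][0]))  # first training bag
```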
- Execute `setup.sh` to download the GloVe embeddings.
- Download the dataset and copy it into the `data` directory.
- For training RESIDE run:
```shell
python reside.py -data ./data/riedel_train.pkl -name my_name
```
- For training BGWA run:
```shell
python bgwa.py -data ./data/riedel_train.pkl -name my_name
```
- For training PCNN+ATT and PCNN run:
```shell
python pcnnatt.py -data ./data/riedel_train.pkl -name my_name -attn  # PCNN+ATT
python pcnnatt.py -data ./data/riedel_train.pkl -name my_name        # PCNN
```
- For training CNN+ATT and CNN run:
```shell
python cnnatt.py -data ./data/riedel_train.pkl -name my_name -attn  # CNN+ATT
python cnnatt.py -data ./data/riedel_train.pkl -name my_name        # CNN
```
- For testing RESIDE with manual and heuristic labels run:
```shell
python reside.py -data ./data/riedel_test_labeled_manually.pkl -name my_name -restore -only_eval             # Manual labels
python reside.py -data ./data/riedel_test_labeled_heuristic.pkl -name my_name -restore -only_eval -original  # Heuristic labels
```
- For testing BGWA with manual and heuristic labels run:
```shell
python bgwa.py -data ./data/riedel_test_labeled_manually.pkl -name my_name -restore -only_eval             # Manual labels
python bgwa.py -data ./data/riedel_test_labeled_heuristic.pkl -name my_name -restore -only_eval -original  # Heuristic labels
```
- For testing PCNN+ATT and PCNN with manual and heuristic labels run:
```shell
python pcnnatt.py -data ./data/riedel_test_labeled_manually.pkl -name my_name -restore -only_eval -attn             # PCNN+ATT, manual labels
python pcnnatt.py -data ./data/riedel_test_labeled_heuristic.pkl -name my_name -restore -only_eval -original -attn  # PCNN+ATT, heuristic labels
python pcnnatt.py -data ./data/riedel_test_labeled_manually.pkl -name my_name -restore -only_eval                   # PCNN, manual labels
python pcnnatt.py -data ./data/riedel_test_labeled_heuristic.pkl -name my_name -restore -only_eval -original        # PCNN, heuristic labels
```
- For testing CNN+ATT and CNN with manual and heuristic labels run:
```shell
python cnnatt.py -data ./data/riedel_test_labeled_manually.pkl -name my_name -restore -only_eval -attn             # CNN+ATT, manual labels
python cnnatt.py -data ./data/riedel_test_labeled_heuristic.pkl -name my_name -restore -only_eval -original -attn  # CNN+ATT, heuristic labels
python cnnatt.py -data ./data/riedel_test_labeled_manually.pkl -name my_name -restore -only_eval                   # CNN, manual labels
python cnnatt.py -data ./data/riedel_test_labeled_heuristic.pkl -name my_name -restore -only_eval -original        # CNN, heuristic labels
```
- To compare the results obtained with heuristic and manual labels run:
```shell
python auc_heuristic_manual_labels.py
```
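For reference, the area under the precision-recall curve used to compare the two label sets can be sketched as follows. This is an illustration using scikit-learn, not the actual implementation in `auc_heuristic_manual_labels.py`, and the score and label arrays are hypothetical:

```python
import numpy as np
from sklearn.metrics import auc, precision_recall_curve

def pr_auc(labels, scores):
    """Area under the precision-recall curve for binary bag labels vs model scores."""
    precision, recall, _ = precision_recall_curve(labels, scores)
    return auc(recall, precision)

# Hypothetical example: the same model scores evaluated against two label sets.
scores    = np.array([0.9, 0.8, 0.6, 0.4, 0.2])
heuristic = np.array([1, 0, 1, 0, 0])  # automatically generated labels
manual    = np.array([1, 1, 1, 0, 0])  # labels after manual revision
print(pr_auc(heuristic, scores), pr_auc(manual, scores))
```

Comparing the two AUC values shows how much of the measured performance gap is attributable to label noise rather than to the model.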
Please check the names of the trained models. For any clarification, comments, or suggestions, please open an issue or contact [email protected].