forked from edaiofficial/okwugbe
-
Notifications
You must be signed in to change notification settings - Fork 0
/
plan.txt
30 lines (19 loc) · 1.4 KB
/
plan.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
-------------------------------------MMT Packaging for African languages-------------------------------------
Writing our own Colab code for MMT modelling. [Bona/Chris]
Think about how data from other languages. (Back translation with MMT) (No parallel data) [Chris/Bona]
*Trying to create MMT with JoeyNMT / Design an MMT Colab Notebook for African languages for the community.
-------------------------------------ASR Packaging for African language-------------------------------------
Characteristics: Taking care of diacritics.
Data Preprocessing
Parameters:
-path. to file that contains both train and test csv
-infer:boolean. Default=True. Whether to generate character.txt from data.
-character.txt: file for unique characters. needs to be provided if 'infer' is False
Idea:Data --> get unique alphabets/characters in whole data --> save to character.txt --> tell the person to go to charater.txt and add more if there is more
-validation_size. Default=0.2. Size of validation set.
-batch_size.
Modelling
-use_attention:boolean. Default=True. Whether to use the classifier with attention.
-use_gradient_accumulation: boolean. Default=True. Whether to use gradient accumulation. This helps
-output_dir. directory to save model output and others.
-use_last_checkpoint:boolean. Default=True. Use the last saved checkpoint when training.