pytorch.log
Loaded parameters from params2.npz
| epoch 1 | 200/ 2323 batches | lr 0.37 | ms/batch 21.67 | loss 6.96 | ppl 1055.21
| epoch 1 | 400/ 2323 batches | lr 0.37 | ms/batch 21.00 | loss 6.45 | ppl 634.39
| epoch 1 | 600/ 2323 batches | lr 0.37 | ms/batch 20.99 | loss 6.16 | ppl 473.43
| epoch 1 | 800/ 2323 batches | lr 0.37 | ms/batch 21.00 | loss 6.03 | ppl 414.93
| epoch 1 | 1000/ 2323 batches | lr 0.37 | ms/batch 21.00 | loss 5.87 | ppl 352.93
| epoch 1 | 1200/ 2323 batches | lr 0.37 | ms/batch 21.00 | loss 5.76 | ppl 318.82
| epoch 1 | 1400/ 2323 batches | lr 0.37 | ms/batch 21.00 | loss 5.57 | ppl 263.19
| epoch 1 | 1600/ 2323 batches | lr 0.37 | ms/batch 21.00 | loss 5.52 | ppl 248.40
| epoch 1 | 1800/ 2323 batches | lr 0.37 | ms/batch 21.05 | loss 5.51 | ppl 246.54
| epoch 1 | 2000/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 5.37 | ppl 215.88
| epoch 1 | 2200/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 5.26 | ppl 191.69
-----------------------------------------------------------------------------------------
| end of epoch 1 | time: 50.18s | valid loss 5.38 | valid ppl 216.94
-----------------------------------------------------------------------------------------
| epoch 2 | 200/ 2323 batches | lr 0.37 | ms/batch 21.18 | loss 5.26 | ppl 193.23
| epoch 2 | 400/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 5.28 | ppl 195.79
| epoch 2 | 600/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 5.20 | ppl 180.39
| epoch 2 | 800/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 5.18 | ppl 178.12
| epoch 2 | 1000/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 5.16 | ppl 173.46
| epoch 2 | 1200/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 5.13 | ppl 168.23
| epoch 2 | 1400/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 5.00 | ppl 148.59
| epoch 2 | 1600/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 5.02 | ppl 151.09
| epoch 2 | 1800/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 5.06 | ppl 158.12
| epoch 2 | 2000/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 4.91 | ppl 135.71
| epoch 2 | 2200/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 4.84 | ppl 126.41
-----------------------------------------------------------------------------------------
| end of epoch 2 | time: 50.19s | valid loss 5.08 | valid ppl 161.08
-----------------------------------------------------------------------------------------
| epoch 3 | 200/ 2323 batches | lr 0.37 | ms/batch 21.18 | loss 4.87 | ppl 130.32
| epoch 3 | 400/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.95 | ppl 140.77
| epoch 3 | 600/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.86 | ppl 129.05
| epoch 3 | 800/ 2323 batches | lr 0.37 | ms/batch 21.07 | loss 4.85 | ppl 128.14
| epoch 3 | 1000/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.86 | ppl 128.51
| epoch 3 | 1200/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.83 | ppl 125.55
| epoch 3 | 1400/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.72 | ppl 112.24
| epoch 3 | 1600/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.76 | ppl 117.14
| epoch 3 | 1800/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.83 | ppl 125.44
| epoch 3 | 2000/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.65 | ppl 104.47
| epoch 3 | 2200/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.60 | ppl 99.45
-----------------------------------------------------------------------------------------
| end of epoch 3 | time: 50.21s | valid loss 4.95 | valid ppl 141.87
-----------------------------------------------------------------------------------------
| epoch 4 | 200/ 2323 batches | lr 0.37 | ms/batch 21.19 | loss 4.63 | ppl 102.87
| epoch 4 | 400/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.74 | ppl 114.53
| epoch 4 | 600/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.65 | ppl 104.46
| epoch 4 | 800/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.64 | ppl 103.32
| epoch 4 | 1000/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.66 | ppl 105.67
| epoch 4 | 1200/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.64 | ppl 103.13
| epoch 4 | 1400/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.53 | ppl 92.91
| epoch 4 | 1600/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.59 | ppl 98.25
| epoch 4 | 1800/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.67 | ppl 106.86
| epoch 4 | 2000/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.47 | ppl 87.33
| epoch 4 | 2200/ 2323 batches | lr 0.37 | ms/batch 21.08 | loss 4.43 | ppl 84.27
-----------------------------------------------------------------------------------------
| end of epoch 4 | time: 50.20s | valid loss 4.90 | valid ppl 134.33
-----------------------------------------------------------------------------------------
| epoch 5 | 200/ 2323 batches | lr 0.18 | ms/batch 21.19 | loss 4.45 | ppl 85.98
| epoch 5 | 400/ 2323 batches | lr 0.18 | ms/batch 21.08 | loss 4.55 | ppl 94.86
| epoch 5 | 600/ 2323 batches | lr 0.18 | ms/batch 21.08 | loss 4.44 | ppl 84.73
| epoch 5 | 800/ 2323 batches | lr 0.18 | ms/batch 21.08 | loss 4.41 | ppl 82.65
| epoch 5 | 1000/ 2323 batches | lr 0.18 | ms/batch 21.08 | loss 4.43 | ppl 84.19
| epoch 5 | 1200/ 2323 batches | lr 0.18 | ms/batch 21.08 | loss 4.40 | ppl 81.37
| epoch 5 | 1400/ 2323 batches | lr 0.18 | ms/batch 21.08 | loss 4.29 | ppl 72.84
| epoch 5 | 1600/ 2323 batches | lr 0.18 | ms/batch 21.08 | loss 4.34 | ppl 76.76
| epoch 5 | 1800/ 2323 batches | lr 0.18 | ms/batch 21.09 | loss 4.43 | ppl 83.62
| epoch 5 | 2000/ 2323 batches | lr 0.18 | ms/batch 21.08 | loss 4.19 | ppl 66.33
| epoch 5 | 2200/ 2323 batches | lr 0.18 | ms/batch 21.09 | loss 4.16 | ppl 64.28
-----------------------------------------------------------------------------------------
| end of epoch 5 | time: 50.22s | valid loss 4.83 | valid ppl 125.53
-----------------------------------------------------------------------------------------
| epoch 6 | 200/ 2323 batches | lr 0.09 | ms/batch 21.20 | loss 4.31 | ppl 74.57
| epoch 6 | 400/ 2323 batches | lr 0.09 | ms/batch 21.09 | loss 4.42 | ppl 82.80
| epoch 6 | 600/ 2323 batches | lr 0.09 | ms/batch 21.09 | loss 4.30 | ppl 73.72
| epoch 6 | 800/ 2323 batches | lr 0.09 | ms/batch 21.08 | loss 4.27 | ppl 71.59
| epoch 6 | 1000/ 2323 batches | lr 0.09 | ms/batch 21.09 | loss 4.29 | ppl 72.78
| epoch 6 | 1200/ 2323 batches | lr 0.09 | ms/batch 21.08 | loss 4.25 | ppl 70.07
| epoch 6 | 1400/ 2323 batches | lr 0.09 | ms/batch 21.09 | loss 4.14 | ppl 62.55
| epoch 6 | 1600/ 2323 batches | lr 0.09 | ms/batch 21.09 | loss 4.19 | ppl 65.81
| epoch 6 | 1800/ 2323 batches | lr 0.09 | ms/batch 21.09 | loss 4.27 | ppl 71.67
| epoch 6 | 2000/ 2323 batches | lr 0.09 | ms/batch 21.09 | loss 4.03 | ppl 56.16
| epoch 6 | 2200/ 2323 batches | lr 0.09 | ms/batch 21.09 | loss 3.99 | ppl 54.12
-----------------------------------------------------------------------------------------
| end of epoch 6 | time: 50.22s | valid loss 4.81 | valid ppl 122.60
-----------------------------------------------------------------------------------------
| epoch 7 | 200/ 2323 batches | lr 0.05 | ms/batch 21.20 | loss 4.22 | ppl 68.22
| epoch 7 | 400/ 2323 batches | lr 0.05 | ms/batch 21.08 | loss 4.33 | ppl 76.11
| epoch 7 | 600/ 2323 batches | lr 0.05 | ms/batch 21.09 | loss 4.22 | ppl 67.77
| epoch 7 | 800/ 2323 batches | lr 0.05 | ms/batch 21.09 | loss 4.18 | ppl 65.68
| epoch 7 | 1000/ 2323 batches | lr 0.05 | ms/batch 21.09 | loss 4.20 | ppl 66.82
| epoch 7 | 1200/ 2323 batches | lr 0.05 | ms/batch 21.08 | loss 4.16 | ppl 64.26
| epoch 7 | 1400/ 2323 batches | lr 0.05 | ms/batch 21.09 | loss 4.05 | ppl 57.22
| epoch 7 | 1600/ 2323 batches | lr 0.05 | ms/batch 21.09 | loss 4.10 | ppl 60.15
| epoch 7 | 1800/ 2323 batches | lr 0.05 | ms/batch 21.09 | loss 4.18 | ppl 65.37
| epoch 7 | 2000/ 2323 batches | lr 0.05 | ms/batch 21.09 | loss 3.93 | ppl 51.01
| epoch 7 | 2200/ 2323 batches | lr 0.05 | ms/batch 21.09 | loss 3.89 | ppl 48.95
-----------------------------------------------------------------------------------------
| end of epoch 7 | time: 50.22s | valid loss 4.79 | valid ppl 120.84
-----------------------------------------------------------------------------------------
| epoch 8 | 200/ 2323 batches | lr 0.02 | ms/batch 21.19 | loss 4.17 | ppl 64.71
| epoch 8 | 400/ 2323 batches | lr 0.02 | ms/batch 21.08 | loss 4.28 | ppl 72.45
| epoch 8 | 600/ 2323 batches | lr 0.02 | ms/batch 21.09 | loss 4.17 | ppl 64.48
| epoch 8 | 800/ 2323 batches | lr 0.02 | ms/batch 21.09 | loss 4.13 | ppl 62.48
| epoch 8 | 1000/ 2323 batches | lr 0.02 | ms/batch 21.09 | loss 4.15 | ppl 63.64
| epoch 8 | 1200/ 2323 batches | lr 0.02 | ms/batch 21.08 | loss 4.12 | ppl 61.27
| epoch 8 | 1400/ 2323 batches | lr 0.02 | ms/batch 21.08 | loss 4.00 | ppl 54.40
| epoch 8 | 1600/ 2323 batches | lr 0.02 | ms/batch 21.09 | loss 4.05 | ppl 57.16
| epoch 8 | 1800/ 2323 batches | lr 0.02 | ms/batch 21.09 | loss 4.13 | ppl 62.06
| epoch 8 | 2000/ 2323 batches | lr 0.02 | ms/batch 21.09 | loss 3.88 | ppl 48.38
| epoch 8 | 2200/ 2323 batches | lr 0.02 | ms/batch 21.09 | loss 3.83 | ppl 46.26
-----------------------------------------------------------------------------------------
| end of epoch 8 | time: 50.21s | valid loss 4.79 | valid ppl 119.77
-----------------------------------------------------------------------------------------
| epoch 9 | 200/ 2323 batches | lr 0.01 | ms/batch 21.20 | loss 4.14 | ppl 62.77
| epoch 9 | 400/ 2323 batches | lr 0.01 | ms/batch 21.08 | loss 4.26 | ppl 70.47
| epoch 9 | 600/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 4.14 | ppl 62.67
| epoch 9 | 800/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 4.11 | ppl 60.73
| epoch 9 | 1000/ 2323 batches | lr 0.01 | ms/batch 21.08 | loss 4.13 | ppl 61.94
| epoch 9 | 1200/ 2323 batches | lr 0.01 | ms/batch 21.08 | loss 4.09 | ppl 59.69
| epoch 9 | 1400/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 3.97 | ppl 52.92
| epoch 9 | 1600/ 2323 batches | lr 0.01 | ms/batch 21.08 | loss 4.02 | ppl 55.57
| epoch 9 | 1800/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 4.10 | ppl 60.31
| epoch 9 | 2000/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 3.85 | ppl 47.00
| epoch 9 | 2200/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 3.80 | ppl 44.82
-----------------------------------------------------------------------------------------
| end of epoch 9 | time: 50.22s | valid loss 4.78 | valid ppl 119.04
-----------------------------------------------------------------------------------------
| epoch 10 | 200/ 2323 batches | lr 0.01 | ms/batch 21.19 | loss 4.12 | ppl 61.71
| epoch 10 | 400/ 2323 batches | lr 0.01 | ms/batch 21.08 | loss 4.24 | ppl 69.39
| epoch 10 | 600/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 4.12 | ppl 61.67
| epoch 10 | 800/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 4.09 | ppl 59.77
| epoch 10 | 1000/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 4.11 | ppl 60.98
| epoch 10 | 1200/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 4.07 | ppl 58.79
| epoch 10 | 1400/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 3.95 | ppl 52.13
| epoch 10 | 1600/ 2323 batches | lr 0.01 | ms/batch 21.08 | loss 4.00 | ppl 54.70
| epoch 10 | 1800/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 4.08 | ppl 59.40
| epoch 10 | 2000/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 3.83 | ppl 46.22
| epoch 10 | 2200/ 2323 batches | lr 0.01 | ms/batch 21.09 | loss 3.79 | ppl 44.04
-----------------------------------------------------------------------------------------
| end of epoch 10 | time: 50.23s | valid loss 4.78 | valid ppl 118.56
-----------------------------------------------------------------------------------------
| epoch 11 | 200/ 2323 batches | lr 0.00 | ms/batch 21.20 | loss 4.11 | ppl 61.11
| epoch 11 | 400/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.23 | ppl 68.75
| epoch 11 | 600/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.11 | ppl 61.12
| epoch 11 | 800/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.08 | ppl 59.27
| epoch 11 | 1000/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.10 | ppl 60.43
| epoch 11 | 1200/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.06 | ppl 58.26
| epoch 11 | 1400/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.95 | ppl 51.68
| epoch 11 | 1600/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.99 | ppl 54.19
| epoch 11 | 1800/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.08 | ppl 58.94
| epoch 11 | 2000/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.82 | ppl 45.76
| epoch 11 | 2200/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.78 | ppl 43.62
-----------------------------------------------------------------------------------------
| end of epoch 11 | time: 50.23s | valid loss 4.77 | valid ppl 118.19
-----------------------------------------------------------------------------------------
| epoch 12 | 200/ 2323 batches | lr 0.00 | ms/batch 21.19 | loss 4.11 | ppl 60.77
| epoch 12 | 400/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.23 | ppl 68.38
| epoch 12 | 600/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.11 | ppl 60.81
| epoch 12 | 800/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.08 | ppl 58.99
| epoch 12 | 1000/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.10 | ppl 60.13
| epoch 12 | 1200/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.06 | ppl 57.96
| epoch 12 | 1400/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.94 | ppl 51.42
| epoch 12 | 1600/ 2323 batches | lr 0.00 | ms/batch 21.08 | loss 3.99 | ppl 53.90
| epoch 12 | 1800/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.07 | ppl 58.71
| epoch 12 | 2000/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.82 | ppl 45.51
| epoch 12 | 2200/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.77 | ppl 43.40
-----------------------------------------------------------------------------------------
| end of epoch 12 | time: 50.23s | valid loss 4.77 | valid ppl 117.88
-----------------------------------------------------------------------------------------
| epoch 13 | 200/ 2323 batches | lr 0.00 | ms/batch 21.20 | loss 4.10 | ppl 60.56
| epoch 13 | 400/ 2323 batches | lr 0.00 | ms/batch 21.08 | loss 4.22 | ppl 68.17
| epoch 13 | 600/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.10 | ppl 60.64
| epoch 13 | 800/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.08 | ppl 58.85
| epoch 13 | 1000/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.09 | ppl 59.98
| epoch 13 | 1200/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.06 | ppl 57.79
| epoch 13 | 1400/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.94 | ppl 51.27
| epoch 13 | 1600/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.98 | ppl 53.73
| epoch 13 | 1800/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 4.07 | ppl 58.58
| epoch 13 | 2000/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.82 | ppl 45.39
| epoch 13 | 2200/ 2323 batches | lr 0.00 | ms/batch 21.09 | loss 3.77 | ppl 43.28
-----------------------------------------------------------------------------------------
| end of epoch 13 | time: 50.23s | valid loss 4.77 | valid ppl 117.65
-----------------------------------------------------------------------------------------
=========================================================================================
| End of training | test loss 4.73 | test ppl 113.61
=========================================================================================
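
The `ppl` column in the log above tracks `exp(loss)`: for example, the final `test loss 4.73` corresponds to `exp(4.73) ≈ 113.3`, close to the logged `113.61` once the two-decimal rounding of the loss is taken into account. The sketch below is one way to read lines of this shape into records and check that relationship. The function names (`parse_log`, `ppl_consistent`), the regular expressions, and the tolerance are illustrative assumptions, not part of whatever script produced this log.

```python
import math
import re

# Per-interval training lines, e.g.
# "| epoch 1 | 200/ 2323 batches | lr 0.37 | ms/batch 21.67 | loss 6.96 | ppl 1055.21"
TRAIN_RE = re.compile(
    r"\|\s*epoch\s+(\d+)\s*\|\s*(\d+)/\s*(\d+)\s+batches\s*\|\s*lr\s+([\d.]+)\s*"
    r"\|\s*ms/batch\s+([\d.]+)\s*\|\s*loss\s+([\d.]+)\s*\|\s*ppl\s+([\d.]+)"
)

# End-of-epoch validation lines, e.g.
# "| end of epoch 1 | time: 50.18s | valid loss 5.38 | valid ppl 216.94"
VALID_RE = re.compile(
    r"\|\s*end of epoch\s+(\d+)\s*\|\s*time:\s*([\d.]+)s\s*"
    r"\|\s*valid loss\s+([\d.]+)\s*\|\s*valid ppl\s+([\d.]+)"
)


def parse_log(text):
    """Return (train_records, valid_records) parsed from log text."""
    train, valid = [], []
    for line in text.splitlines():
        m = TRAIN_RE.search(line)
        if m:
            epoch, batch, total, lr, ms, loss, ppl = m.groups()
            train.append({
                "epoch": int(epoch), "batch": int(batch), "total": int(total),
                "lr": float(lr), "ms_per_batch": float(ms),
                "loss": float(loss), "ppl": float(ppl),
            })
            continue
        m = VALID_RE.search(line)
        if m:
            epoch, seconds, loss, ppl = m.groups()
            valid.append({
                "epoch": int(epoch), "seconds": float(seconds),
                "loss": float(loss), "ppl": float(ppl),
            })
    return train, valid


def ppl_consistent(loss, ppl, tol=0.01):
    """True if log(ppl) matches loss within tol.

    The loss is printed with two decimals, so an exact match is not expected;
    tol is a hypothetical slack chosen to cover that rounding.
    """
    return abs(math.log(ppl) - loss) <= tol
```

Under these assumptions, `parse_log(open("pytorch.log").read())` would yield one record per training-interval line and one per end-of-epoch line; separator rows and the `End of training` line are skipped.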