Lines 53-63 of Preprocess.py seem to have a problem #9

MaNing1924382115 · 2021-07-27T02:54:50Z

     1  win_size = args.win_size
     2  step = args.step
     3  start_index = 0
     4  end_index = win_size
     5  data = token_ids[start_index:end_index]
     6  train_list.append(data)
     7  start_index += step
     8  end_index += step
     9  while end_index+50 < len(token_ids):  # add the remaining data to the training set only if its length is >= 50
    10      data = token_ids[start_index:end_index]
    11      train_list.append(data)
    12      start_index += step
    13      end_index += step

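Suppose tokens has length 621.
After line 8 has run: start_index = 200, end_index = 400, and train_list holds tokens up to index 200.
Entering the loop, after the first pass reaches line 13: start_index = 400, end_index = 600, and train_list holds tokens up to index 400.
The check finds 600+50 > 621, so the loop exits. train_list holds tokens up to index 400, and tokens 400-621 are discarded.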
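
The trace above can be checked with a small script. This is only a sketch: it copies the loop from the quoted lines, assumes win_size = step = 200 (the values implied by the indices in the trace, not necessarily the real defaults in args), and uses a made-up function name `split_windows` and a made-up parameter `min_tail` in place of the hard-coded 50.

```python
def split_windows(token_ids, win_size=200, step=200, min_tail=50):
    """Copy of the quoted loop; win_size/step values are assumed, min_tail stands in for the literal 50."""
    train_list = []
    start_index = 0
    end_index = win_size
    train_list.append(token_ids[start_index:end_index])
    start_index += step
    end_index += step
    while end_index + min_tail < len(token_ids):
        train_list.append(token_ids[start_index:end_index])
        start_index += step
        end_index += step
    return train_list


# Token values equal their own indices, so the last saved value shows the coverage.
token_ids = list(range(621))
windows = split_windows(token_ids)
covered = windows[-1][-1] + 1        # one past the last token that was saved
dropped = len(token_ids) - covered   # tokens that never reach train_list
print(len(windows), covered, dropped)  # prints: 2 400 221
```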

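Suppose tokens has length 651.
After line 8 has run: start_index = 200, end_index = 400, and train_list holds tokens up to index 200.
Entering the loop, after the first pass reaches line 13: start_index = 400, end_index = 600, and train_list holds tokens up to index 400.
After the second pass reaches line 13: start_index = 600, end_index = 800, and train_list holds tokens up to index 600.
The check finds 800+50 > 651, so the loop exits. train_list holds tokens up to index 600, and tokens 600-651 are discarded.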
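So this code throws away roughly the last 50 to step+50 tokens of tokens, which does not seem to match what your comment says: that the remaining data is added to the training set whenever its length is greater than or equal to 50.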
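
One possible way to get the behaviour the comment describes, offered only as a sketch: keep sliding the window while tokens remain, and keep a shorter final window only if it has at least 50 tokens. The function name `split_keep_tail` and the parameter `min_tail` are made up for illustration, win_size = step = 200 is assumed from the examples above, and this has not been tested against the rest of Preprocess.py.

```python
def split_keep_tail(token_ids, win_size=200, step=200, min_tail=50):
    """Hypothetical variant: a trailing partial window is kept when it has >= min_tail tokens."""
    train_list = []
    start_index = 0
    while start_index < len(token_ids):
        data = token_ids[start_index:start_index + win_size]
        # The first window is always kept (as in the quoted code); later
        # windows are kept only if they hold at least min_tail tokens.
        if start_index == 0 or len(data) >= min_tail:
            train_list.append(data)
        start_index += step
    return train_list


for n in (621, 651, 649):
    windows = split_keep_tail(list(range(n)))
    print(n, [len(w) for w in windows])
# 621 [200, 200, 200]
# 651 [200, 200, 200, 51]
# 649 [200, 200, 200]
```

With length 621 the 21-token tail is still dropped, but with length 651 the 51-token tail is now kept, which is what the comment on line 9 seems to intend.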
