Why was the training interrupted unexpectedly after some iterations (60000)? #102

Open
shadowuyl opened this issue Jan 24, 2018 · 1 comment

Comments

@shadowuyl

I trained on my data with ResNet-50, but the training was interrupted unexpectedly. (When I train with a small number of iterations, such as 500, it works fine.) I don't know the reason. Could anyone give me some advice? Thank you very much. This is the output:

speed: 0.765s / iter
I0124 10:34:33.386742 66910 solver.cpp:228] Iteration 63600, loss = 0.0277704
I0124 10:34:33.386772 66910 solver.cpp:244] Train net output #0: accuarcy = 1
I0124 10:34:33.386780 66910 solver.cpp:244] Train net output #1: loss_bbox = 0.0094369 (* 1 = 0.0094369 loss)
I0124 10:34:33.386785 66910 solver.cpp:244] Train net output #2: loss_cls = 0.00284368 (* 1 = 0.00284368 loss)
I0124 10:34:33.386790 66910 solver.cpp:244] Train net output #3: rpn_cls_loss = 0.000252465 (* 1 = 0.000252465 loss)
I0124 10:34:33.386792 66910 solver.cpp:244] Train net output #4: rpn_loss_bbox = 0.000642761 (* 1 = 0.000642761 loss)
I0124 10:34:33.386797 66910 sgd_solver.cpp:106] Iteration 63600, lr = 0.001
./experiments/scripts/rfcn_end2end_ohem.sh: line 58: 66910 Killed ./tools/train_net.py --gpu ${GPU_ID} --solver models/${PT_DIR}/${NET}/rfcn_end2end/solver_ohem.prototxt --weights data/imagenet_models/${NET}-model.caffemodel --imdb ${TRAIN_IMDB} --iters ${ITERS} --cfg experiments/cfgs/rfcn_end2end_ohem.yml ${EXTRA_ARGS}

@foralliance

@shadowuyl
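It is most likely a lack of memory. The "Killed" in the last line of your log means the process received SIGKILL, which on Linux typically happens when the kernel's out-of-memory killer stops a process that has exhausted the host's RAM.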
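
One way to test the out-of-memory hypothesis is to log the training process's peak memory as the iterations progress. The following is only a minimal sketch using Python's standard `resource` module; the helper names `peak_rss_mb` and `log_memory` are hypothetical and are not part of this repository, and where to call them in the training loop is left as an assumption.

```python
import resource
import sys


def peak_rss_mb():
    """Return this process's peak resident set size in MiB."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / float(divisor)


def log_memory(iteration, every=100):
    """Print peak memory every `every` iterations (hypothetical helper).

    Returns True when a line was printed, False otherwise.
    """
    if iteration % every != 0:
        return False
    print("iter %d: peak RSS = %.1f MiB" % (iteration, peak_rss_mb()))
    return True
```

If something like `log_memory(i)` is called once per training iteration and the reported peak keeps climbing over tens of thousands of iterations, that would be consistent with host memory slowly running out. On Linux, the kernel log (for example via `dmesg`) usually also records a line when the out-of-memory killer ends a process, which can confirm the cause.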
