[Bug] Reproducing RTMPose on the COCO dataset: acc_pose keeps hovering between 0.4 and 0.5 after 120 epochs of training. How should I solve this?
#3136 · Open · 2 tasks done · goalinshi opened this issue on Oct 13, 2024 · 0 comments
Prerequisite
Environment
- sys.platform: linux
- Python: 3.8.20 (default, Oct 3 2024, 15:24:27) [GCC 11.2.0]
- CUDA available: True
- MUSA available: False
- numpy_random_seed: 2147483648
- GPU 0: NVIDIA GeForce RTX 3090
- CUDA_HOME: /usr/local/cuda-11.8
- NVCC: Cuda compilation tools, release 11.8, V11.8.89
- GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
- PyTorch: 2.0.1+cu117
- PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 11.7
  - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86
  - CuDNN 8.5
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF
- TorchVision: 0.15.2+cu117
- OpenCV: 4.10.0
- MMEngine: 0.10.5
- MMPose: 1.1.0+
Reproduces the problem - code sample
0.210927 loss_kpt: 0.210927 acc_pose: 0.470607
10/13 10:00:56 - mmengine - INFO - Epoch(train) [105][4200/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:31:06 time: 0.126481 data_time: 0.023456 memory: 3826 loss: 0.207607 loss_kpt: 0.207607 acc_pose: 0.455742
10/13 10:01:03 - mmengine - INFO - Epoch(train) [105][4250/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:31:00 time: 0.122026 data_time: 0.019312 memory: 3826 loss: 0.207197 loss_kpt: 0.207197 acc_pose: 0.522144
10/13 10:01:07 - mmengine - INFO - Exp name: rtmpose-l_8xb256-420e_coco-256x192_20241012_164653
10/13 10:01:09 - mmengine - INFO - Epoch(train) [105][4300/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:30:54 time: 0.121489 data_time: 0.018492 memory: 3826 loss: 0.210692 loss_kpt: 0.210692 acc_pose: 0.520829
10/13 10:01:15 - mmengine - INFO - Epoch(train) [105][4350/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:30:49 time: 0.130391 data_time: 0.027746 memory: 3826 loss: 0.207093 loss_kpt: 0.207093 acc_pose: 0.510383
10/13 10:01:21 - mmengine - INFO - Epoch(train) [105][4400/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:30:43 time: 0.121266 data_time: 0.018557 memory: 3826 loss: 0.208687 loss_kpt: 0.208687 acc_pose: 0.571073
10/13 10:01:27 - mmengine - INFO - Epoch(train) [105][4450/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:30:37 time: 0.120966 data_time: 0.018265 memory: 3826 loss: 0.207345 loss_kpt: 0.207345 acc_pose: 0.523733
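The log excerpt above is easier to judge as numbers than as raw lines. Below is a minimal sketch, assuming MMEngine's plain-text log format exactly as it appears in this excerpt, that pulls the epoch, iteration, and acc_pose out of each `Epoch(train)` line and summarizes them. The sample lines are abbreviated copies of the excerpt (middle fields replaced by `...`); `parse_acc_pose` and `summarize` are hypothetical helpers, not part of MMEngine or MMPose.

```python
import re
from statistics import mean

# Matches MMEngine-style training lines such as
# "... Epoch(train) [105][4200/4853] ... acc_pose: 0.455742"
LINE_RE = re.compile(
    r"Epoch\(train\)\s+\[(?P<epoch>\d+)\]\[(?P<it>\d+)/(?P<total>\d+)\]"
    r".*?acc_pose:\s*(?P<acc>[0-9.]+)"
)

# Abbreviated copies of the log lines in this issue; the first (truncated)
# line and the "Exp name" line have no Epoch(train) prefix and are skipped.
SAMPLE_LOG = """\
0.210927 loss_kpt: 0.210927 acc_pose: 0.470607
10/13 10:00:56 - mmengine - INFO - Epoch(train) [105][4200/4853] ... acc_pose: 0.455742
10/13 10:01:03 - mmengine - INFO - Epoch(train) [105][4250/4853] ... acc_pose: 0.522144
10/13 10:01:07 - mmengine - INFO - Exp name: rtmpose-l_8xb256-420e_coco-256x192_20241012_164653
10/13 10:01:09 - mmengine - INFO - Epoch(train) [105][4300/4853] ... acc_pose: 0.520829
10/13 10:01:15 - mmengine - INFO - Epoch(train) [105][4350/4853] ... acc_pose: 0.510383
10/13 10:01:21 - mmengine - INFO - Epoch(train) [105][4400/4853] ... acc_pose: 0.571073
10/13 10:01:27 - mmengine - INFO - Epoch(train) [105][4450/4853] ... acc_pose: 0.523733
"""


def parse_acc_pose(lines):
    """Return (epoch, iteration, acc_pose) tuples for matching log lines."""
    records = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            records.append((int(m["epoch"]), int(m["it"]), float(m["acc"])))
    return records


def summarize(records):
    """Mean, minimum and maximum acc_pose over the parsed records."""
    accs = [acc for _, _, acc in records]
    return mean(accs), min(accs), max(accs)


if __name__ == "__main__":
    recs = parse_acc_pose(SAMPLE_LOG.splitlines())
    avg, lo, hi = summarize(recs)
    print(f"{len(recs)} lines: mean={avg:.4f} min={lo:.4f} max={hi:.4f}")
```

On the six complete lines in the excerpt, acc_pose ranges from about 0.456 to 0.571 (mean about 0.517). Running the same parser over the full training log would show whether the value is still trending up across epochs or has plateaued.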
Reproduces the problem - command or script
python train.py configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_coco-256x192.py \
    --resume work_dirs/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth
Reproduces the problem - error message
[4250/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:31:00 time: 0.122026 data_time: 0.019312 memory: 3826 loss: 0.207197 loss_kpt: 0.207197 acc_pose: 0.522144
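The value being tracked is a per-batch training accuracy. As a rough illustration of what a PCK-style acc_pose measures, here is a simplified sketch: a labeled keypoint counts as correct when its predicted location lies within `thr` of the target after the x and y offsets are divided by the input width and height. The threshold of 0.05, the normalization by the 192x256 input (the "256x192" in the config name, height by width), and pooling all keypoints into one fraction are assumptions for this sketch; MMPose's head may compute the value differently (for example, averaging per keypoint first). `pck_accuracy` is a hypothetical helper.

```python
import math


def pck_accuracy(pred, gt, mask, norm, thr=0.05):
    """Simplified PCK-style accuracy (illustrative sketch only).

    pred, gt: per-sample lists of (x, y) keypoint coordinates in pixels.
    mask:     per-sample lists of booleans, True where the keypoint is labeled.
    norm:     per-sample (width, height) used to normalize the offsets.
    thr:      normalized distance threshold (0.05 is an assumption here).
    """
    hits = 0
    valid = 0
    for p_img, g_img, m_img, (w, h) in zip(pred, gt, mask, norm):
        for (px, py), (gx, gy), labeled in zip(p_img, g_img, m_img):
            if not labeled:
                continue  # unlabeled keypoints do not count either way
            valid += 1
            # Euclidean distance after normalizing x by width and y by height.
            dist = math.hypot((px - gx) / w, (py - gy) / h)
            if dist < thr:
                hits += 1
    return hits / valid if valid else 0.0


if __name__ == "__main__":
    gt = [[(96, 128)] * 4]
    # exact hit, 5 px off (5/192 < 0.05), 20 px off (20/192 > 0.05), unlabeled
    pred = [[(96, 128), (101, 128), (116, 128), (0, 0)]]
    mask = [[True, True, True, False]]
    print(pck_accuracy(pred, gt, mask, [(192, 256)]))
```

Because this kind of number is logged during training on the training batches, it is not the same quantity as the COCO AP reported by validation, so the two should be compared with care.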
Additional information
1. The dataset is the original COCO dataset plus 2,000 additional images.
2. I expected the performance after adding the data to be close to that of the released model.
3. I can't think of where the problem is. The data has been verified, and I found no problems with it.
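Point 3 says the data was verified. One way to make that concrete for a merged annotation file is a few structural checks like the sketch below. It assumes the standard COCO person-keypoint JSON layout (17 keypoints stored as flat x, y, v triplets, with `num_keypoints` equal to the number of labeled keypoints); `check_coco_keypoints` and `load_annotations` are hypothetical helpers, not part of MMPose or pycocotools, and passing these checks does not guarantee the labels themselves are correct.

```python
import json

NUM_KPTS = 17  # COCO person keypoints (assumed layout)


def load_annotations(path):
    """Load a COCO-style annotation JSON file."""
    with open(path) as f:
        return json.load(f)


def check_coco_keypoints(coco):
    """Return a list of human-readable problems found in the annotations."""
    problems = []
    image_ids = [img["id"] for img in coco.get("images", [])]
    if len(image_ids) != len(set(image_ids)):
        problems.append("duplicate image ids")
    known = set(image_ids)
    seen_ann_ids = set()
    for ann in coco.get("annotations", []):
        aid = ann.get("id")
        if aid in seen_ann_ids:
            problems.append(f"ann {aid}: duplicate annotation id")
        seen_ann_ids.add(aid)
        if ann.get("image_id") not in known:
            problems.append(f"ann {aid}: unknown image_id {ann.get('image_id')}")
        kpts = ann.get("keypoints", [])
        if len(kpts) != NUM_KPTS * 3:
            problems.append(f"ann {aid}: expected {NUM_KPTS * 3} values, got {len(kpts)}")
            continue  # remaining keypoint checks assume the right length
        vis = kpts[2::3]  # every third value is the visibility flag
        if any(v not in (0, 1, 2) for v in vis):
            problems.append(f"ann {aid}: visibility flag outside {{0, 1, 2}}")
        labeled = sum(1 for v in vis if v > 0)
        if ann.get("num_keypoints") != labeled:
            problems.append(
                f"ann {aid}: num_keypoints={ann.get('num_keypoints')} but {labeled} labeled"
            )
        _, _, w, h = ann.get("bbox", [0, 0, 0, 0])
        if w <= 0 or h <= 0:
            problems.append(f"ann {aid}: non-positive bbox size")
    return problems


if __name__ == "__main__":
    demo = {
        "images": [{"id": 1}],
        "annotations": [{"id": 1, "image_id": 1, "keypoints": [10, 20, 2] * 17,
                         "num_keypoints": 16, "bbox": [0, 0, 50, 100]}],
    }
    for problem in check_coco_keypoints(demo):
        print(problem)
```

Running it separately on the original COCO file and on the file with the 2,000 added images makes it easier to tell whether any issue comes from the new annotations.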