
[Bug] Reproducing RTMPose on COCO: acc_pose has been hovering between 0.4 and 0.5 after 120 epochs of training. How can I fix this? #3136

Open
goalinshi opened this issue Oct 13, 2024 · 0 comments


Prerequisite

Environment

```
OrderedDict([('sys.platform', 'linux'), ('Python', '3.8.20 (default, Oct 3 2024, 15:24:27) [GCC 11.2.0]'), ('CUDA available', True), ('MUSA available', False), ('numpy_random_seed', 2147483648), ('GPU 0', 'NVIDIA GeForce RTX 3090'), ('CUDA_HOME', '/usr/local/cuda-11.8'), ('NVCC', 'Cuda compilation tools, release 11.8, V11.8.89'), ('GCC', 'gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0'), ('PyTorch', '2.0.1+cu117'), ('PyTorch compiling details', 'PyTorch built with:\n - GCC 9.3\n - C++ Version: 201703\n - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications\n - Intel(R) MKL-DNN v2.7.3 (Git Hash 6dbeffbae1f23cbbeae17adb7b5b13f1f37c080e)\n - OpenMP 201511 (a.k.a. OpenMP 4.5)\n - LAPACK is enabled (usually provided by MKL)\n - NNPACK is enabled\n - CPU capability usage: AVX2\n - CUDA Runtime 11.7\n - NVCC architecture flags: -gencode;arch=compute_37,code=sm_37;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86\n - CuDNN 8.5\n - Magma 2.6.1\n - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=11.7, CUDNN_VERSION=8.5.0, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -Wno-deprecated -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wunused-local-typedefs -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-error=deprecated-declarations -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=redundant-decls -Wno-error=old-style-cast -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.0.1, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, \n'), ('TorchVision', '0.15.2+cu117'), ('OpenCV', '4.10.0'), ('MMEngine', '0.10.5'), ('MMPose', '1.1.0+')])
```

Reproduces the problem - code sample

```
0.210927 loss_kpt: 0.210927 acc_pose: 0.470607
10/13 10:00:56 - mmengine - INFO - Epoch(train) [105][4200/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:31:06 time: 0.126481 data_time: 0.023456 memory: 3826 loss: 0.207607 loss_kpt: 0.207607 acc_pose: 0.455742
10/13 10:01:03 - mmengine - INFO - Epoch(train) [105][4250/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:31:00 time: 0.122026 data_time: 0.019312 memory: 3826 loss: 0.207197 loss_kpt: 0.207197 acc_pose: 0.522144
10/13 10:01:07 - mmengine - INFO - Exp name: rtmpose-l_8xb256-420e_coco-256x192_20241012_164653
10/13 10:01:09 - mmengine - INFO - Epoch(train) [105][4300/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:30:54 time: 0.121489 data_time: 0.018492 memory: 3826 loss: 0.210692 loss_kpt: 0.210692 acc_pose: 0.520829
10/13 10:01:15 - mmengine - INFO - Epoch(train) [105][4350/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:30:49 time: 0.130391 data_time: 0.027746 memory: 3826 loss: 0.207093 loss_kpt: 0.207093 acc_pose: 0.510383
10/13 10:01:21 - mmengine - INFO - Epoch(train) [105][4400/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:30:43 time: 0.121266 data_time: 0.018557 memory: 3826 loss: 0.208687 loss_kpt: 0.208687 acc_pose: 0.571073
10/13 10:01:27 - mmengine - INFO - Epoch(train) [105][4450/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:30:37 time: 0.120966 data_time: 0.018265 memory: 3826 loss: 0.207345 loss_kpt: 0.207345 acc_pose: 0.523733
```
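To see how far acc_pose actually moves, the values can be pulled out of MMEngine's text log and summarized. This is a minimal sketch that assumes the `Epoch(train) [epoch][iter/total] ... loss: ... acc_pose: ...` line layout shown in the log above; `parse_log`, `summarize`, `TrainRecord` and the trimmed sample lines are hypothetical helpers, not part of MMPose or MMEngine.

```python
import re
import statistics
from typing import List, NamedTuple, Tuple

# Matches MMEngine text-log training lines of the form
# "... Epoch(train) [105][4200/4853] ... loss: 0.207607 ... acc_pose: 0.455742"
TRAIN_LINE = re.compile(
    r"Epoch\(train\)\s+\[(?P<epoch>\d+)\]\[(?P<it>\d+)/\d+\]"
    r".*?\bloss: (?P<loss>[\d.]+)"
    r".*?\bacc_pose: (?P<acc>[\d.]+)"
)


class TrainRecord(NamedTuple):
    epoch: int
    iteration: int
    loss: float
    acc_pose: float


def parse_log(text: str) -> List[TrainRecord]:
    """Extract one record per training line; other lines (e.g. 'Exp name') are skipped."""
    records = []
    for line in text.splitlines():
        m = TRAIN_LINE.search(line)
        if m:
            records.append(TrainRecord(int(m["epoch"]), int(m["it"]),
                                       float(m["loss"]), float(m["acc"])))
    return records


def summarize(records: List[TrainRecord]) -> Tuple[float, float, float]:
    """Return (min, max, mean) of acc_pose over the parsed records."""
    accs = [r.acc_pose for r in records]
    return min(accs), max(accs), statistics.mean(accs)


# Trimmed copies of the epoch-105 lines above (lr/eta/time/memory fields dropped).
SAMPLE_LOG = """\
10/13 10:00:56 - mmengine - INFO - Epoch(train) [105][4200/4853] loss: 0.207607 loss_kpt: 0.207607 acc_pose: 0.455742
10/13 10:01:03 - mmengine - INFO - Epoch(train) [105][4250/4853] loss: 0.207197 loss_kpt: 0.207197 acc_pose: 0.522144
10/13 10:01:07 - mmengine - INFO - Exp name: rtmpose-l_8xb256-420e_coco-256x192_20241012_164653
10/13 10:01:09 - mmengine - INFO - Epoch(train) [105][4300/4853] loss: 0.210692 loss_kpt: 0.210692 acc_pose: 0.520829
10/13 10:01:15 - mmengine - INFO - Epoch(train) [105][4350/4853] loss: 0.207093 loss_kpt: 0.207093 acc_pose: 0.510383
10/13 10:01:21 - mmengine - INFO - Epoch(train) [105][4400/4853] loss: 0.208687 loss_kpt: 0.208687 acc_pose: 0.571073
10/13 10:01:27 - mmengine - INFO - Epoch(train) [105][4450/4853] loss: 0.207345 loss_kpt: 0.207345 acc_pose: 0.523733
"""

records = parse_log(SAMPLE_LOG)
lo, hi, mean = summarize(records)
print(f"{len(records)} lines: acc_pose min={lo:.3f} max={hi:.3f} mean={mean:.3f}")
# prints "6 lines: acc_pose min=0.456 max=0.571 mean=0.517"
```

On these six lines acc_pose ranges from about 0.46 to 0.57. Running `parse_log` over the full training log would show the per-epoch trend. Note that acc_pose is logged alongside the loss during training, so it reflects training batches rather than COCO validation AP.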

Reproduces the problem - command or script

```shell
python train.py configs/body_2d_keypoint/rtmpose/coco/rtmpose-l_8xb256-420e_coco-256x192.py \
    --resume work_dirs/cspnext-l_udp-aic-coco_210e-256x192-273b7631_20230130.pth
```

Reproduces the problem - error message

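No error is raised; the problem shows up only as the acc_pose value in the training log, for example:

```
[4250/4853] base_lr: 4.000000e-03 lr: 4.000000e-03 eta: 2 days, 3:31:00 time: 0.122026 data_time: 0.019312 memory: 3826 loss: 0.207197 loss_kpt: 0.207197 acc_pose: 0.522144
```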

Additional information

1. The dataset is the original COCO dataset plus 2000 additional images.
2. With the added data, I would expect performance close to that of the officially released model.
3. I cannot find where the problem is. I have verified the data and found no problems.
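Because the extra 2000 images were merged into COCO, a structural check of the merged annotation file can rule out common merge mistakes (duplicate ids, wrong keypoint count, `num_keypoints` not matching the visibility flags). This is a sketch assuming the standard COCO keypoint format (17 keypoints stored as flat `[x, y, v]` triples, person `category_id` 1); `check_coco_keypoints` and `check_file` are hypothetical helpers, and passing them does not prove the labels themselves are accurate.

```python
import json
from collections import Counter
from typing import Dict, List

NUM_KPTS = 17        # COCO person keypoints
PERSON_CATEGORY = 1  # category_id of "person" in COCO


def check_coco_keypoints(coco: Dict) -> List[str]:
    """Return human-readable problems found in a COCO keypoint annotation dict."""
    problems: List[str] = []

    # Merging extra images into COCO can silently produce duplicate ids.
    for key in ("images", "annotations"):
        counts = Counter(item["id"] for item in coco.get(key, []))
        dupes = sorted(i for i, n in counts.items() if n > 1)
        if dupes:
            problems.append(f"duplicate {key} ids: {dupes[:10]}")

    sizes = {img["id"]: (img["width"], img["height"]) for img in coco.get("images", [])}

    for ann in coco.get("annotations", []):
        aid = ann["id"]
        if ann.get("image_id") not in sizes:
            problems.append(f"ann {aid}: image_id {ann.get('image_id')} not in images")
            continue
        if ann.get("category_id") != PERSON_CATEGORY:
            problems.append(f"ann {aid}: category_id is {ann.get('category_id')}, expected 1")
        kpts = ann.get("keypoints", [])
        if len(kpts) != 3 * NUM_KPTS:
            problems.append(f"ann {aid}: {len(kpts)} keypoint values, expected {3 * NUM_KPTS}")
            continue
        width, height = sizes[ann["image_id"]]
        labeled = 0
        for k in range(NUM_KPTS):
            x, y, v = kpts[3 * k: 3 * k + 3]
            if v not in (0, 1, 2):
                problems.append(f"ann {aid}: keypoint {k} has visibility {v}")
            elif v > 0:
                labeled += 1
                # Labeled keypoints should lie inside the image.
                if not (0 <= x <= width and 0 <= y <= height):
                    problems.append(f"ann {aid}: keypoint {k} at ({x}, {y}) is outside the image")
        if ann.get("num_keypoints") != labeled:
            problems.append(f"ann {aid}: num_keypoints={ann.get('num_keypoints')} but {labeled} are labeled")
        _, _, bw, bh = ann.get("bbox", [0, 0, 0, 0])
        if bw <= 0 or bh <= 0:
            problems.append(f"ann {aid}: degenerate bbox {ann.get('bbox')}")
    return problems


def check_file(path: str) -> List[str]:
    """Load a COCO-style JSON file and run the checks above on it."""
    with open(path, encoding="utf-8") as f:
        return check_coco_keypoints(json.load(f))
```

Usage would be something like `print("\n".join(check_file(path)) or "no problems found")`, where `path` points to the merged training annotation file.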
