
Colab mineru_demo.ipynb failing; bad /MFD/weights.pt file reference #732

Open
Analect opened this issue Oct 13, 2024 · 1 comment
Labels
bug Something isn't working

Comments


Analect commented Oct 13, 2024

Description of the bug

I was trying to get this running on Colab. When running the step !magic-pdf -p demo1.pdf -o output/ -m auto, I got [Errno 2] No such file or directory: '/root/.cache/huggingface/hub/models--opendatalab--PDF-Extract-Kit/snapshots/a29caa466f6d07be0e4863bba64204009128931a/MFD/weights.pt'. It seems the path is missing the models subfolder that sits above MFD/weights.pt.
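A quick way to confirm this diagnosis in Colab is to check both candidate locations inside the downloaded snapshot. The helper below is a hypothetical sketch (find_mfd_weights is not part of magic-pdf); it only assumes the snapshot layout visible in the error message, with the weights expected under a models subfolder.

```python
import os
from typing import Optional


def find_mfd_weights(snapshot_dir: str) -> Optional[str]:
    """Return the first existing MFD weights path under snapshot_dir, or None.

    Hypothetical diagnostic helper: it checks the path from the error
    message (snapshot_dir/MFD/weights.pt) and the same path with the
    models subfolder inserted (snapshot_dir/models/MFD/weights.pt).
    """
    candidates = [
        os.path.join(snapshot_dir, "MFD", "weights.pt"),            # path magic-pdf tried
        os.path.join(snapshot_dir, "models", "MFD", "weights.pt"),  # path including models/
    ]
    for path in candidates:
        if os.path.isfile(path):
            return path
    return None
```

For example, calling find_mfd_weights on the snapshots/a29caa466f6d07be0e4863bba64204009128931a directory from the error should return the models/MFD/weights.pt path if the subfolder diagnosis is right.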


Also, the T4 on Colab reports the following CUDA version:

Sun Oct 13 12:15:42 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.104.05             Driver Version: 535.104.05   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:04.0 Off |                    0 |
| N/A   40C    P8               9W /  70W |      0MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
                                                                                         
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

For the step !pip install paddlepaddle-gpu==3.0.0b1 -i https://www.paddlepaddle.org.cn/packages/stable/cu118/, what do you suggest? I notice there is no https://www.paddlepaddle.org.cn/packages/stable/cu122/ index.

Thanks.

How to reproduce the bug

Open the Colab demo at https://colab.research.google.com/gist/papayalove/b5f4913389e7ff9883c6b687de156e78/mineru_demo.ipynb.

Operating system

Linux

Python version

3.10

Software version (magic-pdf --version)

0.8.x

Device mode

cuda

Analect added the bug (Something isn't working) label on Oct 13, 2024
myhloli (Collaborator) commented Oct 13, 2024

  1. About the model files not being found:
    !wget https:/opendatalab/MinerU/raw/master/magic-pdf.template.json && mv magic-pdf.template.json ~/magic-pdf.json && sed -i 's|/tmp/models|{model_dir}|g' ~/magic-pdf.json

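should be changed to

!wget https:/opendatalab/MinerU/raw/master/magic-pdf.template.json && mv magic-pdf.template.json ~/magic-pdf.json && sed -i 's|/tmp/models|{model_dir}/models|g' ~/magic-pdf.json

The only difference is the sed replacement target: {model_dir}/models instead of {model_dir}. This way the configured path points at the models subfolder of the downloaded snapshot, which is where MFD/weights.pt lives.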
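If you prefer to make the same edit from Python in the notebook, here is a minimal sketch of what the corrected sed command does. In the notebook's shell line, IPython fills in {model_dir} from the Python variable model_dir. The sed command then replaces every /tmp/models in the template text with <model_dir>/models. The function names patch_model_dir and patch_config_file are hypothetical. The sketch works on the raw text, as sed does, so it assumes nothing about the key names inside magic-pdf.json.

```python
import os


def patch_model_dir(template_text: str, model_dir: str) -> str:
    """Mirror sed 's|/tmp/models|{model_dir}/models|g' on a string.

    Every occurrence of the placeholder /tmp/models is replaced with
    the models subfolder of model_dir (the downloaded snapshot).
    """
    return template_text.replace("/tmp/models", os.path.join(model_dir, "models"))


def patch_config_file(path: str, model_dir: str) -> None:
    """Rewrite a config file in place, like sed -i."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(patch_model_dir(text, model_dir))
```

Usage, assuming model_dir already holds the snapshot directory: patch_config_file(os.path.expanduser("~/magic-pdf.json"), model_dir).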

To make downloading the model files simpler and to prevent similar issues, we have also updated the model download script. The latest download_models_hf.py automatically downloads magic-pdf.json and configures the model path, so you no longer need to run the path-update command manually. For detailed steps, see: https:/opendatalab/MinerU/blob/master/docs/README_Ubuntu_CUDA_Acceleration_en_US.md

  2. About the paddlepaddle-gpu version: cu118 is correct.
    On Linux we pair paddlepaddle-gpu built for cu118 with torch built for cu121, which avoids conflicts between the two packages.
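The CUDA Version that nvidia-smi reports (12.2 on the T4 above) is the newest CUDA runtime the installed driver supports. Drivers stay compatible with older runtimes, so cu118 and cu121 builds can run on it. The sketch below encodes that rule as a conservative check. The function names are hypothetical, and it only compares version numbers; it does not inspect the installed packages.

```python
import re
from typing import Tuple


def driver_cuda_version(nvidia_smi_output: str) -> Tuple[int, int]:
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi output."""
    match = re.search(r"CUDA Version:\s*(\d+)\.(\d+)", nvidia_smi_output)
    if match is None:
        raise ValueError("no 'CUDA Version' field found")
    return int(match.group(1)), int(match.group(2))


def build_cuda_version(tag: str) -> Tuple[int, int]:
    """Turn an index tag like 'cu118' or 'cu121' into (major, minor).

    Assumes the last digit is the minor version, as in cu118 and cu121.
    """
    match = re.fullmatch(r"cu(\d+)(\d)", tag)
    if match is None:
        raise ValueError(f"unrecognised CUDA tag: {tag!r}")
    return int(match.group(1)), int(match.group(2))


def build_fits_driver(tag: str, nvidia_smi_output: str) -> bool:
    """Conservative check: the build's CUDA version is not newer than the driver's."""
    return build_cuda_version(tag) <= driver_cuda_version(nvidia_smi_output)
```

In Colab, subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout provides the input text. Because CUDA minor-version compatibility can let some newer 12.x builds run on a 12.2 driver, a False result here is a warning rather than proof that the build will fail.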
