Is it possible to convert a PDF into a PDF/UA Format? #731

Open
Samyssmile opened this issue Oct 12, 2024 · 3 comments

Comments

@Samyssmile

Is it possible to convert a PDF into the PDF/UA format using MinerU?

@myhloli
Collaborator

myhloli commented Oct 13, 2024

We currently do not support converting PDFs into the PDF/UA format. However, we can provide some of the necessary data for conversion to UA format, such as tagging information for titles, text, and lists within PDF pages; position coordinates and reading order of different paragraphs; textual information corresponding to scanned documents; and caption information for figures or tables.

@Samyssmile
Author

@myhloli Thank you for your response.
Is there an example of how to extract that data?

Thank you for your work.

@myhloli
Collaborator

myhloli commented Oct 14, 2024

The format of the intermediate-state JSON output (middle.json) is explained here:

https://github.com/opendatalab/MinerU/blob/master/docs/output_file_en_us.md#some_pdf_middlejson
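
For readers looking for a starting point, here is a minimal sketch of pulling block type, bounding box, and text out of a middle.json file once it has been produced. It is not taken from the MinerU docs. The field names used (`pdf_info`, `page_idx`, `para_blocks`, `type`, `bbox`, `lines`, `spans`, `content`) are assumptions about the layout and should be checked against the linked document; the sample data is hand-written for illustration, and the list order of `para_blocks` is assumed to be the reading order.

```python
import json
from typing import Any, Dict, Iterator


def load_middle(path: str) -> Dict[str, Any]:
    """Load a middle.json file from disk (the path is supplied by the caller)."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def block_text(block: Dict[str, Any]) -> str:
    """Join the text of every span in a block, in line order."""
    parts = []
    for line in block.get("lines", []):
        for span in line.get("spans", []):
            content = span.get("content")
            if content:
                parts.append(content)
    return " ".join(parts)


def iter_tagged_blocks(middle: Dict[str, Any]) -> Iterator[Dict[str, Any]]:
    """Yield one record per paragraph block: page, order, type, bbox, text.

    Assumption: each page lists its blocks under "para_blocks" and the
    list order is the reading order.
    """
    for page in middle.get("pdf_info", []):
        for order, block in enumerate(page.get("para_blocks", [])):
            yield {
                "page": page.get("page_idx"),
                "order": order,
                "type": block.get("type"),
                "bbox": block.get("bbox"),
                "text": block_text(block),
            }


# Hand-written sample that mimics the assumed structure (not real MinerU output).
sample = {
    "pdf_info": [
        {
            "page_idx": 0,
            "para_blocks": [
                {
                    "type": "title",
                    "bbox": [50, 40, 300, 60],
                    "lines": [{"spans": [{"type": "text", "content": "Introduction"}]}],
                },
                {
                    "type": "text",
                    "bbox": [50, 70, 500, 120],
                    "lines": [
                        {"spans": [{"type": "text", "content": "First line"}]},
                        {"spans": [{"type": "text", "content": "second line."}]},
                    ],
                },
            ],
        }
    ]
}

records = list(iter_tagged_blocks(sample))
for rec in records:
    print(rec["page"], rec["order"], rec["type"], rec["bbox"], rec["text"])
```

With a real file, `iter_tagged_blocks(load_middle(path))` would be the entry point, where `path` points at the generated middle.json. Writing the PDF/UA structure tree from these records would still need a separate PDF library; this sketch only covers the extraction side.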
