Documentation: schematic and algorithms or heuristics used in-between. #119
Comments
Hi @prhbrt, thank you for your questions. A rough diagram showing the flow of the data through the various models can be found here. And here is an excerpt from our paper describing the heuristics used for reading order detection:
Note that @vahidrezanezhad is currently working on a version that infers the reading order using a machine learning model, see the most recent commits here.
btw, since version 0.3.0, Eynollah also has a batch mode (using the |
I'm trying to get a better understanding of your work and to create a workflow that allows batching pages without reloading the models (which currently takes a lot of time). However, your code is sometimes hard to follow. Could you provide a (crude) schematic of the different models you're using, as a graph, and a quick summary of the algorithmic (non-neural-network) parts?
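To make the batching idea concrete, here is a minimal sketch of the load-once pattern: each model is loaded the first time it is needed and then reused for every following page. All names here (`load_model`, `process_pages`, the model names) are hypothetical placeholders, not Eynollah's API, and the "model" is a stand-in function rather than a real network.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_model(name: str):
    """Hypothetical loader: stands in for an expensive model load.

    lru_cache ensures each model name is loaded only once per process.
    """
    # Placeholder "model": a function that tags its input with the model name.
    return lambda page: f"{name}({page})"


def process_pages(pages, model_names):
    """Run every page through the same models without reloading them."""
    results = {}
    for page in pages:
        out = page
        for name in model_names:
            # Cache hit after the first page, so no reload happens here.
            out = load_model(name)(out)
        results[page] = out
    return results


results = process_pages(["a.png", "b.png"], ("segmentation", "reading_order"))
```

With two pages and two models, the loader runs twice in total instead of four times; `load_model.cache_info()` shows the hit and miss counts.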
Currently I'm mostly confused by how the reading order is decided. What is the algorithm there?