Text overlaps on some ALTO files #2

Open
benwbrum opened this issue May 9, 2024 · 4 comments

@benwbrum

benwbrum commented May 9, 2024

We've run into a handful of ALTO files in which the text overlay font is too large, resulting in overlapping text regions and unreadable text.

Here's an example:

  • Configure OSD with
    tileSources: ["{\"type\":\"image\",\"url\":\"https://fromthepage.com/images/uploaded/32108542/MS0236-S01-004_003-0.jpg\"}"]
  • Configure the text layer with
    textlayer.loadOCR('https://fromthepage.com/nal/charles-c-plitt-collection/ms0236-s01-004/34013749/alto_xml')

Results:
Screenshot from 2024-05-09 07-02-24

@rsimon
Member

rsimon commented May 13, 2024

Hi @benwbrum,

Would it be possible to post the ALTO file here? I can access the image, but the ALTO seems to be behind a login. (Otherwise, could you email it to me?)

A simple fix would (probably) be to reduce the global scaling factor that the overlay plugin applies. It's currently hard-coded, but I could make it configurable from the outside. That wouldn't be much help, though, I guess. Are the other pages coming out fine, more or less? If so, reducing the global scaling factor would likely make text on other pages too small.

Either way, tweaking the scaling factor will always be a bit like poking at the problem with a stick. Not sure how to make this smarter & more sustainable... One solution I could think of would be, perhaps, to compare the actual size of the rendered text with the size of the annotation (which isn't rendered, but is known internally). I can do some testing along these lines. Do let me know if you have any further ideas though!
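
For illustration only, the size comparison described above might look roughly like the sketch below. It is not taken from the plugin: `fitScale` and `fittedFontSize` are hypothetical names, and the sketch assumes the rendered text size has already been measured (in a browser, for example, with `getBoundingClientRect()`) in the same units as the annotation box.

```javascript
// Hypothetical helpers; none of these names come from the plugin.
// textSize: measured size of the rendered text at the current font size.
// boxSize:  size of the annotation's bounding box, in the same units.
// Returns a factor <= maxScale that makes the text fit inside the box.
function fitScale(textSize, boxSize, maxScale = 1) {
  // Nothing measurable to fit: leave the font size alone.
  if (textSize.width <= 0 || textSize.height <= 0) return maxScale;
  const sx = boxSize.width / textSize.width;
  const sy = boxSize.height / textSize.height;
  // Use the tighter of the two axes, and never scale up past maxScale.
  return Math.min(maxScale, sx, sy);
}

// Apply the factor to a font size given in pixels.
function fittedFontSize(fontSizePx, textSize, boxSize) {
  return fontSizePx * fitScale(textSize, boxSize);
}
```

Because the factor is computed per annotation, pages whose text already fits would be left untouched, which a single global factor cannot guarantee.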

@benwbrum
Author

Absolutely! GitHub won't let me add XML files to issues, so I've posted it as a Gist, which you may be able to point Annotorious at.

Regarding other pages, we see fonts that are too big more often than we don't.

I was wondering if it would be possible to detect overlapping text, then downscale if any was present. I don't know enough about whether that's visible to you -- presumably the annotations themselves are not overlapping, and the text just over-runs their bounding boxes? (This may be what you mean by comparing actual size with annotation size.)
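
A minimal sketch of the detect-then-downscale idea, under the assumption that the rendered text rectangles can be measured at a given scale. The `measure` callback and all function names here are hypothetical, not part of the plugin or of Annotorious.

```javascript
// Rectangles are plain objects: { x, y, width, height }.
// Two rectangles that merely touch along an edge do not count as overlapping.
function intersects(a, b) {
  return a.x < b.x + b.width && b.x < a.x + a.width &&
         a.y < b.y + b.height && b.y < a.y + a.height;
}

// Pairwise check; quadratic, which should be acceptable for one page of lines.
function hasOverlap(rects) {
  for (let i = 0; i < rects.length; i++) {
    for (let j = i + 1; j < rects.length; j++) {
      if (intersects(rects[i], rects[j])) return true;
    }
  }
  return false;
}

// measure(scale) is an assumed callback that returns the rendered text
// rectangles at the given scale. Shrink step by step until nothing
// overlaps or the lower bound is reached.
function downscaleUntilClear(measure, { step = 0.9, minScale = 0.3 } = {}) {
  let scale = 1;
  while (scale > minScale && hasOverlap(measure(scale))) {
    scale = Math.max(minScale, scale * step);
  }
  return scale;
}
```

The lower bound keeps the loop from shrinking text indefinitely on pages where the bounding boxes themselves overlap.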

Another option would be to add a font scaling selector to the UI, so that end users can change the font size.
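
As a rough sketch of what such a selector could do, assuming the overlay's stylesheet multiplies its font size by a CSS custom property: the property name `--ocr-font-scale` and the functions below are made up for illustration and do not exist in the plugin today.

```javascript
// Parse and clamp a user-supplied scale; fall back to 1 on invalid input.
function clampScale(value, min = 0.25, max = 2) {
  const n = Number(value);
  if (!Number.isFinite(n)) return 1;
  return Math.min(max, Math.max(min, n));
}

// input:     e.g. an <input type="range" min="0.25" max="2" step="0.05">
// container: the element that wraps the text layer
// Writes the scale to a (hypothetical) custom property, which the overlay's
// CSS would have to read, e.g. font-size: calc(1em * var(--ocr-font-scale, 1)).
function bindFontScale(input, container) {
  const apply = () => {
    const scale = clampScale(input.value);
    container.style.setProperty('--ocr-font-scale', String(scale));
    return scale;
  };
  input.addEventListener('input', apply);
  return apply(); // apply the initial value immediately
}
```

In a page this might be wired up as `bindFontScale(document.querySelector('#font-scale'), document.querySelector('#viewer'))`, where both selectors are placeholders.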

@rsimon
Member

rsimon commented May 14, 2024

Thanks!

Ah, that's weird. I'm getting a good result with the current version of the plugin:

Screenshot 2024-05-14 at 17 11 24

Do you know which version you are using? Maybe I got something mixed up with the last PR, and you're still on the old version with Less Smartness™. I'll rebuild the bundle and send another PR. Let's see if this fixes the problem.

@benwbrum
Author

That sounds like a great plan. There were some issues with strange commits in the last two PRs, so it's very possible that I messed something up when merging your changes in.
