
Simple Shiny web application to extract text from scanned images using Tesseract


DIGI-VUB/scan2text


scan2text

A Shiny application to easily extract text from selected areas of images or scanned PDF files

Installation

Install the following R packages:

install.packages("shinydashboard")
install.packages("shiny")
install.packages("shinyFiles")
install.packages("data.table")
install.packages("jsonlite")
install.packages("magick")
install.packages("pdftools")
install.packages("tesseract")
install.packages("digest")

The application uses the R package pdftools to convert PDF files into PNG images.
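As a rough illustration of that conversion step, the sketch below renders every page of a PDF to its own PNG file with pdftools::pdf_convert(). It is not taken from app.R: the helper name pdf_to_png, the output folder and the 300 dpi resolution are assumptions chosen for this example.

```r
library(pdftools)

# Hypothetical helper (not part of app.R): render each page of a PDF to its
# own PNG file in out_dir and return the paths of the generated images.
# dpi = 300 is an assumed value; pdf_convert() itself defaults to 72 dpi.
pdf_to_png <- function(pdf_file, out_dir = tempdir(), dpi = 300) {
  n_pages <- pdf_info(pdf_file)$pages
  stem <- tools::file_path_sans_ext(basename(pdf_file))
  filenames <- file.path(out_dir, sprintf("%s_%d.png", stem, seq_len(n_pages)))
  pdf_convert(pdf_file, format = "png", pages = seq_len(n_pages),
              filenames = filenames, dpi = dpi, verbose = FALSE)
  filenames
}

# Usage (with your own scanned PDF):
# png_files <- pdf_to_png("scan.pdf")
```

Passing explicit filenames keeps the page images out of the working directory, which is where pdf_convert() writes them by default.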

Usage

Launch the application in one of the following ways:

  • If you downloaded this repository, run shiny::runApp("app.R") to launch the Shiny application
  • If you did not download this repository, run shiny::runGitHub("DIGI-VUB/scan2text") to launch the Shiny application
  • In RStudio, open the file app.R and press the Run App button

Selected areas from the images or PDF files, together with any text extracted from them, are saved in the www folder of your current working directory.
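The core idea of cropping a selected area out of a page image and passing only that area to Tesseract can be sketched with the magick and tesseract packages from the installation list. This is a minimal sketch, not the application's own code: the helper name ocr_area, the pixel-based rectangle and the default "eng" language are assumptions, and unlike the app it does not save anything to the www folder.

```r
library(magick)
library(tesseract)

# Hypothetical helper (not part of app.R): crop a rectangle given in pixels
# (top-left corner x, y plus width and height) from an image file and run
# Tesseract OCR on the cropped area only.
ocr_area <- function(image_file, x, y, width, height, language = "eng") {
  img <- image_read(image_file)
  area <- image_crop(img, geometry_area(width, height, x, y))
  # Write the crop to a temporary PNG so ocr() receives a plain file path
  tmp <- tempfile(fileext = ".png")
  on.exit(unlink(tmp), add = TRUE)
  image_write(area, tmp, format = "png")
  ocr(tmp, engine = tesseract(language))
}

# Usage (with a page image produced from your own scan):
# text <- ocr_area("scan_1.png", x = 100, y = 200, width = 800, height = 150)
# cat(text)
```

Restricting OCR to the cropped area avoids recognising the rest of the page, which is the point of selecting areas in the first place.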

DIGI

Developed by DIGI, the Brussels Platform for Digital Humanities: https://digi.research.vub.be
