Quinn Dombrowski is the Digital Humanities Coordinator in the Research IT group. She is the director of the DiRT Directory of digital research tools, and is writing a book on Drupal for humanists.
Getting research materials into a digital form that you can search and computationally analyze can be a time-consuming first step in the research process. While Adobe Acrobat can do basic optical character recognition (OCR, which transforms an image of a text into editable text), it performs poorly on documents with complex layouts or non-English text.
This workshop will cover how to use ABBYY FineReader, professional-level OCR software, via the OCR virtual research desktop provided by Research IT or in the D-Lab. It will also briefly cover the pros and cons of FineReader compared to the open-source OCR package Tesseract, and how you can use Tesseract on the Savio high-performance compute cluster for large-scale OCR jobs.
Prior knowledge: No prior knowledge is required for this workshop. Register if you have any interest in learning more about OCR tools and resources.
Technology requirement: None. This workshop will demonstrate realistic applications of OCR software.