Introduction to Optical Character Recognition Software

When & Where

Date:

Tue, February 28, 2017 - 10:00 AM to 11:30 AM

Location:

Barrows 356: D-Lab Convening Room

Description

Type:

Getting research materials into a digital form that you can search and computationally analyze can be a time-consuming first step in the research process. Adobe Acrobat can do basic optical character recognition (OCR), which transforms an image of text into editable text, but it performs poorly on documents with complex layouts or non-English text.

This workshop will cover how to use ABBYY FineReader, professional-level OCR software, via the OCR virtual research desktop provided by Research IT or in the D-Lab. It will also briefly cover the pros and cons of FineReader compared to the open-source OCR package Tesseract, and how you can use Tesseract on the Savio high-performance compute cluster for large-scale OCR jobs.

Prior knowledge: No prior knowledge is required for this workshop. Register if you have any interest in learning more about OCR tools and resources.

Technology requirement: None. This workshop will demonstrate realistic applications of OCR software.

Materials: