tesseract

Language: en

Version: 338218 (ubuntu - 24/10/10)

Section: 1 (User commands)

NAME

tesseract - command line OCR tool

SYNOPSIS

tesseract imagename outputbase [configfile] [-l <langid>]

DESCRIPTION

This manual page briefly documents the tesseract command.

tesseract is a commercial-quality OCR engine originally developed at HP between 1985 and 1995. In 1995, this engine was among the top three evaluated by UNLV. It was open-sourced by HP and UNLV in 2005.

OPTIONS

imagename is the input image. It must be a TIFF image with a .tif extension.

outputbase is the base name of the text file that receives the OCR output; tesseract appends .txt to it.

configfile is a file of control parameters used for debugging or modifying tesseract's behaviour. Config files are stored in /usr/share/tesseract-ocr/tessdata/configs/

The -l <langid> option must come last on the command line. At the time of writing, there are language packages available for English (eng), German (deu), German Fraktur (deu-f), French (fra), Italian (ita), Dutch (nld), Portuguese (por), Spanish (spa), and Vietnamese (vie).
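
The argument order described above can be sketched in Python as a small wrapper. This is only an illustration, not part of tesseract: the helper names build_tesseract_command and run_tesseract are hypothetical, and the sketch assumes that a tesseract binary matching this manual page is on the PATH. It checks the .tif extension and always places -l <langid> last.

```python
import subprocess
from pathlib import Path
from typing import List, Optional


def build_tesseract_command(imagename: str,
                            outputbase: str,
                            configfile: Optional[str] = None,
                            langid: Optional[str] = None) -> List[str]:
    """Build an argument list that follows the SYNOPSIS order.

    Hypothetical helper for illustration; not provided by tesseract.
    """
    # This manual page requires the input image to have a .tif extension.
    if Path(imagename).suffix != ".tif":
        raise ValueError("imagename must have a .tif extension")

    cmd = ["tesseract", imagename, outputbase]

    # The optional config file comes after outputbase.
    if configfile is not None:
        cmd.append(configfile)

    # The -l option must come last on the command line.
    if langid is not None:
        cmd += ["-l", langid]

    return cmd


def run_tesseract(imagename: str,
                  outputbase: str,
                  configfile: Optional[str] = None,
                  langid: Optional[str] = None) -> None:
    """Run tesseract (assumed to be on the PATH); raise if it fails."""
    cmd = build_tesseract_command(imagename, outputbase, configfile, langid)
    subprocess.run(cmd, check=True)
```

Under these assumptions, build_tesseract_command("page.tif", "page", langid="deu") produces the argument list for the command line tesseract page.tif page -l deu, where page.tif is a hypothetical input file.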

SEE ALSO

feh(1), convert(1), mftraining(1), cntraining(1), unicharset_extractor(1), wordlist2dawg(1).

AUTHOR

tesseract was written by Ray Smith.

This manual page was written by Jeffrey Ratcliffe <Jeffrey.Ratcliffe@gmail.com>, for the Debian project (but may be used by others).