Friday, December 13, 2013

PDF to Text with OCR on Ubuntu

$ sudo apt-get install tesseract-ocr
$ sudo apt-get install imagemagick    # provides the convert command

$ convert -density 300 in.pdf out.png
$ tesseract out.png out
$ vi out.txt

For multiple files (a multi-page PDF is converted to out-0.png, out-1.png, ...)

$ for i in out-*.png; do tesseract "$i" "${i%.png}"; done
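The conversion and OCR steps can also be wrapped in a single function that handles a multi-page PDF and joins the per-page text into one file. This is only a rough sketch: the function name ocr_pdf, the page-%03d.png naming, and the temporary working directory are my own choices for illustration, not anything required by ImageMagick or Tesseract.

```shell
#!/usr/bin/env bash
# ocr_pdf is a hypothetical helper name; it OCRs every page of a PDF
# and writes the combined text next to it (in.pdf -> in.txt).
ocr_pdf() {
    local pdf="$1"
    local base="${pdf%.pdf}"
    local workdir
    workdir="$(mktemp -d)" || return 1

    # Render each page at 300 DPI. %03d zero-pads the page index
    # (page-000.png, page-001.png, ...) so the glob below sorts in page order.
    convert -density 300 "$pdf" "$workdir/page-%03d.png" || return 1

    # Start with an empty output file, then append page by page.
    : > "$base.txt"
    local png
    for png in "$workdir"/page-*.png; do
        # tesseract adds the .txt extension to the output base name itself.
        tesseract "$png" "${png%.png}" || return 1
        cat "${png%.png}.txt" >> "$base.txt"
    done

    rm -rf "$workdir"
}

# Usage: ocr_pdf in.pdf    (writes in.txt)
```

Zero-padding the page numbers matters once a PDF has ten or more pages, since a plain out-10.png would otherwise sort before out-2.png in the glob.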
