How to scan documents and apply OCR in Linux

Did you try Simple Scan, Ubuntu's default scanning program, only to be disappointed that it doesn't support OCR? Does XSane seem too complicated for such a simple task? Do you miss how easy it was to scan documents with OmniPage?

Well, no wonder ... let's see how to scan documents and run OCR on them in a very simple way. You will be amazed by the results.

How to scan in 2 simple steps

1.- Install gscan2pdf and tesseract-ocr, along with the language pack you need. If you are going to scan documents in English, install tesseract-ocr-eng; if they are in Spanish, install tesseract-ocr-spa, and so on.

sudo apt-get install gscan2pdf tesseract-ocr tesseract-ocr-eng

2.- The rest is pretty straightforward for anyone who has ever scanned and OCRed a document in Windows. Open gscan2pdf, scan the document, go to Options > OCR and select Tesseract as the OCR engine. There are other engines, but Tesseract is by far the best performing one. Finally, save the document as PDF, DjVu, etc. via File > Save.
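For those who prefer a terminal, the two steps above can be roughly sketched as a pair of shell functions. This is only an illustration, not what gscan2pdf does internally: the function names ocr_packages and scan_and_ocr, the DRY_RUN switch and the output name page1 are made up for the example, and it assumes the scanner driver accepts the --resolution option of scanimage (part of SANE).

```shell
#!/bin/sh
# Sketch of the two steps above without the GUI. Nothing is executed when
# this file is loaded: it only defines two helper functions.

# Step 1: build the package list for the given Tesseract language codes
# (three-letter codes such as eng or spa).
ocr_packages() {
    pkgs="gscan2pdf tesseract-ocr"
    for code in "$@"; do
        pkgs="$pkgs tesseract-ocr-$code"
    done
    echo "$pkgs"
}

# Step 2: scan one page and OCR it into a searchable PDF.
#   $1  output name without extension (hypothetical, e.g. page1)
#   $2  Tesseract language code, defaults to eng
# With DRY_RUN=1 the commands are printed instead of executed.
scan_and_ocr() {
    base=$1
    lang=${2:-eng}
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "scanimage --format=tiff --resolution 300 > $base.tiff"
        echo "tesseract $base.tiff $base -l $lang pdf"
        return 0
    fi
    # --resolution is a backend option; assumed to be offered by the driver.
    scanimage --format=tiff --resolution 300 > "$base.tiff" &&
        tesseract "$base.tiff" "$base" -l "$lang" pdf
}
```

With the functions loaded, sudo apt-get install $(ocr_packages eng spa) installs both language packs, and scan_and_ocr page1 spa should leave a page1.pdf with a text layer. If something fails, scanimage -L lists the scanners SANE can see and tesseract --list-langs lists the installed languages.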

Note: when saving scanned documents, DjVu is usually the best format to choose. The quality is the same as a PDF, but the file is considerably smaller.
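The size difference is easy to check on your own scans. Here is a minimal sketch, assuming the same document was saved twice from gscan2pdf, once as PDF and once as DjVu; the helper name size_percent and the file names are hypothetical.

```shell
#!/bin/sh
# Print the size of the second file as a whole-number percentage of the first.
size_percent() {
    # Both arguments must be existing regular files.
    [ -f "$1" ] && [ -f "$2" ] || return 1

    # wc -c counts bytes; tr strips the padding some wc versions add.
    first=$(wc -c < "$1" | tr -d ' ')
    second=$(wc -c < "$2" | tr -d ' ')

    if [ "$first" -eq 0 ]; then
        echo "first file is empty" >&2
        return 1
    fi
    echo $(( second * 100 / first ))
}
```

For example, size_percent scan.pdf scan.djvu prints how large the DjVu file is relative to the PDF, so a result below 100 means the DjVu version is the smaller one.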

The following video is in English, but simply watching it is enough to understand how everything works.



  2.   bachitux said

    Notice that the packages are also available in Fedora. 🙂

  3.   chapel said

    I have two scanners: a Canon Scan 5000f for A4 documents and a Braun NovoScan for scanning negatives and slides. After installing gscan2pdf and rebooting, neither scanner is detected. What happened? Why aren't the scanners showing up?

  4.   Let's use Linux said

    No offense, friends, but there is no point in OCRing math functions.

    If anything, you should run OCR on the surrounding text (which explains those functions or whatever) and leave the functions as images.
    Cheers! Paul.

  5.   NotFromBrooklyn said

    Hey, if you've come up with a solution to your problem, I'd like to know.

  6.   Juan Vallejo said

    I think I'm a little late but I have a question. I'm an engineering student and I'm looking for a way to digitize and clean my notes, but the problem is that most of those notes are full of mathematical symbols, graphs, and functions. Is there currently something that can help me?

  7.   Let's use Linux said

    Great! Good tip! In Arch, Tesseract is in the official repositories, but gscan2pdf is not. You have to install it through yaourt.

  8.   elcaliman13142 said

    Thank you very much, it helped me a lot. It makes Linux friendlier. Thanks again.

  9.   Let's use Linux said

    You're welcome! It is a pleasure to have been able to help.
    A hug! Paul.

  10.   Martin said

    Very good, I was looking for this. I'll try it and let you know how it goes.

  11.   Mauro Nicolas Ybanez Girard said

    Thanks, I'll try!

  12.   Leonard Hernandez said

    When I run the OCR with the Tesseract engine, it only offers English as the processing language, even though I installed the tesseract-ocr-spa package. What can I do?

  13.   jaime and isabel said

    I downloaded gscan2pdf but it does not scan. It only searches for devices, and after 15 minutes it is still searching. What's going on?