Learn how to recognize the text in an image properly with tesseract and ocrfeeder.

Many of you already know optical character recognition (OCR) programs, and if so you have probably run into some that fail to recognize characters that are common in Spanish, such as the eñe and accented letters (ñ, ó, ü).

Now, thanks to tesseract and the tesseract-ocr-spa package, we can recognize these characters. We will also see how to deal with images whose color levels or pixel quality are poor.

First we need to install the following programs:

tesseract-ocr
tesseract-ocr-spa
ocrfeeder

On Debian I recommend installing them without pulling in the recommended packages:

sudo apt-get --no-install-recommends install ocrfeeder tesseract-ocr-spa tesseract-ocr
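
After installing, it can help to confirm that tesseract actually sees the Spanish language data. The following is a minimal sketch, not part of the original recipe: has_lang is a hypothetical helper name, and it assumes that tesseract --list-langs prints one language code per line (older releases print the list on stderr, hence the 2>&1).

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the language code in $1 appears
# as a whole line in the text read from stdin.
has_lang() {
    grep -qx -- "$1"
}

# Only query tesseract when it is actually installed.
if command -v tesseract >/dev/null 2>&1; then
    if tesseract --list-langs 2>&1 | has_lang spa; then
        echo "Spanish language data found"
    else
        echo "Spanish language data missing: install tesseract-ocr-spa"
    fi
fi
```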

If we have an image (a scanned document) in which the print is legible, the text will be recognized in roughly 90% of cases. Tables will not be recognized. If the image has 2 columns, tesseract detects this automatically and reads one column first and then the other, so the order of the text is preserved.

There are 2 ways to recognize the text: from the command line in a terminal, or with ocrfeeder. The second one needs somewhat more processing time:

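Command-line method (tesseract adds the .txt extension to the output name by itself, so the second argument is given without it):

tesseract "/entrada/fichero.jpg" "/salida/fichero" -l spa -psm 3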
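
One detail that depends on the installed version: tesseract 4 and later reject -psm and expect --psm instead. The sketch below picks the spelling from the first line of tesseract --version. The name psm_flag is hypothetical, and the assumption that the version line looks like "tesseract 4.1.1" may not hold for every build; when no version number can be parsed it falls back to the older -psm.

```shell
#!/bin/sh
# Hypothetical helper: print the page-segmentation flag that matches
# the version line given in $1 (e.g. "tesseract 4.1.1").
psm_flag() {
    # Extract the major version number that follows the word "tesseract".
    major=$(printf '%s\n' "$1" | sed -n 's/^tesseract[^0-9]*\([0-9][0-9]*\).*/\1/p')
    if [ "${major:-0}" -ge 4 ]; then
        echo "--psm"
    else
        echo "-psm"
    fi
}

# Only query tesseract when it is actually installed.
if command -v tesseract >/dev/null 2>&1; then
    flag=$(psm_flag "$(tesseract --version 2>&1 | head -n 1)")
    echo "Suggested call: tesseract /entrada/fichero.jpg /salida/fichero -l spa $flag 3"
fi
```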

To convert many images at once we will use the following commands:

cd /carpeta/imagenes
find ./ -name "*.jpg" | sort | while read -r file; do tesseract "$file" "$(basename "$file" | sed 's/\.[[:alnum:]]*$//')" -l spa -psm 3; done
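
The same batch conversion can be written as small functions, which makes the output-name logic easier to read. This is only a sketch under a few assumptions: ocr_base and ocr_all are hypothetical names, file names contain no newlines, and the text files are written to the current directory.

```shell
#!/bin/sh
# Derive tesseract's output base from an image path: drop the directory
# and the final extension (tesseract appends ".txt" on its own).
ocr_base() {
    name=$(basename "$1")
    printf '%s\n' "${name%.*}"
}

# Hypothetical helper: OCR every .jpg found under directory $1
# (default: the current directory), in sorted order.
ocr_all() {
    find "${1:-.}" -name '*.jpg' | sort | while IFS= read -r img; do
        tesseract "$img" "$(ocr_base "$img")" -l spa -psm 3
    done
}

# Usage (assumes tesseract and the Spanish data are installed):
#   cd /carpeta/imagenes && ocr_all
```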

To join the resulting text files in that folder into one file we will use the following commands, which also merge the lines of each paragraph correctly.

cd /carpeta/imagenes
find ./ -name "*.txt" ! -name "Texto-unido.txt" | sort | while read -r file; do sed 's|^$|##|' "$file" | tr '\n' ' ' | tr '#' '\n' >> Texto-unido.txt; done
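
The command above uses the # character as a temporary marker, so any # that appears in the recognized text is turned into a line break. As an alternative sketch (a different technique from the one above, not the original method), awk's paragraph mode can join the lines of each paragraph without a marker. The name join_paragraphs is hypothetical.

```shell
#!/bin/sh
# Join the lines of each blank-line-separated paragraph read from stdin
# into a single line, and print one empty line after every paragraph.
join_paragraphs() {
    # RS = "" switches awk to paragraph mode; $1 = $1 rebuilds the record
    # with single spaces between fields (newlines count as separators).
    awk 'BEGIN { RS = "" } { $1 = $1; print; print "" }'
}

# Usage, run inside the folder that holds the .txt files:
#   find . -name '*.txt' ! -name 'Texto-unido.txt' | sort |
#       while IFS= read -r f; do join_paragraphs < "$f"; done > Texto-unido.txt
```

Note that this version also collapses runs of spaces inside a paragraph into a single space.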

Method with ocrfeeder:
1- Open the ocrfeeder program.
2- Configure the engine by clicking Tools - OCR Engines, select the tesseract engine and click Edit. In the field for the engine arguments, replace the text with this:

$IMAGE $FILE -l spa -psm 3 > /dev/null 2> /dev/null; cat $FILE.txt; rm $FILE $FILE.txt

3- Import an image, or a folder that contains several images.
4- Click on recognize document. Once the document has been recognized you can choose by hand which of its areas are to be treated as images and which as text.
5- Before exporting the document, click Edit - Edit Page and choose the page size you want; the most common one is letter.
6- To export the document, click File - Export and choose the output format you want. If the document contains images I recommend the odt or html format; if it is text only, Plain Text (txt) is the better choice.

It does not end here, because many scanned copies are not of sufficient quality. To fix these we will use gimp and its emboss filter (this process can be slow):
1- Open the image with gimp.
2- Click Filters - Distorts - Emboss, select the bump map option, and set the azimuth to about 162.25, the elevation to 88.73 and the depth to 6 or 3. If the image is a jpg, save it at 100% quality through Export - name.jpg.

Optionally, you can correct the white levels by clicking Colors - Levels - Auto.