Many of you have probably used optical character recognition (OCR) programs, and if so you will have run into some that fail to recognize common Spanish characters such as the eñe, accented vowels and the diaeresis (ñ, ó, ü).
Now, thanks to tesseract and the tesseract-ocr-spa package, we can recognize these characters. We will also see how to deal with images whose color levels or pixels are poor.
First we need to install the following programs:
ocrfeeder
tesseract-ocr-spa
tesseract-ocr
On Debian I recommend installing them without pulling in the recommended packages:
sudo apt-get --no-install-recommends install ocrfeeder tesseract-ocr-spa tesseract-ocr
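After installing, it can help to confirm that the tools are on the PATH and that the Spanish language data is visible to Tesseract. The following is a minimal sketch, not part of the original instructions: missing_tools and has_spanish are hypothetical helper names, and the check relies on tesseract --list-langs printing one language code per line.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every command given as an
# argument that cannot be found on PATH.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Hypothetical helper: succeed only if "spa" appears as a whole line in
# the output of `tesseract --list-langs`. Both stdout and stderr are
# checked because the stream used differs between Tesseract versions.
has_spanish() {
  tesseract --list-langs 2>&1 | grep -qx 'spa'
}
```

For example, missing_tools tesseract ocrfeeder prints nothing when both are installed, and has_spanish || echo "install tesseract-ocr-spa" warns when the Spanish data is missing.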
If we have an image (a scanned document) in which the print is legible, the text is recognized correctly in roughly 90% of cases. Tables are not recognized. If the image has 2 columns, the first column is recognized automatically and then the second, so the order of the text is preserved.
There are 2 ways to recognize the text: from the command line in a terminal, or with ocrfeeder. The latter needs somewhat more processing time:
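Command-line method (Tesseract adds the .txt extension to the output name itself, and older 3.x releases spell the page segmentation flag -psm):
tesseract "/entrada/fichero.jpg" "/salida/fichero" -l spa --psm 3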
To convert many images we will use the following command:
cd /carpeta/imagenes
find . -name "*.jpg" | sort | while IFS= read -r file; do tesseract "$file" "$(basename "$file" | sed 's/\.[[:alnum:]]*$//')" -l spa --psm 3; done
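The same batch conversion can be sketched as a small script, which is easier to reuse and adapt than a one-liner. This is only an illustration under stated assumptions: outbase and ocr_dir are hypothetical names, the loop is not recursive (unlike find), and each text file is written next to its image. Tesseract appends .txt to the output base it is given, so the helper passes the name without an extension.

```shell
#!/bin/sh
# Hypothetical helper: drop the directory and the last extension,
# e.g. /carpeta/imagenes/pagina-01.jpg -> pagina-01
outbase() {
  name=$(basename "$1")
  printf '%s\n' "${name%.*}"
}

# Hypothetical helper: run Tesseract on every .jpg in a directory
# (default: the current one). The shell expands the glob in sorted order.
ocr_dir() {
  dir=${1:-.}
  for img in "$dir"/*.jpg; do
    # When nothing matches, the pattern is left as literal text: skip it.
    [ -e "$img" ] || continue
    # Older Tesseract 3.x releases spell the last flag -psm.
    tesseract "$img" "$dir/$(outbase "$img")" -l spa --psm 3
  done
}
```

Calling ocr_dir /carpeta/imagenes would then leave one .txt file beside each image.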
To join the resulting text files in that folder we will use the following command, which merges the lines of each paragraph properly:
cd /carpeta/imagenes
find . -name "*.txt" ! -name "Texto-unido.txt" | sort | while IFS= read -r file; do sed 's|^$|##|g' "$file" | tr '\n' " " | tr '##' "\n" >> Texto-unido.txt; done
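The joining step turns each blank line into a paragraph break and every other line break into a space. An alternative sketch of the same idea uses awk's paragraph mode (RS set to the empty string), which avoids the ## marker and therefore leaves any # characters in the text untouched. This is a swapped-in technique, not the author's command, and join_paragraphs and merge_texts are hypothetical names.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin, join the lines of each
# paragraph with single spaces and put one blank line after each paragraph.
join_paragraphs() {
  awk 'BEGIN { RS = ""; ORS = "\n\n" }
       { gsub(/[ \t]*\n[ \t]*/, " "); print }'
}

# Hypothetical helper: process the given files in the order given and
# write the merged text to stdout.
merge_texts() {
  for f in "$@"; do
    join_paragraphs < "$f"
  done
}
```

A possible call is merge_texts ./*.txt > ../Texto-unido.txt; writing the result outside the folder keeps it from being picked up as input on a later run.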
ocrfeeder method:
1- Open ocrfeeder.
2- Configure the engine: click Tools - OCR Engines, select the tesseract engine and click Edit, and in the engine arguments field replace the text with the following (on older Tesseract 3.x releases the flag is -psm):
$IMAGE $FILE -l spa --psm 3 > /dev/null 2> /dev/null; cat $FILE.txt; rm $FILE $FILE.txt
3- Import an image, or a folder containing several images.
4- Click Recognize Document. Once the document has been recognized you can manually choose which parts of it will be treated as images and which as text.
5- Before exporting the document, click Edit - Edit Page and select the page size you want; the most common is Letter.
6- To export the document, click File - Export and choose the output format you want. If the document contains images I recommend odt or html; if it is text only, Plain Text (txt) is the better choice.
That is not all, because many copies are of insufficient quality. To fix these we will use gimp and its emboss filter (this process can be slow):
1- Open the image with gimp.
2- Click Filters - Distorts - Emboss, select the Bumpmap option, and set the azimuth to about 162.25, the elevation to 88.73 and the depth to 6 or 3. If the image is a jpg, save it at 100% quality via Export - name.jpg.
Optionally you can correct the white levels by clicking Colors - Levels - Auto.