Learn how to accurately recognize the text in an image with tesseract and ocrfeeder.

Many of you probably know Optical Character Recognition (OCR) programs. If so, you have surely run into some that do not recognize Spanish characters such as the eñe or accented letters, among others (ñ, ó, ü).

Now, thanks to tesseract and the tesseract-ocr-spa package, we can recognize these characters, and we will also see what to do with images whose colors or pixels are of poor quality.

First we need to install the following programs:

tesseract-ocr
tesseract-ocr-spa
ocrfeeder

On Debian I recommend installing them without the recommended packages:

sudo apt-get --no-install-recommends install ocrfeeder tesseract-ocr-spa tesseract-ocr
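Before running any OCR it can be worth confirming that tesseract actually sees the Spanish data. `tesseract --list-langs` prints the installed language codes one per line after a header line (some older releases print the list on stderr, hence the `2>&1` in the usage below). A minimal sketch; `has_lang` is a hypothetical helper name used only here:

```shell
#!/usr/bin/env bash
# has_lang CODE: succeeds if CODE appears as a whole line on standard input.
# Hypothetical helper; intended to read the output of `tesseract --list-langs`.
has_lang() {
  grep -qx -- "$1"
}
```

Usage: `tesseract --list-langs 2>&1 | has_lang spa && echo "Spanish data installed"`.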

If the image is a clean (printed) text document, the text is recognized in about 90% of cases. Tables are not recognized, and if the page has two columns, the first column is recognized first and then the other, so the text keeps exactly the order in which it was written.

There are two ways to recognize the text: from the command line in a terminal, or with ocrfeeder, which takes longer:

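Command-line method:

tesseract "/entrada/fichero.jpg" "/salida/fichero" -l spa --psm 3

Here -l spa selects the Spanish language data and --psm 3 selects fully automatic page segmentation. tesseract adds the .txt extension to the output name itself, so this command writes /salida/fichero.txt (tesseract 3.x spells the option -psm).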

To process several images at once we will use these commands, which write one .txt file per image into the current folder:

cd /carpeta/imagenes
find ./ -name "*.jpg" | sort | while IFS= read -r file; do base=$(basename "$file"); tesseract "$file" "${base%.*}" -l spa --psm 3; done
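The same batch step can be wrapped in a reusable function that also copes with unusual file names (spaces, even newlines) by passing NUL-separated names from find to sort to read. A sketch assuming bash plus GNU find and sort (for `-print0` and `-z`); `ocr_batch` and the `TESSERACT` override are hypothetical names, not part of tesseract. Setting `TESSERACT=echo` turns it into a dry run that only prints the arguments it would pass:

```shell
#!/usr/bin/env bash
# ocr_batch [DIR]: runs tesseract on every *.jpg under DIR (default: .) in sorted order
# and writes NAME.txt into the current directory for each NAME.jpg.
# Assumes GNU find/sort; TESSERACT is a hypothetical override for the command to run.
ocr_batch() {
  local dir=${1:-.} file base
  find "$dir" -type f -name '*.jpg' -print0 | sort -z |
    while IFS= read -r -d '' file; do
      base=$(basename -- "$file")
      # tesseract appends ".txt" itself, so pass the name without its extension
      "${TESSERACT:-tesseract}" "$file" "${base%.*}" -l spa --psm 3
    done
}
```

Usage: `cd /carpeta/imagenes && ocr_batch .`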

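To join the text files in the folder into a single file, with the lines of each paragraph correctly rejoined, we will use the following command. It joins the lines of each file with spaces and turns blank lines into paragraph breaks:

cd /carpeta/imagenes
find ./ -name "*.txt" ! -name "Texto-unido.txt" | sort | while IFS= read -r file; do sed 's/^[[:space:]]*$/\x01/' "$file" | tr '\n' ' ' | tr '\001' '\n'; done >> Texto-unido.txt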
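If you prefer each paragraph on exactly one line, with one blank line between paragraphs and no trailing spaces, awk can keep the current paragraph in a variable and print it when a blank line arrives. A sketch; `join_paragraphs` is a hypothetical name. awk reads all the files it is given as one stream, so a paragraph that continues onto the next page stays joined:

```shell
#!/usr/bin/env bash
# join_paragraphs [FILE...]: prints each paragraph of the input on a single line,
# with one blank line between paragraphs. A paragraph is a run of non-blank lines.
# Reads standard input when no files are given.
join_paragraphs() {
  awk '
    NF { line = (line == "" ? $0 : line " " $0); next }   # non-blank: append to paragraph
    line != "" { print line; print ""; line = "" }        # blank: flush the paragraph
    END { if (line != "") print line }                    # flush the last paragraph
  ' "$@"
}
```

Usage: `find . -name '*.txt' ! -name 'Texto-unido.txt' -print0 | sort -z | xargs -0 cat | join_paragraphs > Texto-unido.txt`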

Method with ocrfeeder:
1- Open ocrfeeder.
2- Change the engine settings: click Tools - OCR Engines, select the tesseract engine and click Edit; in the engine arguments field, replace the contents with the following:

$IMAGE $FILE -l spa --psm 3 > /dev/null 2> /dev/null; cat $FILE.txt; rm $FILE $FILE.txt

3- Import an image, or a folder containing several images.
4- Click Recognize document. Once the document has been recognized, you can manually select which areas are text and which are images.
5- Before exporting the document, click Edit - Edit Page and choose the page size you want; the most common is Letter.
6- To export the document, click File - Export and choose the format you want. If the document contains images, I recommend ODT or HTML; if it is text only, Plain Text (txt) is the better choice.
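The engine arguments set in step 2 run tesseract on the image, print the resulting .txt file (presumably ocrfeeder reads the recognized text from that output) and delete the temporary files. The same sequence as a standalone function, useful for testing the arguments outside ocrfeeder, is sketched below. It assumes ocrfeeder substitutes the image path for `$IMAGE` and a temporary file path for `$FILE`; `ocr_to_stdout` and the `TESSERACT` override are hypothetical names:

```shell
#!/usr/bin/env bash
# ocr_to_stdout IMAGE: prints the Spanish text recognized in IMAGE.
# Mirrors the engine arguments: run tesseract into a temporary output base,
# print "<base>.txt", then delete both files. TESSERACT is a hypothetical override.
ocr_to_stdout() {
  local image=$1 base
  base=$(mktemp) || return 1        # plays the role of ocrfeeder's $FILE (assumption)
  # Discard tesseract's own messages; the recognized text goes to "$base.txt".
  if ! "${TESSERACT:-tesseract}" "$image" "$base" -l spa --psm 3 >/dev/null 2>&1; then
    rm -f -- "$base" "$base.txt"
    return 1
  fi
  cat -- "$base.txt"
  rm -f -- "$base" "$base.txt"      # clean up both the base file and the output
}
```

Usage: `ocr_to_stdout /carpeta/imagenes/pagina1.jpg > pagina1.txt` (the image path here is only an example).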

This is not the end of it, because many photocopies are of poor quality. To improve them we will use GIMP and its Emboss filter (this can be slow):
1- Open the image with GIMP.
2- Click Filters - Distorts - Emboss, select the Bumpmap option, and set Azimuth to about 162.25, Elevation to 88.73 and Depth to 6 or 3. When exporting the image (for example as name.jpg), use 100% quality if the format is JPG.

Optionally, you can adjust the white levels by clicking Colors - Levels - Auto.
