Learn how to correctly recognize text in an image with tesseract and ocrfeeder.

Many of you probably already know optical character recognition (OCR) programs. If so, you have likely run into some that cannot handle characters specific to Spanish, such as the eñe, accents and so on (ñ, ó, ü).

Now, thanks to tesseract and the tesseract-ocr-spa package, we can recognize these characters. We will also see how to treat images whose colors or pixel levels are not good enough.

First we need to install the following programs:

tesseract-ocr
tesseract-ocr-spa
ocrfeeder

On Debian I suggest installing them without the recommended extra packages:

sudo apt-get --no-install-recommends install ocrfeeder tesseract-ocr-spa tesseract-ocr
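
After installing, it may be worth confirming that Tesseract actually sees the Spanish data. The following is a minimal sketch, assuming a Tesseract build that supports `--list-langs`; the helper names `lang_available` and `check_lang` are made up for this example.

```shell
#!/usr/bin/env bash
# lang_available LANG: succeeds if LANG appears as a whole line on stdin.
# (Hypothetical helper, not part of tesseract.)
lang_available() {
    grep -qx -- "$1"
}

# check_lang LANG: asks tesseract for its language list and looks for LANG.
# 2>&1 is used because some versions print the list on stderr.
check_lang() {
    if tesseract --list-langs 2>&1 | lang_available "$1"; then
        echo "language '$1' is installed"
    else
        echo "language '$1' is missing" >&2
        return 1
    fi
}
```

Calling `check_lang spa` should report whether the tesseract-ocr-spa data was picked up.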

If we have an image (a scanned document) with a legible typeface, the text can be recognized in roughly 90% of cases. Tables are not recognized. If the image has 2 columns, tesseract automatically reads the first column and then the second, so the reading order of the text is preserved.

There are 2 ways to recognize text: from the command line in a terminal, or with ocrfeeder. The latter needs more processing time:

Command-line method:

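tesseract "/entrada/fichero.jpg" "/salida/fichero" -l spa -psm 3

Tesseract appends .txt to the output name on its own, so this writes /salida/fichero.txt. Recent releases (4.x and later) spell the option --psm instead of -psm.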

To convert many images we use this command:

cd /carpeta/imagenes
find . -name "*.jpg" | sort | while IFS= read -r file; do tesseract "$file" "$(basename "${file%.*}")" -l spa -psm 3; done
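
The batch step can also be wrapped in a small script. This is only a sketch, assuming bash and a version line of the form "tesseract 4.1.1"; the function names `psm_flag`, `out_base` and `ocr_dir` are hypothetical. It picks the spelling of the page-segmentation option from the reported version and writes one .txt file next to each image.

```shell
#!/usr/bin/env bash
# psm_flag VERSION: prints "--psm" for major version 4 or later, "-psm" otherwise.
psm_flag() {
    local major="${1#v}"          # tolerate a leading "v", e.g. v5.3.0
    major="${major%%.*}"          # keep only the part before the first dot
    case "$major" in
        ''|*[!0-9]*) major=0 ;;   # unknown format: fall back to the old spelling
    esac
    if [ "$major" -ge 4 ]; then
        printf '%s\n' '--psm'
    else
        printf '%s\n' '-psm'
    fi
}

# out_base IMAGE: file name without directory and without its last extension.
out_base() {
    local name
    name="$(basename "$1")"
    printf '%s\n' "${name%.*}"
}

# ocr_dir DIR [LANG]: runs tesseract on every *.jpg directly inside DIR.
ocr_dir() {
    local dir="$1" lang="${2:-spa}" version flag file
    # Assumption: the first line of `tesseract --version` is "tesseract X.Y.Z".
    version="$(tesseract --version 2>&1 | head -n 1 | awk '{print $2}')"
    flag="$(psm_flag "$version")"
    find "$dir" -maxdepth 1 -name '*.jpg' | sort | while IFS= read -r file; do
        # tesseract adds ".txt" to the output base itself
        tesseract "$file" "$dir/$(out_base "$file")" -l "$lang" "$flag" 3
    done
}
```

For example, `ocr_dir /carpeta/imagenes spa` would process the same folder as the one-liner.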

To join the generated text files into a single file (Texto-unido.txt) we use the following command, which merges the lines of each paragraph and keeps the paragraphs separate.

cd /carpeta/imagenes
find . -name "*.txt" ! -name "Texto-unido.txt" | sort | while IFS= read -r file; do sed 's|^$|##|' "$file" | tr '\n' ' ' | tr '#' '\n' >> Texto-unido.txt; done
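
The pipeline above uses # as a temporary marker, so any # that the OCR produced in the text also turns into a line break. A possible alternative, sketched here with awk's paragraph mode (an empty record separator), avoids the marker altogether. The function name `join_paragraphs` is invented for this example.

```shell
#!/usr/bin/env bash
# join_paragraphs: reads text on stdin, where paragraphs are separated by
# one or more blank lines, and prints each paragraph on a single line
# followed by one blank line.
join_paragraphs() {
    awk 'BEGIN { RS = ""; ORS = "\n\n" }
         { $1 = $1; print }'   # $1 = $1 rebuilds the record with single spaces
}
```

It could be used in place of the sed and tr chain, for instance `join_paragraphs < pagina1.txt >> Texto-unido.txt` (pagina1.txt being a placeholder name).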

Method with ocrfeeder:
1- Open the ocrfeeder program.
2- Configure the engine: click Tools - OCR Engines, select the tesseract engine and click Edit, then replace the text in the engine arguments field with this:

$IMAGE $FILE -l spa -psm 3 > /dev/null 2> /dev/null; cat $FILE.txt; rm $FILE $FILE.txt

3- Import an image, or a folder if there are several images.
4- Click on recognize document. Once the document has been recognized, you can manually select which parts of it are images and which are text.
5- Before exporting the document, click Edit - Edit page and choose the desired page size; the most common one is Letter.
6- To export the document, click File - Export and choose the desired output format. If the document contains images I recommend odt or html; if it is text only, the Plain Text (txt) format is a better choice.
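
To make the engine arguments from step 2 easier to read, here is roughly the same sequence written as a standalone shell function. It is only a sketch: it assumes that ocrfeeder replaces $IMAGE with the image path and $FILE with a temporary file name, and that it reads the recognized text from standard output. The name `ocr_to_stdout` is hypothetical.

```shell
#!/usr/bin/env bash
# ocr_to_stdout IMAGE: run tesseract quietly, print the text, clean up.
ocr_to_stdout() {
    local image="$1" tmp
    tmp="$(mktemp)"                                   # plays the role of $FILE
    tesseract "$image" "$tmp" -l spa -psm 3 >/dev/null 2>&1
    cat "$tmp.txt"                                    # tesseract wrote $FILE.txt
    rm -f "$tmp" "$tmp.txt"                           # remove both temporary files
}
```

Running `ocr_to_stdout /entrada/fichero.jpg` in a terminal is a quick way to check the arguments before pasting them into ocrfeeder.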

It does not end here, because many photocopies are not of sufficient quality. To fix them we use gimp and the emboss filter (this process can be slow):
1- Open the image with gimp.
2- Click Filters - Distorts - Emboss, select the Bumpmap option, and set azimuth to about 162.25, elevation to 88.73 and depth to 6 or 3. Save the image through Export as name.jpg, with 100% quality if it is a jpg.

Optionally, you can adjust the white levels by clicking Colors - Levels - Auto.