How to OCR a PDF and enable text selection and search

Suppose you have a PDF that was produced by a scanner, or that someone handed to you, and its content is stored as an image. The process we need to put our beloved PDF through is called OCR: it automatically recognizes the symbols or characters of a given alphabet in an image and stores them as data we can work with in a text editor or similar program.


pdfocr is a simple tool that creates a new PDF with an embedded text layer, which lets the user select text and search for words in it without changing the final appearance of the PDF.

What pdfocr is NOT for:

This is only useful if the PDF stores its content as a raster image. If you exported the PDF from OpenOffice, it already has an embedded text layer, so this process is unnecessary.
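
If you are not sure whether a PDF already carries a text layer, one rough way to check is to try extracting its text. The sketch below assumes the pdftotext tool from poppler-utils is installed; the function names has_visible_text and needs_ocr are made up for this example and are not part of pdfocr.

```shell
# has_visible_text: succeeds if standard input contains at least one
# letter or digit, i.e. the extracted text is not just blank space.
has_visible_text() {
    grep -q '[[:alnum:]]'
}

# needs_ocr FILE: succeeds if no visible text can be extracted from FILE.
# Assumption: pdftotext (poppler-utils) is installed. If pdftotext fails,
# for example because FILE does not exist, this also reports "needs OCR".
needs_ocr() {
    ! pdftotext "$1" - 2>/dev/null | has_visible_text
}

# Usage:
#   needs_ocr input.pdf && echo "input.pdf looks like a pure image scan"
```

This is only a heuristic: a scan with a single stray character of embedded text would be reported as already having a text layer.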

How to install pdfocr:

sudo add-apt-repository ppa:gezakovacs/pdfocr
sudo apt-get update
sudo apt-get install pdfocr
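
A PPA may not carry a build for every Ubuntu release, in which case the install step fails because apt cannot find the package. As a small sketch, you could check whether apt sees an installable version before installing. The helper name has_candidate is made up for this example, and it assumes apt-cache prints its output in English (hence LC_ALL=C in the usage line).

```shell
# has_candidate: reads the output of `apt-cache policy PACKAGE` on standard
# input and succeeds if it names an installable candidate version.
# Assumption: English output, where a missing build shows "Candidate: (none)"
# and an unknown package prints no "Candidate:" line at all.
has_candidate() {
    grep -q 'Candidate: [^(]'
}

# Usage (on a system with apt):
#   LC_ALL=C apt-cache policy pdfocr | has_candidate \
#       || echo "no pdfocr build available for this release"
```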

How to use pdfocr:

Open a terminal, go to the directory that contains the PDF you want to convert, and enter the following (replace input.pdf with the PDF you want to convert and output.pdf with the name of the new file with the embedded text layer):

pdfocr -i input.pdf -o output.pdf

Wait while each page of your PDF goes through OCR and the final file is assembled. This should take a few seconds per page, depending on the resolution of your PDF.
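
If you have a whole folder of scans, the same command can be wrapped in a loop. This is only a sketch: the function names and the "-ocr.pdf" output suffix are conventions invented for this example, and the only pdfocr options used are the -i and -o shown above.

```shell
# ocr_name FILE: print the output name for FILE (scan.pdf -> scan-ocr.pdf).
# The "-ocr" suffix is an arbitrary convention chosen for this sketch.
ocr_name() {
    printf '%s\n' "${1%.pdf}-ocr.pdf"
}

# ocr_all DIR: run pdfocr on every PDF in DIR that has no OCR copy yet.
ocr_all() {
    command -v pdfocr >/dev/null 2>&1 || {
        echo "pdfocr is not installed" >&2
        return 1
    }
    for f in "$1"/*.pdf; do
        [ -e "$f" ] || continue                    # no PDF matched the pattern
        case "$f" in *-ocr.pdf) continue ;; esac   # skip earlier results
        out=$(ocr_name "$f")
        [ -e "$out" ] || pdfocr -i "$f" -o "$out"  # do not redo finished files
    done
}

# Usage:
#   ocr_all .
```

Writing to a separate output name keeps the original scans untouched, so you can compare results or rerun a file by deleting its "-ocr.pdf" copy.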



  1.   Rudolph Lara

    rodolfo@rodolfo-desktop:~$ sudo apt-get install pdfocr
    Reading package lists... Done
    Building dependency tree
    Reading state information... Done
    E: Unable to locate package pdfocr
    rodolfo@rodolfo-desktop:~$

  2.   Let's Use Linux

    Did you make sure to add the corresponding PPA?
    That PPA probably only has pdfocr builds for older Ubuntu releases. Keep in mind that this post is already several months old. In any case, the idea is the same: go to Launchpad and look for a PPA that contains a pdfocr build for Maverick.
    Cheers! Paul.

  3.   jvare

    Well, it will be a matter of trying it out to see how it works.

  4.   Let's Use Linux

    Go ahead! Let us know whether it works for you! If it doesn't, we can try to help. Cheers! Paul.

  5.   a01653

    Hello,
    I tried the program on a PDF and the result was not good. I am used to Acrobat 8 Professional and am looking for something similar. Acrobat let users clean up and straighten the PDFs and so get a better basis for OCR. Do you know whether there is a solution for this?

    Regards

  6.   Let's Use Linux

    Hi! I have heard that Tesseract is the best open-source OCR. I don't know whether it will be good enough for you. Also, you have to get your hands a little dirty to make it work. Here are some instructions. If you succeed, please let me know, because if it works this will probably end up becoming a post.

    First, install the packages "tesseract 2.03-4" and "imagemagick" using Synaptic, and "xsane2tess" from "http://download.tuxfamily.org/guadausers/guadaV4/".

    Next, create a tmp folder at: /home/yourusername/tmp

    Then open Xsane to configure it: go to Preferences -> Setup -> OCR tab and fill in the following:

    OCR command -> xsane2tess -l spa
    Input file option -> -i
    Output file option -> -o
    GUI output-fd option -> -x

    In the Xsane setup, on the "Save" tab, in the section for the temporary directory, make sure it points to the "tmp" folder you created in "/home/yourusername".

    I am also leaving you a page with detailed information on how to do OCR in Ubuntu: https://help.ubuntu.com/community/OCR

  7.   Let's Use Linux

    Another way I came across is the following.

    This assumes the scanner is already connected and recognized by the system.

    1. Open System > Administration > Synaptic Package Manager (in GNOME).

    2. Search for and install tesseract-ocr-spa (to scan in Spanish) and gscan2pdf.

    3. To scan, open Applications > Graphics > gscan2pdf.

    And that's it.

  8.   Matsala

    Hey friend, thank you very much. The truth is that tesseract is a good tool, but it is limited when it comes to books with "problems". On the other hand, this software is easy to configure... 😀

  9.   yar anez

    In an image digitization project, the files are converted to PDF/A, and these must be OCRed. How good are the results in black and white compared with grayscale? Which is recommended?