spaCy, a natural language processing library

Explosion AI has announced the release of a new version of the free library "spaCy", which is used for natural language processing (NLP). In practice, the project can be used to build autoresponders, bots, text classifiers, and various dialogue systems that determine the meaning of phrases.

The library is designed to provide a stable API that is not tied to the algorithms used and is ready for use in real products. It draws on the latest advances in NLP and the most efficient algorithms available for processing information.

When a more efficient algorithm appears, the library is moved over to it, but this change does not affect the API or the applications built on it.
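The idea of a stable API in front of replaceable algorithms can be sketched in plain Python. This is not spaCy code: `TextPipeline`, `whitespace_tokens`, `regex_tokens` and `count_tokens` are hypothetical names invented for the illustration, and the two "algorithms" are deliberately trivial.

```python
import re
from typing import Callable, List

# A "backend" is any function that turns text into a list of tokens.
TokenizerBackend = Callable[[str], List[str]]


def whitespace_tokens(text: str) -> List[str]:
    """Naive backend: split on whitespace only."""
    return text.split()


def regex_tokens(text: str) -> List[str]:
    """Alternative backend: separate words from punctuation."""
    return re.findall(r"\w+|[^\w\s]", text)


class TextPipeline:
    """Hypothetical facade: callers only ever use __call__."""

    def __init__(self, backend: TokenizerBackend) -> None:
        self._backend = backend

    def __call__(self, text: str) -> List[str]:
        return self._backend(text)


def count_tokens(pipeline: TextPipeline, text: str) -> int:
    """Application code: written once, unaware of the backend in use."""
    return len(pipeline(text))


old = TextPipeline(whitespace_tokens)
new = TextPipeline(regex_tokens)
print(count_tokens(old, "Hello, world!"))  # 2
print(count_tokens(new, "Hello, world!"))  # 4
```

The application function `count_tokens` is identical in both calls; only the object passed in changes, which is the property the paragraph above describes.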

Another feature of spaCy is an architecture designed to process whole documents, without first passing them through preprocessors that split the document into phrases. Models are offered in two variants: one tuned for maximum performance and one for maximum accuracy.
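As a rough illustration of whole-document processing, the sketch below passes an entire text to a pipeline and reads the sentences back from the resulting document. It assumes spaCy 3.x is installed and uses a blank English pipeline with the rule-based `sentencizer` component as a stand-in for a trained model; the sample text is made up.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained model required.
nlp = spacy.blank("en")
# Rule-based sentence splitter that runs inside the pipeline.
nlp.add_pipe("sentencizer")

text = (
    "spaCy takes the whole document. "
    "Sentence boundaries are set inside the pipeline."
)
doc = nlp(text)  # the full text goes in as one unit

# Sentences are views onto the same document, not separate inputs.
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```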

Main features of spaCy:

  • Support for about 60 languages.
  • Pretrained models available for a range of languages and applications.
  • Multi-task learning using pretrained transformers such as BERT (Bidirectional Encoder Representations from Transformers).
  • Support for pretrained vectors and word embeddings.
  • High performance.
  • A production-ready training system.
  • Linguistically motivated tokenization.
  • Ready-to-use components for linking named entities, part-of-speech tagging, text classification, label-based dependency parsing, sentence segmentation, morphological analysis, lemmatization, and so on.
  • Support for extending functionality with custom components and attributes.
  • Support for creating your own models based on PyTorch, TensorFlow and other frameworks.
  • Built-in tools for visualizing syntax and named entities (NER, Named Entity Recognition).
  • A simple process for packaging and deploying models and for managing the workflow.
  • High accuracy.
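Two items from the list above, ready-to-use components and built-in visualization, can be tried without downloading a trained model. The sketch assumes spaCy 3.x; the `ORG` pattern and the sample sentence are illustrative assumptions, and a rule-based `entity_ruler` stands in for a statistical entity recognizer.

```python
import spacy
from spacy import displacy

nlp = spacy.blank("en")
# Rule-based entity component; the pattern below is an illustrative assumption.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion AI"}])

doc = nlp("Explosion AI released a new version of spaCy.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)

# With jupyter=False, render() returns the HTML markup as a string
# instead of trying to display it in a notebook.
html = displacy.render(doc, style="ent", jupyter=False)
print(html)
```

The returned markup can be written to a file and opened in a browser to see the highlighted entity.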

The library is written in Python with parts in Cython, a Python extension that allows functions written in C to be called directly.

The project code is distributed under the MIT license. Language models are available for 58 languages.

About the new version, spaCy 3.0

The spaCy 3.0 release stands out for its model families, retrained for 18 languages, with 59 trained pipelines in total, including 5 new transformer-based pipelines.

The model is offered in three variants (16 MB, 41 MB with 20 thousand vectors, and 491 MB with 500 thousand vectors), is optimized to run on the CPU, and includes the tok2vec, morphologizer, parser, senter, ner, attribute_ruler and lemmatizer components.
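To show how such a pipeline is composed from named components, the sketch below assembles the same component names on a blank English pipeline. It assumes spaCy 3.x and that each name is a registered built-in factory. The result is untrained: it illustrates the structure only and cannot be used for predictions until it has been trained or replaced by a packaged pipeline.

```python
import spacy

# Component names as listed in the paragraph above.
component_names = [
    "tok2vec",
    "morphologizer",
    "parser",
    "senter",
    "ner",
    "attribute_ruler",
    "lemmatizer",
]

nlp = spacy.blank("en")
for name in component_names:
    # Each component is created from its registered factory by name.
    nlp.add_pipe(name)

# Components run in the order in which they were added.
print(nlp.pipe_names)
```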

We have been working on spaCy v3.0 for over a year, and for almost two years if you count all the work that went into Thinc. Our main goal with this release is to make it easier to bring your own models into spaCy, especially state-of-the-art models such as transformers. You can write the models that power spaCy components in frameworks such as PyTorch or TensorFlow, using our new configuration system to describe all of your settings. And since modern NLP workflows often consist of several steps, there is a new workflow system to help you keep your work organized.
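A small way to look at the configuration system is to inspect the configuration that a pipeline carries with it. The sketch assumes spaCy 3.x and uses a blank pipeline with a single `sentencizer` component; a real training configuration has many more sections than the ones printed here.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# nlp.config is a nested, dict-like description of the whole pipeline.
config = nlp.config
print(config["nlp"]["pipeline"])            # component order
print(config["components"]["sentencizer"])  # settings for one component

# The same settings serialized in the INI-like config file format.
config_text = config.to_str()
print(config_text)
```

Because every setting lives in one structure, the serialized text can be saved next to a project and reviewed or versioned like any other file.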

Other notable new features in this version:

  • A new workflow for training models.
  • A new configuration system.
  • Support for transformer-based pipeline models, suitable for multi-task learning.
  • The ability to plug in your own models using various machine learning frameworks, such as PyTorch, TensorFlow and MXNet.
  • Project support for managing every stage of the workflow, from preprocessing to model deployment.
  • Support for integration with the Data Version Control (DVC), Streamlit, Weights & Biases and Ray packages.
  • New built-in components: SentenceRecognizer, Morphologizer, Lemmatizer, AttributeRuler and Transformer.
  • A new API for creating your own components.
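The last item in the list, the API for creating your own components, can be sketched as follows. It assumes spaCy 3.x; the component name `token_counter` and the custom attribute `token_count` are hypothetical names chosen for this example.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical custom attribute, reachable as doc._.token_count.
if not Doc.has_extension("token_count"):
    Doc.set_extension("token_count", default=0)


@Language.component("token_counter")
def token_counter(doc: Doc) -> Doc:
    """Store the number of tokens on the document and pass it on."""
    doc._.token_count = len(doc)
    return doc


nlp = spacy.blank("en")
# Registered components are added to a pipeline by their string name.
nlp.add_pipe("token_counter")

doc = nlp("spaCy 3.0 adds a new component API.")
print(doc._.token_count)
```

A component is just a function that receives a document and returns it, so the same pattern extends to any per-document processing step.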

Finally, if you would like to know more about this new version or about spaCy in general, you can check the details at the following link.

