spaCy, a natural language processing library

Explosion AI has announced the release of a new version of the free library spaCy, which implements natural language processing (NLP) algorithms. In practice, the project can be used to build autoresponders, bots, text classifiers, and dialog systems that determine the meaning of phrases.

The library is designed to provide a stable API that is not tied to the algorithms it uses and that is ready for use in real products. It draws on the latest advances in NLP and on the most efficient algorithms available for processing information.

If a more efficient algorithm appears, the library is moved over to it, but the change does not affect the API or the applications built on it.

A distinctive feature of spaCy is an architecture designed to process whole documents, with no preprocessing step that first splits the document into phrases. Models are offered in two variants: one tuned for maximum performance and one for maximum accuracy.
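As a minimal sketch of the whole-document approach, the snippet below passes a complete text to a pipeline and only afterwards reads sentences off the resulting `Doc`. It assumes spaCy 3.x is installed and uses a blank English pipeline with the rule-based `sentencizer` component rather than a trained model, so no model download is needed; the sample text is made up for illustration.

```python
import spacy

# Blank English pipeline: tokenizer only, no trained components.
nlp = spacy.blank("en")
# Rule-based sentence boundary detection (splits on ".", "!" and "?").
nlp.add_pipe("sentencizer")

# The whole text goes in as one document.
text = "spaCy processes the whole text at once. Sentences are views into the same Doc."
doc = nlp(text)

# Sentences are spans of the single Doc, not separately processed strings.
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```

A trained pipeline would normally derive sentence boundaries from its parser or `senter` component instead of punctuation rules.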

Main features of spaCy:

  • Support for 60 languages.
  • Pretrained models for different languages and applications.
  • Multi-task learning with pretrained transformers such as BERT (Bidirectional Encoder Representations from Transformers).
  • Support for pretrained vectors and word embeddings.
  • High performance.
  • A training system that is ready for production use.
  • Linguistically motivated tokenization.
  • Ready-made components for named entity recognition, part-of-speech tagging, text classification, labeled dependency parsing, sentence segmentation, morphological analysis, lemmatization, and so on.
  • Support for extending functionality with custom components and attributes.
  • Support for building your own models based on PyTorch, TensorFlow, and other frameworks.
  • Built-in visualizers for syntax and named entities (NER, Named Entity Recognition).
  • A simple system for packaging and deploying models and for managing workflows.
  • High accuracy.
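To make the tokenization item in the list above more concrete, here is a small sketch, assuming spaCy 3.x is installed. It uses a blank English pipeline, which contains only the language-specific tokenizer, and the sample sentence is chosen for illustration.

```python
import spacy

# A blank pipeline still carries the English tokenizer rules and exceptions.
nlp = spacy.blank("en")

doc = nlp("Let's go to N.Y.!")

# Token texts produced by the rule-based tokenizer.
tokens = [token.text for token in doc]
# Lexical attributes are available without any trained model.
punct = [token.text for token in doc if token.is_punct]

print(tokens)
print(punct)
```

The tokenizer splits contractions and trailing punctuation while keeping abbreviations together, which is what "linguistically motivated" refers to here.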

The library is written in Python with parts in Cython, a Python extension that allows functions in the C language to be called directly.

The project code is distributed under the MIT license. Language models are available for 58 languages.

About the new version, spaCy 3.0

spaCy 3.0 stands out for its model families, retrained for 18 languages, and for 59 trained pipelines in total, including five new transformer-based pipelines.

The model is offered in three sizes (16 MB, 41 MB with 20,000 vectors, and 491 MB with 500,000 vectors), is optimized to run on the CPU, and includes the tok2vec, morphologizer, parser, senter, ner, attribute_ruler, and lemmatizer components.
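The component names listed above are the names spaCy uses for its built-in pipeline factories. The sketch below, assuming spaCy 3.x, assembles an untrained English pipeline with those components in the same order, purely to show how a pipeline is composed. It is not one of the released models: the components have no trained weights or lookup tables, so this pipeline is not meant to be run on text until it has been initialized and trained.

```python
import spacy

# Component names as listed in the article; each is a built-in spaCy factory.
component_names = [
    "tok2vec",
    "morphologizer",
    "parser",
    "senter",
    "ner",
    "attribute_ruler",
    "lemmatizer",
]

# Start from a blank English pipeline and add untrained components in order.
nlp = spacy.blank("en")
for name in component_names:
    nlp.add_pipe(name)

# pipe_names reflects the processing order of the pipeline.
pipe_names = nlp.pipe_names
print(pipe_names)
```

With a released pipeline installed, the same `nlp.pipe_names` attribute can be used to inspect which components it ships with.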

The developers describe the release as follows: "We have been working on spaCy v3.0 for over a year, and almost two years if you count all the work that went into Thinc. Our main goal with the release is to make it easier to bring your own models into spaCy, especially state-of-the-art models like transformers. You can write models that power spaCy components in frameworks like PyTorch or TensorFlow, using our new configuration system to describe all of your settings. And since modern NLP workflows often consist of multiple steps, there is a new workflow system to help you keep your work organized."
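As a rough illustration of the configuration system mentioned above, every pipeline in spaCy 3.x exposes its settings as a config object. The sketch below assumes spaCy 3.x and a blank English pipeline with a single rule-based component; a real training config has more sections (for example for the training loop and the corpora), which are not shown here.

```python
import spacy

nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

# The config describes the language, the pipeline order and each component.
config = nlp.config

lang = config["nlp"]["lang"]
pipeline = list(config["nlp"]["pipeline"])
factory = config["components"]["sentencizer"]["factory"]

print(lang, pipeline, factory)
# The same settings serialized in the sectioned text format used by config files.
print(config.to_str())
```

Keeping all settings in one serializable object is what lets the same description be reused for training, packaging, and reloading a pipeline.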

Other important new features that stand out in the new version:

  • A new workflow for training models.
  • A new configuration system.
  • Support for transformer-based pipeline models, suitable for multi-task learning.
  • The ability to plug in your own models using different machine learning frameworks, such as PyTorch, TensorFlow, and MXNet.
  • Project support for managing every stage of the workflow, from preprocessing to model deployment.
  • Support for integration with Data Version Control (DVC), Streamlit, Weights & Biases, and Ray.
  • New built-in components: SentenceRecognizer, Morphologizer, Lemmatizer, AttributeRuler, and Transformer.
  • A new API for creating your own components.
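The last item in the list above, the API for custom components, can be sketched as follows, assuming spaCy 3.x. The component name `token_counter` and the extension attribute `n_tokens` are hypothetical names made up for this example.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical custom attribute, reachable as doc._.n_tokens.
Doc.set_extension("n_tokens", default=0, force=True)


@Language.component("token_counter")
def token_counter(doc):
    """Store the number of tokens on the Doc and pass it along."""
    doc._.n_tokens = len(doc)
    return doc


nlp = spacy.blank("en")
# In spaCy 3.x components are added by their registered string name.
nlp.add_pipe("token_counter")

doc = nlp("Hello world!")
n_tokens = doc._.n_tokens
print(n_tokens)
```

Registering components by name is what allows them to be referenced from a config file and restored when a pipeline is loaded again.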

Finally, if you are interested in learning more about this new version or about spaCy in general, you can check the details at the following link.

