Explosion AI has announced the release of a new version of the free library spaCy, which implements natural language processing (NLP) algorithms. In practice, the project can be used to build auto-responders, bots, text classifiers, and various dialogue systems that determine the meaning of phrases.
The library is designed to provide a stable API that is not tied to the algorithms it uses and is ready for use in real products. It draws on the latest advances in NLP and the most efficient algorithms available for processing information.
When a more efficient algorithm appears, the library is moved over to it, but the change does not affect the API or the applications built on it.
Another feature of spaCy is an architecture designed to process whole documents, without first running them through preprocessors that split a document into phrases. Models are offered in two variants: one tuned for maximum throughput and one tuned for maximum accuracy.
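To make the whole-document approach concrete, here is a minimal sketch. It assumes spaCy 3.x is installed and uses a blank English pipeline with the rule-based `sentencizer` component, so no trained model has to be downloaded. The sample text is invented for illustration.

```python
import spacy

# Blank English pipeline: only the tokenizer, no trained model required.
nlp = spacy.blank("en")
# Rule-based sentence splitter that runs as a step inside the pipeline.
nlp.add_pipe("sentencizer")

# The full text is passed in one call; nothing is split beforehand.
text = "spaCy reads the whole document. Sentences are found inside the pipeline."
doc = nlp(text)

sentences = [sent.text for sent in doc.sents]
tokens = [token.text for token in doc]

print(sentences)
print(tokens[:3])
```

The resulting `Doc` object keeps the original text, the tokens and the sentence boundaries together, so later components work on the same document rather than on pre-cut fragments.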
Main features of spaCy:
- Support for about 60 languages.
- Pretrained models available for various languages and applications.
- Multi-task learning with pretrained transformers such as BERT (Bidirectional Encoder Representations from Transformers).
- Support for pretrained vectors and word embeddings.
- High performance.
- A production-ready system for training models.
- Linguistically motivated tokenization.
- Ready-to-use components for named entity linking, part-of-speech tagging, text classification, label-based dependency parsing, sentence segmentation, morphological analysis, lemmatization, and more.
- Support for extending functionality with custom components and attributes.
- Support for building your own models on top of PyTorch, TensorFlow and other frameworks.
- Built-in tools for visualizing syntax and named entities (NER, Named Entity Recognition).
- A simple process for packaging and deploying models and for managing workflows.
- High accuracy.
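As a rough illustration of the ready-to-use components and the built-in visualizer listed above, the sketch below adds the rule-based `entity_ruler` to a blank English pipeline and renders the result with `displacy`. It assumes spaCy 3.x. The pattern and the sample sentence are made up for the example, and a trained `ner` component would normally replace the hand-written rule.

```python
import spacy
from spacy import displacy

nlp = spacy.blank("en")

# Rule-based entity component; the single pattern below is illustrative only.
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([{"label": "ORG", "pattern": "Explosion AI"}])

doc = nlp("Explosion AI released a new version of spaCy.")
entities = [(ent.text, ent.label_) for ent in doc.ents]
print(entities)

# With jupyter=False, render() returns the HTML markup as a string
# instead of trying to display it in a notebook.
html = displacy.render(doc, style="ent", jupyter=False)
```

The returned `html` string can be written to a file and opened in a browser to see the highlighted entity.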
The library is written in Python with some components in Cython, a Python extension that allows functions to be called directly in C.
The project code is distributed under the MIT license. Language models are available for 58 languages.
About the new spaCy 3.0 release
spaCy 3.0 stands out for its retrained model families for 18 languages and a total of 59 trained pipelines, including 5 new transformer-based pipelines.
The models are offered in three sizes (16 MB, 41 MB with 20 thousand vectors, and 491 MB with 500 thousand vectors). They are optimized to run on the CPU and include the tok2vec, morphologizer, parser, senter, ner, attribute_ruler and lemmatizer components.
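The component names above are also the names under which spaCy registers its built-in factories. The sketch below, which assumes spaCy 3.x, assembles an untrained pipeline with the same component names on a blank English pipeline, only to show how the pieces line up. It is not one of the released models and cannot process text until it has been trained or initialized.

```python
import spacy

# Component names as listed in the text above.
COMPONENTS = [
    "tok2vec",
    "morphologizer",
    "parser",
    "senter",
    "ner",
    "attribute_ruler",
    "lemmatizer",
]

nlp = spacy.blank("en")
for name in COMPONENTS:
    # Each name resolves to a built-in factory; the components start untrained.
    nlp.add_pipe(name)

# The pipeline runs its components in the order they were added.
print(nlp.pipe_names)
```

With a downloaded trained pipeline, the same `nlp.pipe_names` attribute shows which of these components that package actually ships.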
"We've been working on spaCy v3.0 for over a year, and for almost two years if you count all the work that has gone into Thinc. Our main goal with this release is to make it easier to bring your own models into spaCy, especially state-of-the-art models such as transformers. You can write the models that power spaCy components in frameworks like PyTorch or TensorFlow, using our new configuration system to describe all of your settings. And since modern NLP workflows often consist of multiple steps, there is a new workflow system to help you keep your work organized."
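To give a feel for the configuration system mentioned above, here is a small sketch that parses a config fragment with Thinc's `Config` class, which is installed together with spaCy 3.x. The fragment is heavily simplified. The `[corpus]` section, the file path and all values are assumptions chosen for illustration, and a real training config contains many more sections.

```python
from thinc.api import Config

# Simplified, illustrative fragment; not a complete spaCy training config.
CONFIG_TEXT = """
[paths]
train = "corpus/train.spacy"

[nlp]
lang = "en"
pipeline = ["tok2vec", "ner"]

[training]
dropout = 0.1
max_steps = 2000

[corpus]
path = ${paths.train}
"""

# Values are parsed as JSON-like types; ${section.key} references are resolved.
config = Config().from_str(CONFIG_TEXT)

print(config["nlp"]["pipeline"])
print(config["training"]["dropout"])
print(config["corpus"]["path"])
```

Keeping every setting in one file like this means a training run can be described, shared and repeated without hunting for defaults hidden in code.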
Other notable changes in the new version:
- A new workflow for training models.
- A new configuration system.
- Support for transformer-based pipeline models suited to multi-task learning.
- The ability to plug in your own models using different machine learning frameworks, such as PyTorch, TensorFlow and MXNet.
- Project support for managing every stage of a workflow, from preprocessing to model deployment.
- Support for integration with Data Version Control (DVC), Streamlit, Weights & Biases and Ray.
- New built-in components: SentenceRecognizer, Morphologizer, Lemmatizer, AttributeRuler and Transformer.
- A new API for creating your own components.
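The component API in the last item can be sketched as follows, assuming spaCy 3.x. The factory name `long_token_counter`, its `min_length` setting and the `long_token_count` attribute are all hypothetical names invented for this example.

```python
import spacy
from spacy.language import Language
from spacy.tokens import Doc

# Hypothetical custom attribute, read later as doc._.long_token_count.
# force=True lets the snippet be re-run without an "already exists" error.
Doc.set_extension("long_token_count", default=0, force=True)


@Language.factory("long_token_counter", default_config={"min_length": 6})
def create_long_token_counter(nlp, name, min_length: int):
    """Build a component that counts tokens with at least min_length characters."""

    def count_long_tokens(doc):
        doc._.long_token_count = sum(
            1 for token in doc if len(token.text) >= min_length
        )
        return doc

    return count_long_tokens


nlp = spacy.blank("en")
# The config dict overrides the default setting declared on the factory.
nlp.add_pipe("long_token_counter", config={"min_length": 8})

doc = nlp("Transformers power the newest pipelines.")
print(doc._.long_token_count)
```

Because the component is registered under a string name, it can be referenced from a config file in the same way as the built-in components.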
Finally, if you would like to know more about this new version or about spaCy in general, you can check the details at the following link.