Mozilla Introduces the DeepSpeech 0.9 Speech Recognition Engine


Mozilla has published the release of DeepSpeech 0.9, a speech recognition engine that implements the speech recognition architecture of the same name proposed by researchers at Baidu.

The implementation is written in Python using the TensorFlow machine learning platform and is distributed under the free MPL 2.0 license.

About DeepSpeech

DeepSpeech consists of two subsystems: an acoustic model and a decoder. The acoustic model uses deep machine learning techniques to compute the probability that particular characters are present in the input audio.

The decoder uses a beam search algorithm to convert the character probability data into a text representation. DeepSpeech is much simpler than traditional systems and at the same time delivers higher recognition quality in the presence of extraneous noise.
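The decoding step described above can be illustrated with a toy beam search over per-frame character probabilities. This is a minimal sketch, not DeepSpeech's decoder: the three-symbol alphabet, the probability values and the function names are made up for illustration, and the blank symbol with its collapse rule follows the CTC convention, which the article does not spell out. A production decoder would also merge paths that collapse to the same text and would typically weigh in a language model.

```python
import math


def beam_search(frames, beam_width=3):
    """Keep the `beam_width` most probable index paths through `frames`.

    `frames` is a list of per-time-step probability lists, one probability
    per symbol of the alphabet.
    """
    beams = [((), 0.0)]  # (path of symbol indices, log probability)
    for frame in frames:
        candidates = []
        for path, score in beams:
            for index, prob in enumerate(frame):
                if prob > 0.0:
                    candidates.append((path + (index,), score + math.log(prob)))
        # Prune: only the best `beam_width` partial paths survive.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]
    return beams


def collapse(path, alphabet, blank):
    """CTC-style collapse: merge repeated symbols, then drop blanks."""
    chars = []
    previous = None
    for index in path:
        if index != previous and index != blank:
            chars.append(alphabet[index])
        previous = index
    return "".join(chars)


def decode(frames, alphabet, blank, beam_width=3):
    """Return the text of the single most probable surviving path."""
    best_path, _ = beam_search(frames, beam_width)[0]
    return collapse(best_path, alphabet, blank)


# Hypothetical toy data: "_" stands for the blank symbol.
ALPHABET = ["a", "b", "_"]
FRAMES = [
    [0.6, 0.3, 0.1],
    [0.1, 0.2, 0.7],
    [0.2, 0.7, 0.1],
]
print(decode(FRAMES, ALPHABET, blank=2))  # prints "ab"
```

A wider beam keeps more alternatives alive at each frame, which costs time and memory in exchange for a lower risk of pruning the path that would have won in the end.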

The project does not use traditional acoustic models or the concept of phonemes. Instead, it relies on a well-optimized machine learning system based on a neural network, which removes the need to develop separate components for modeling deviations such as noise, echo and speech characteristics.

The kit provides trained models, sample audio files and command-line recognition tools.

The ready-made model is supplied for English and Chinese only. For other languages, you can train the system yourself by following the accompanying instructions, using the voice data collected by the Common Voice project.

When the ready-to-use English model offered for download is used, the DeepSpeech recognition error rate is 7.06% when evaluated with the LibriSpeech test suite.

For comparison, the human recognition error rate is estimated at 5.83%.
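To make the error-rate figures above concrete, here is a minimal sketch of how such a rate can be computed, assuming it is the usual word error rate: the word-level edit distance between a reference transcript and the recognizer's output, divided by the number of reference words. The function name and the sample sentences are illustrative only and are not taken from the DeepSpeech evaluation tooling.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # reference word was dropped
                current[j - 1] + 1,      # extra word was inserted
                previous[j - 1] + cost,  # match or substitution
            ))
        previous = current
    return previous[-1] / len(ref)


# One substitution ("sat" -> "sit") and one dropped word ("the"): 2 / 6.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a rate of 7.06% corresponds to roughly seven word-level mistakes per hundred reference words.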

With the proposed model, the best recognition results are achieved on a clean recording of a male voice with an American accent in an environment free of extraneous noise.

According to the author of the Vosk continuous speech recognition library, the shortcomings of the Common Voice set are the one-sidedness of the speech material (a predominance of men in their 20s and 30s and a lack of material with the voices of women, children and the elderly), the lack of vocabulary variety (repetition of the same phrases) and the distribution of recordings in MP3, a format prone to distortion.

The disadvantages of DeepSpeech include poor performance and high memory consumption in the decoder, as well as the significant resources needed to train the model (Mozilla uses a system with 8 Quadro RTX 6000 GPUs with 24 GB of VRAM each).

The downside of this approach is that, to achieve high-quality recognition, training the neural network requires a large amount of heterogeneous data, dictated under real-world conditions by different voices and in the presence of natural noise.

This data is collected by the Common Voice project created at Mozilla, which provides a validated dataset with 1469 hours in English, 692 in German, 554 in French, 105 in Russian and 22 in Ukrainian.

When training the final English model for DeepSpeech, data from the LibriSpeech, Fisher and Switchboard projects is used in addition to Common Voice, along with about 1700 hours of transcribed radio program recordings.

Among the changes in the new branch, the ability to force the weight of selected words during the decoding process stands out.
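The effect of weighting selected words can be pictured as rescoring the decoder's candidate transcripts. The sketch below is only an illustration of the idea and does not use DeepSpeech's own interface: the `rescore` function, the candidate texts and their scores are all hypothetical.

```python
def rescore(candidates, hot_words, boost):
    """Return candidates sorted best-first after boosting selected words.

    `candidates` is a list of (text, log_score) pairs. Every occurrence of
    a word from `hot_words` adds `boost` to the candidate's score.
    """
    rescored = []
    for text, score in candidates:
        hits = sum(1 for word in text.split() if word in hot_words)
        rescored.append((text, score + boost * hits))
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Made-up scores: without a boost the first candidate wins narrowly.
CANDIDATES = [("play the son", -4.0), ("play the song", -4.5)]

print(rescore(CANDIDATES, hot_words=set(), boost=1.0)[0][0])     # play the son
print(rescore(CANDIDATES, hot_words={"song"}, boost=1.0)[0][0])  # play the song
```

A positive boost makes the chosen words more likely to appear in the result, and a negative one would make them less likely.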

Also notable are support for the Electron 9.2 platform and an optional implementation of the layer normalization mechanism (Layer Norm) for training the neural network.
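For readers unfamiliar with layer normalization, the sketch below shows the general technique on a single vector of activations: subtract the mean, divide by the standard deviation, then apply a scale and a shift. It is a plain-Python illustration under simplifying assumptions (scalar `gain` and `bias` instead of learned per-feature vectors) and is not the code used in DeepSpeech.

```python
import math


def layer_norm(values, gain=1.0, bias=0.0, eps=1e-5):
    """Normalize one layer's activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    # eps keeps the division stable when all activations are equal.
    scale = 1.0 / math.sqrt(variance + eps)
    return [gain * (v - mean) * scale + bias for v in values]


normalized = layer_norm([1.0, 2.0, 3.0])
print([round(v, 3) for v in normalized])  # [-1.225, 0.0, 1.225]
```

Because the statistics are computed per example rather than per batch, the result does not depend on the batch size.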

Download and get DeepSpeech

Performance is sufficient to run the engine on LePotato, Raspberry Pi 3 and Raspberry Pi 4 boards, as well as on Google Pixel 2, Sony Xperia Z Premium and Nokia 1.3 smartphones.

Ready-to-use modules are provided for Python, NodeJS, C++ and .NET to integrate speech recognition functions into your own programs (third-party developers have separately prepared modules for Rust, Go and V).
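As a rough idea of what using the Python module looks like, the sketch below transcribes a WAV file. It assumes the `deepspeech` and `numpy` packages are installed and that a model file and a scorer file have been downloaded. The function names `read_wav` and `transcribe` and the file names in the usage line are placeholders chosen for this example, not names from the article.

```python
import wave


def read_wav(path):
    """Return (sample_rate, raw 16-bit PCM bytes) for a mono WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(model_path, scorer_path, wav_path):
    """Run speech-to-text on one file and return the recognized text."""
    # Imported lazily so read_wav() stays usable without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    model.enableExternalScorer(scorer_path)

    sample_rate, pcm = read_wav(wav_path)
    if sample_rate != model.sampleRate():
        raise ValueError("resample the audio to the model's sample rate first")

    # The engine expects the audio as an array of 16-bit integers.
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

With placeholder file names, a call would look like `transcribe("model.pbmm", "model.scorer", "audio.wav")`.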

