Mozilla releases the DeepSpeech 0.9 speech recognition engine

Mozilla has released version 0.9 of its DeepSpeech speech recognition engine, which implements the speech recognition architecture of the same name proposed by researchers at Baidu.

The implementation is written in Python using the TensorFlow machine learning platform and is distributed under the free MPL 2.0 license.

About DeepSpeech

DeepSpeech consists of two subsystems: an acoustic model and a decoder. The acoustic model uses deep machine learning techniques to compute the probability that particular characters are present in the input audio.

The decoder uses a beam search algorithm to convert the character probability data into a text representation. DeepSpeech is much simpler than traditional systems while providing higher recognition quality in the presence of extraneous noise.
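To make the decoding step concrete, here is a minimal sketch of a beam search over per-frame character probabilities. It assumes a CTC-style output with a blank symbol at index 0, which the article does not spell out, and it leaves out the language-model scoring a production decoder would normally add. The function name `ctc_beam_search` and its parameters are illustrative and are not part of the DeepSpeech API.

```python
from collections import defaultdict


def ctc_beam_search(probs, alphabet, beam_width=3):
    """Toy CTC prefix beam search (assumption: index 0 is the blank symbol).

    probs: list of frames; each frame is a list of probabilities whose
           index 0 is blank and index i >= 1 maps to alphabet[i - 1].
    Returns the most probable text and its total probability.
    """
    # Each prefix tracks (prob. of ending in blank, prob. of ending in a character).
    beams = {(): (1.0, 0.0)}
    for frame in probs:
        candidates = defaultdict(lambda: [0.0, 0.0])
        for prefix, (p_blank, p_char) in beams.items():
            total = p_blank + p_char
            # Emitting blank keeps the prefix unchanged.
            candidates[prefix][0] += total * frame[0]
            for i, ch in enumerate(alphabet, start=1):
                p = frame[i]
                if prefix and prefix[-1] == ch:
                    # A repeated character collapses into the existing one...
                    candidates[prefix][1] += p_char * p
                    # ...unless a blank separated the two occurrences.
                    candidates[prefix + (ch,)][1] += p_blank * p
                else:
                    candidates[prefix + (ch,)][1] += total * p
        # Keep only the beam_width most probable prefixes.
        ranked = sorted(candidates.items(), key=lambda kv: sum(kv[1]), reverse=True)
        beams = {prefix: tuple(score) for prefix, score in ranked[:beam_width]}
    best_prefix, best_score = max(beams.items(), key=lambda kv: sum(kv[1]))
    return "".join(best_prefix), sum(best_score)


if __name__ == "__main__":
    # Two frames over the alphabet "a"; blank is the likeliest symbol in each frame.
    frames = [[0.6, 0.4], [0.6, 0.4]]
    print(ctc_beam_search(frames, "a"))
```

In this example a greedy decoder would pick blank in both frames and return an empty string, while the beam search sums every alignment that collapses to "a" and prefers that instead.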

The project does not use traditional acoustic models or the concept of phonemes. Instead, it relies on a well-optimized machine learning system based on a neural network, which removes the need to develop separate components to model deviations such as noise, echo, and speech characteristics.

The kit offers trained models, sample audio files, and command-line recognition tools.

Ready-made models are provided for English and Chinese only. For other languages, you can train the system yourself by following the accompanying instructions, using the voice data collected by the Common Voice project.

When the ready-made English model offered for download is used, the DeepSpeech recognition error rate is 7.06% as evaluated on the LibriSpeech test suite.

For comparison, the human recognition error rate is estimated at 5.83%.
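The article does not say how the error rate is computed. Assuming it is a word error rate (the usual metric for LibriSpeech evaluations), the following sketch shows the calculation: the word-level edit distance between a reference transcript and the recognizer output, divided by the number of reference words. The function name and the sample sentences are illustrative only.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] holds the edit distance between the processed reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        cur = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = cur[j - 1] + 1  # extra word in hypothesis
            cur.append(min(substitution, deletion, insertion))
        prev = cur
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors in 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Under this definition, a rate of 7.06% corresponds to roughly seven word errors per hundred reference words.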

With the proposed model, the best recognition results are achieved on a clean recording of a male voice with an American accent in an environment free of extraneous noise.

According to the author of the Vosk continuous speech recognition library, the drawbacks of the Common Voice dataset are the one-sidedness of the speech material (a predominance of men aged 20 to 30 and a lack of recordings of women, children, and the elderly), the lack of vocabulary variety (the same phrases are repeated), and the distribution of recordings in MP3, a format prone to distortion.

Among the drawbacks of DeepSpeech are the poor performance and high memory consumption of the decoder, as well as the substantial resources needed to train the model (Mozilla uses a system with 8 Quadro RTX 6000 GPUs with 24 GB of VRAM each).

The downside of this approach is that, to achieve high-quality recognition and train the neural network, the DeepSpeech engine requires a large amount of heterogeneous data dictated under real-world conditions by different voices and in the presence of natural noise.

Such data is collected by the Common Voice project, created at Mozilla, which provides a validated dataset with 1469 hours in English, 692 hours in German, 554 hours in French, 105 hours in Russian, and 22 hours in Ukrainian.

To train the final English model for DeepSpeech, data from the LibriSpeech, Fisher, and Switchboard projects is used in addition to Common Voice, along with about 1700 hours of transcribed radio program recordings.

Among the changes in the new branch, the ability to force the weight of selected words during decoding stands out.
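The idea of weighting selected words can be illustrated with a small rescoring sketch. This is not DeepSpeech's implementation or API: the function `pick_transcript`, the additive log-score boost, and the sample candidates are all assumptions made for the illustration.

```python
def pick_transcript(candidates, boosts):
    """Choose the best candidate after adding per-word boosts.

    candidates: list of (text, log_score) pairs from a hypothetical decoder.
    boosts: dict mapping a word to an additive bonus (negative values penalize).
    """
    def boosted_score(candidate):
        text, log_score = candidate
        # Every occurrence of a boosted word shifts the candidate's score.
        return log_score + sum(boosts.get(word, 0.0) for word in text.split())

    return max(candidates, key=boosted_score)[0]


if __name__ == "__main__":
    candidates = [("call alex", -4.0), ("call alice", -4.5)]
    print(pick_transcript(candidates, {}))              # no boosts applied
    print(pick_transcript(candidates, {"alice": 1.0}))  # "alice" is favored
```

A boost like this is useful when an application expects domain-specific names or commands that a general language model would otherwise rank too low.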

Also notable are support for the Electron 9.2 platform and an optional implementation of layer normalization (Layer Norm) when training the neural network.
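For readers unfamiliar with layer normalization, the sketch below shows the general technique in plain Python: the activations of one layer for a single input are shifted to zero mean and scaled to unit variance, then optionally rescaled by learned gain and bias vectors. It illustrates the standard formula only; how DeepSpeech wires it into its network is not described in this article, and the parameter names are assumptions.

```python
import math


def layer_norm(features, gain=None, bias=None, eps=1e-5):
    """Normalize one sample's feature vector to zero mean and unit variance."""
    n = len(features)
    mean = sum(features) / n
    variance = sum((x - mean) ** 2 for x in features) / n
    # eps guards against division by zero when all features are equal.
    inv_std = 1.0 / math.sqrt(variance + eps)
    if gain is None:
        gain = [1.0] * n
    if bias is None:
        bias = [0.0] * n
    return [g * (x - mean) * inv_std + b for x, g, b in zip(features, gain, bias)]


if __name__ == "__main__":
    print(layer_norm([1.0, 2.0, 3.0, 4.0]))
```

Unlike batch normalization, the statistics here are computed per sample across features, so the result does not depend on the batch size.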

Download and get it

Performance is sufficient to run the engine on LePotato, Raspberry Pi 3, and Raspberry Pi 4 boards, as well as on Google Pixel 2, Sony Xperia Z Premium, and Nokia 1.3 smartphones.

Ready-to-use modules are provided for Python, NodeJS, C++, and .NET to integrate speech recognition into your own programs (third-party developers have separately prepared modules for Rust, Go, and V).
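As a rough sketch of how the Python module might be used, the example below loads a WAV file and passes it to a model. The `deepspeech` calls (`Model`, `enableExternalScorer`, `sampleRate`, `stt`) follow the 0.9 Python package as commonly documented, but check them against the version you install. The helper names, the 16 kHz default, and the file paths are assumptions for illustration.

```python
import wave


def load_pcm16_mono(path, expected_rate=16000):
    """Read raw 16-bit mono PCM bytes from a WAV file, validating its format.

    The 16000 Hz default is an assumption; pass the rate your model expects.
    """
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM audio")
        if wav.getframerate() != expected_rate:
            raise ValueError(
                f"expected {expected_rate} Hz audio, got {wav.getframerate()} Hz"
            )
        return wav.readframes(wav.getnframes())


def transcribe(model_path, wav_path, scorer_path=None):
    """Transcribe a WAV file; requires the numpy and deepspeech packages."""
    # Imported lazily so load_pcm16_mono works without these packages installed.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    if scorer_path is not None:
        # The external scorer is the optional language model used by the decoder.
        model.enableExternalScorer(scorer_path)
    pcm = load_pcm16_mono(wav_path, expected_rate=model.sampleRate())
    return model.stt(np.frombuffer(pcm, dtype=np.int16))
```

With the package installed and a model downloaded, a call such as `transcribe("model.pbmm", "audio.wav")` would return the recognized text; both file names here are placeholders.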

