The source code for Whisper, an automatic speech recognition system, has been released


Whisper is an automatic speech recognition system

OpenAI, which develops public projects in the field of artificial intelligence, recently published news about Whisper, an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual, multitask supervised data collected from the web.

For English speech, the system is said to reach levels of robustness and accuracy in automatic recognition close to those of a human.

"We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise, and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing."

As already mentioned, the model was trained on 680,000 hours of voice data gathered from several collections covering different languages and subject areas. About one third of the voice data used in training is in languages other than English.

The proposed system correctly handles situations such as accented pronunciation, background noise, and the use of technical jargon. In addition to transcribing speech into text, the system can translate speech from an arbitrary language into English and detect where speech occurs in an audio stream.

The models are trained in two variants: an English-only model and a multilingual model that supports Spanish, Russian, Italian, German, Japanese, Ukrainian, Belarusian, Chinese, and other languages. Each variant is in turn offered in up to 5 options, which differ in size and in the number of parameters in the model.

The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed to an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and speech translation into English.

The larger the model, the higher the recognition accuracy and quality, but also the higher the GPU video memory requirements and the lower the speed. For example, the smallest option has 39 million parameters and requires 1 GB of video memory, while the largest option has 1,550 million parameters and requires 10 GB of video memory. The smallest option is 32 times faster than the largest.
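As a rough illustration of this trade-off, the sketch below picks the largest model that fits in a given amount of video memory. Only the smallest and largest rows come from the figures above; the three middle rows and the model names are assumptions recalled from the project's README at release, and all values are approximate. The function name `largest_model_that_fits` is hypothetical.

```python
# (name, parameters in millions, approx. VRAM in GB, approx. speed relative to the largest)
# The "tiny" and "large" figures are the ones quoted in the article;
# the middle rows are assumptions taken from the project's README.
MODEL_SIZES = [
    ("tiny", 39, 1, 32),
    ("base", 74, 1, 16),
    ("small", 244, 2, 6),
    ("medium", 769, 5, 2),
    ("large", 1550, 10, 1),
]


def largest_model_that_fits(vram_gb):
    """Return the name of the largest model needing at most vram_gb, or None."""
    fitting = [m for m in MODEL_SIZES if m[2] <= vram_gb]
    if not fitting:
        return None
    # "Largest" here means the highest parameter count.
    return max(fitting, key=lambda m: m[1])[0]


print(largest_model_that_fits(4))
```

With 4 GB available, this selects the 244-million-parameter option, trading some accuracy for a model that actually fits on the card.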

The system uses the "Transformer" neural network architecture, which consists of an encoder and a decoder that interact with each other. The audio is split into 30-second chunks, which are converted into a log-Mel spectrogram and sent to the encoder.
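The chunking step can be sketched in a few lines of plain Python. This is a simplified illustration, not the project's code: it assumes audio resampled to 16 kHz and zero-pads the final chunk, and it leaves out the log-Mel conversion entirely. The names `split_into_chunks`, `SAMPLE_RATE`, and `CHUNK_SAMPLES` are hypothetical.

```python
SAMPLE_RATE = 16_000   # assumption: audio is resampled to 16 kHz before processing
CHUNK_SECONDS = 30     # chunk length stated in the article
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples):
    """Split a sequence of audio samples into 30-second chunks.

    The last chunk is zero-padded so that every chunk has the same length.
    """
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = list(samples[start:start + CHUNK_SAMPLES])
        chunk.extend([0.0] * (CHUNK_SAMPLES - len(chunk)))  # pad a short final chunk
        chunks.append(chunk)
    return chunks


audio = [0.1] * (SAMPLE_RATE * 70)  # 70 seconds of dummy audio
chunks = split_into_chunks(audio)
print(len(chunks), len(chunks[-1]))
```

A 70-second recording yields three chunks here, the last one holding 10 seconds of audio followed by 20 seconds of padding.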

The encoder's output is sent to the decoder, which predicts a text representation intermixed with special tokens. These tokens let a single general model solve tasks such as language detection, phrase-level timestamp computation, transcription of speech in different languages, and translation into English.
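To make the role of the special tokens more concrete, here is a minimal sketch of how a task could be encoded as a token prefix for the decoder. The token spellings are assumptions based on the released tokenizer, and `build_decoder_prompt` is a hypothetical helper, not part of the project's API.

```python
def build_decoder_prompt(language="en", task="transcribe", timestamps=True):
    """Return the special tokens that would precede the predicted text.

    Token spellings are assumed, not taken from the article.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # Ask the model to emit plain text without timestamp tokens.
        tokens.append("<|notimestamps|>")
    return tokens


print(build_decoder_prompt("es", "translate", timestamps=False))
```

The same model weights serve every task; only this prefix changes, which is what allows one network to transcribe Spanish in one call and translate it into English in the next.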

It is worth noting that Whisper's performance varies considerably by language. English is the language it handles best, and there are four English-only versions which, like the multilingual models, trade off speed against accuracy.

Finally, if you are interested in learning more, you can read the original publication at this link, and if you are interested in the source code and the trained models, you can find them at this link.

The reference implementation is based on the PyTorch framework, and a set of already trained models is openly available and ready to use. The code is open source under the MIT license, and it is worth mentioning that the ffmpeg library is required.
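For readers who want to try the pretrained models, the following is a minimal sketch of calling the reference implementation from Python. It assumes the project's `whisper` package and ffmpeg are installed; the wrapper `transcribe_file` and its parameters are hypothetical, and the default model name `"base"` is an assumption.

```python
def transcribe_file(path, model_name="base", translate_to_english=False):
    """Transcribe an audio file, or translate its speech into English.

    Returns the recognized text and the detected language code.
    """
    # Imported lazily so this snippet can be loaded without the package installed.
    import whisper

    model = whisper.load_model(model_name)  # fetches the weights on first use
    task = "translate" if translate_to_english else "transcribe"
    result = model.transcribe(path, task=task)  # the file is decoded via ffmpeg
    return result["text"], result["language"]
```

A call such as `text, language = transcribe_file("audio.mp3")` (with `audio.mp3` standing in for any local recording) would return the transcript together with the language the model detected.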

