Source code released for Whisper, an automatic speech recognition system

Whisper is an automatic speech recognition system

OpenAI, which develops public projects in the field of artificial intelligence, recently published news about its Whisper speech recognition system, an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual, multitask supervised data collected from the web.

It is claimed that for English speech the system delivers robustness and accuracy close to human-level recognition.

"We show that the use of such a large and diverse dataset leads to improved robustness to accents, background noise, and technical language. Moreover, it enables transcription in multiple languages, as well as translation from those languages into English. We are open-sourcing models and inference code to serve as a foundation for building useful applications and for further research on robust speech processing."

As already mentioned, the model was trained on 680,000 hours of speech data gathered from several collections covering different languages and subject areas. About 1/3 of the speech data used in training is in languages other than English.

The proposed system correctly handles situations such as accented pronunciation, background noise, and the use of technical jargon. In addition to transcribing speech into text, the system can translate speech from an arbitrary language into English and detect the presence of speech in an audio stream.

The models are trained in two variants: an English-language model and a multilingual model that supports Spanish, Russian, Italian, German, Japanese, Ukrainian, Belarusian, Chinese, and other languages. In turn, the models are offered in 5 sizes, which differ in the number of parameters they contain.

The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed to an encoder. A decoder is trained to predict the corresponding text caption, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and speech translation into English.

The larger the model, the higher the recognition accuracy and quality, but also the higher the GPU video memory requirements and the lower the speed. For example, the smallest option has 39 million parameters and requires 1 GB of video memory, while the largest option has 1,550 million parameters and requires 10 GB of video memory. The smallest variant is 32 times faster than the largest.
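The trade-off described above can be illustrated with a small lookup that picks the largest model fitting a given amount of video memory. This is only a sketch: the "tiny" and "large" rows use the figures quoted in this article, while the three intermediate rows are assumed from the figures listed in the project's README, and the helper name `largest_model_that_fits` is hypothetical.

```python
# Approximate figures per model size.
# "tiny" and "large" come from the article; "base", "small" and "medium"
# are ASSUMED from the project's README and may change between releases.
MODELS = [
    # (name, parameters in millions, approx. VRAM in GB, approx. relative speed)
    ("tiny", 39, 1, 32),
    ("base", 74, 1, 16),
    ("small", 244, 2, 6),
    ("medium", 769, 5, 2),
    ("large", 1550, 10, 1),
]


def largest_model_that_fits(vram_gb):
    """Return the name of the largest model whose VRAM estimate fits, or None."""
    fitting = [m for m in MODELS if m[2] <= vram_gb]
    if not fitting:
        return None
    # "Largest" here means the most parameters among the candidates.
    return max(fitting, key=lambda m: m[1])[0]


if __name__ == "__main__":
    for vram in (0.5, 1, 4, 8, 12):
        print(vram, "GB ->", largest_model_that_fits(vram))
```

Picking by parameter count favors accuracy; a speed-sensitive application might instead choose the smallest model that meets its quality needs.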

The system uses the "Transformer" neural network architecture, which consists of an encoder and a decoder that interact with each other. The audio is split into 30-second chunks, which are converted into a log-Mel spectrogram and sent to the encoder.
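The chunking step can be sketched in plain Python as follows. The 30-second chunk length comes from the text above; the 16 kHz sample rate and the 10 ms frame stride are assumptions taken from the Whisper paper rather than from this article, and the function names are hypothetical. The reference implementation may move its window through the audio differently, so treat this as a simplified illustration, not the project's actual code.

```python
SAMPLE_RATE = 16_000   # ASSUMED: samples per second expected by the model
CHUNK_SECONDS = 30     # from the article: audio is processed in 30-second chunks
HOP_LENGTH = 160       # ASSUMED: 10 ms stride between spectrogram frames
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS  # 480,000 samples per chunk


def split_into_chunks(samples):
    """Split a sequence of samples into 30-second chunks.

    The final chunk is zero-padded so that every chunk has the same length.
    """
    chunks = []
    for start in range(0, len(samples), CHUNK_SAMPLES):
        chunk = list(samples[start:start + CHUNK_SAMPLES])
        chunk.extend([0.0] * (CHUNK_SAMPLES - len(chunk)))
        chunks.append(chunk)
    return chunks


def frames_per_chunk():
    """Number of spectrogram frames one chunk yields at the assumed stride."""
    return CHUNK_SAMPLES // HOP_LENGTH


if __name__ == "__main__":
    audio = [0.1] * (70 * SAMPLE_RATE)  # 70 seconds of dummy audio
    chunks = split_into_chunks(audio)
    print(len(chunks), "chunks,", frames_per_chunk(), "frames each")
```

Under these assumptions, each chunk would then be turned into a log-Mel spectrogram before reaching the encoder; that transform is omitted here.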

The output of the encoder is sent to the decoder, which predicts a text representation mixed with special tokens. These tokens allow a single general model to solve tasks such as language detection, phrase-level timestamping, transcription of speech in different languages, and translation into English.
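To make the role of the special tokens more concrete, the sketch below builds the kind of token prefix that tells the decoder which task to perform. The token spellings follow the format described in the Whisper paper and are an assumption here, since the article does not list them. The helper `build_decoder_prefix` is hypothetical and works on strings only; it is not part of the library.

```python
def build_decoder_prefix(language, task, timestamps=False):
    """Return the special-token prefix for one decoding request.

    language:   short language code such as "en" or "es"
    task:       "transcribe" (same-language text) or "translate" (to English)
    timestamps: whether phrase-level timestamps should be predicted
    """
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    prefix = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # ASSUMED token: signals that no timestamp tokens should be emitted.
        prefix.append("<|notimestamps|>")
    return prefix


if __name__ == "__main__":
    # Spanish speech translated into English text, without timestamps.
    print(build_decoder_prefix("es", "translate"))
```

Because the task is selected by the prefix alone, the same set of weights can serve transcription and translation without separate task-specific models.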

It is worth mentioning that Whisper's performance varies considerably by language. English is the language it handles best, and it has four English-only versions which, like the models for other languages, offer different trade-offs between speed and accuracy.

Finally, if you are interested in learning more, you can consult the original publication at this link, while the source code and the trained models are available at this link.

The code of the reference implementation, which is based on the PyTorch framework, and a set of pre-trained models are open and ready to use. The code is open source under the MIT license, and it is worth mentioning that the ffmpeg library is required.
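As a rough illustration of how the released code might be used, the sketch below wraps the Python calls shown in the project's README (`whisper.load_model` and `model.transcribe`). It assumes the `whisper` package and ffmpeg are installed. The helper names `checkpoint_name` and `transcribe_file` are hypothetical, and the ".en" suffix for English-only checkpoints is taken from the README rather than from this article.

```python
# Sizes offered as English-only checkpoints (four, as noted in the article).
ENGLISH_ONLY_SIZES = {"tiny", "base", "small", "medium"}
ALL_SIZES = ENGLISH_ONLY_SIZES | {"large"}


def checkpoint_name(size, english_only=False):
    """Map a model size to a checkpoint name, e.g. ("base", True) -> "base.en"."""
    if size not in ALL_SIZES:
        raise ValueError(f"unknown model size: {size}")
    if english_only:
        if size not in ENGLISH_ONLY_SIZES:
            raise ValueError(f"no English-only checkpoint for size: {size}")
        return size + ".en"
    return size


def transcribe_file(path, size="base", english_only=False, translate=False):
    """Transcribe (or translate into English) one audio file and return the text."""
    # Imported lazily so the helper above works without the package installed.
    import whisper

    model = whisper.load_model(checkpoint_name(size, english_only))
    task = "translate" if translate else "transcribe"
    result = model.transcribe(path, task=task)
    return result["text"]


if __name__ == "__main__":
    # "audio.mp3" is a placeholder path, not a file shipped with the project.
    print(transcribe_file("audio.mp3", size="base"))
```

The first call downloads the selected checkpoint, so a smaller size is a reasonable starting point when trying the library out.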

