The source code of Whisper, an automatic speech recognition system, has been released

Whisper is an automatic speech recognition system

OpenAI, a project that develops public work in the field of artificial intelligence, recently published news about its Whisper speech recognition system, an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual, multitask supervised data collected from the web.

For English speech, the system is claimed to deliver robustness and accuracy in automatic recognition that come close to human level.

The developers state that using such a large and diverse dataset leads to improved robustness to accents, background noise, and technical language. In addition, it enables transcription in multiple languages, as well as translation from those languages into English. They are open-sourcing the models and the inference code to serve as a foundation for building useful applications and for further research on robust speech processing.

As mentioned above, the model was trained on 680,000 hours of speech data collected from a variety of collections covering different languages and subject areas. About one third of the speech data used in training is in languages other than English.

The proposed system copes well with accented pronunciation, background noise, and technical jargon. Besides transcribing speech to text, it can also translate speech from an arbitrary language into English and detect where speech appears in an audio stream.

The models are trained in two variants: an English-only model and a multilingual model that supports Spanish, Russian, Italian, German, Japanese, Ukrainian, Belarusian, Chinese, and other languages. The models are in turn offered in five sizes, which differ in the number of parameters they contain; four of those sizes also exist as English-only versions.

The Whisper architecture is a simple end-to-end approach, implemented as an encoder-decoder Transformer. Input audio is split into 30-second chunks, converted into a log-Mel spectrogram, and then passed to the encoder. The decoder is trained to predict the corresponding text, intermixed with special tokens that direct the single model to perform tasks such as language identification, phrase-level timestamps, multilingual speech transcription, and speech translation into English.

The larger the model, the higher the recognition accuracy and quality, but also the higher the GPU video memory requirements and the lower the speed. For example, the smallest variant has 39 million parameters and requires about 1 GB of video memory, while the largest variant has 1,550 million parameters and requires about 10 GB of video memory. The smallest variant is roughly 32 times faster than the largest.
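The trade-off between size, memory, and speed can be sketched as a small lookup. Only the smallest and largest rows are quoted in this article; the three intermediate rows are approximate figures recalled from the project's README and should be treated as assumptions, as should the helper name `largest_model_for`.

```python
# Approximate figures. The "tiny" and "large" rows match the article;
# the rows in between are assumptions taken from the project's README.
MODEL_SIZES = [
    # (name, parameters in millions, approx. VRAM in GB, approx. relative speed)
    ("tiny", 39, 1, 32),
    ("base", 74, 1, 16),
    ("small", 244, 2, 6),
    ("medium", 769, 5, 2),
    ("large", 1550, 10, 1),
]


def largest_model_for(vram_gb):
    """Return the name of the largest model whose approximate VRAM need fits vram_gb."""
    fitting = [row for row in MODEL_SIZES if row[2] <= vram_gb]
    if not fitting:
        raise ValueError(f"no model fits in {vram_gb} GB of video memory")
    # Rows are ordered from smallest to largest, so the last fitting row wins.
    return fitting[-1][0]


if __name__ == "__main__":
    for budget in (1, 4, 8, 12):
        print(budget, "GB ->", largest_model_for(budget))
```

Picking the largest model that fits is only one possible policy; when speed matters more than accuracy, a smaller entry from the same table may be the better choice.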

The system uses the Transformer neural network architecture, which consists of an encoder and a decoder that interact with each other. The audio is divided into 30-second chunks, which are converted into a log-Mel spectrogram and sent to the encoder.
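The chunking step can be illustrated with plain Python. This is a minimal sketch rather than the project's own preprocessing: the 16 kHz sample rate is an assumption not stated in this article, the function name is hypothetical, and the log-Mel spectrogram conversion that follows chunking is left out.

```python
SAMPLE_RATE = 16_000   # assumption: samples per second (not stated in the article)
CHUNK_SECONDS = 30     # from the article: audio is processed in 30-second chunks
CHUNK_SAMPLES = SAMPLE_RATE * CHUNK_SECONDS


def split_into_chunks(samples, chunk_samples=CHUNK_SAMPLES):
    """Split a sequence of samples into fixed-length chunks, zero-padding the last one."""
    chunks = []
    for start in range(0, len(samples), chunk_samples):
        chunk = list(samples[start:start + chunk_samples])
        # Pad the final chunk with silence so every chunk has the same length.
        chunk.extend([0.0] * (chunk_samples - len(chunk)))
        chunks.append(chunk)
    return chunks


if __name__ == "__main__":
    audio = [0.1] * (SAMPLE_RATE * 70)  # 70 seconds of dummy audio
    chunks = split_into_chunks(audio)
    print(len(chunks), "chunks of", len(chunks[0]), "samples each")
```

Padding the last chunk keeps every encoder input the same shape, which is the reason fixed-length windows are convenient here.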

The output of the encoder is sent to the decoder, which predicts a text representation intermixed with special tokens. These tokens allow a single general model to solve tasks such as language detection, phrase-level timestamping, speech transcription in different languages, and translation into English.
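To make the role of the special tokens more concrete, the sketch below builds the kind of token prefix that selects a task for the decoder. The function name is hypothetical, and the token spellings follow the Whisper reference implementation as best recalled here, so they should be checked against the project before being relied on.

```python
def build_decoder_prompt(language, task="transcribe", timestamps=False):
    """Return the special-token prefix that tells the decoder which task to perform."""
    if task not in ("transcribe", "translate"):
        raise ValueError("task must be 'transcribe' or 'translate'")
    # Assumed token spellings: a start marker, a language tag, then a task tag.
    tokens = ["<|startoftranscript|>", f"<|{language}|>", f"<|{task}|>"]
    if not timestamps:
        # An extra marker asks the decoder to skip timestamp prediction.
        tokens.append("<|notimestamps|>")
    return tokens


if __name__ == "__main__":
    print(build_decoder_prompt("en"))
    print(build_decoder_prompt("es", task="translate", timestamps=True))
```

Because the task is chosen by a few prefix tokens, the same weights can serve transcription, translation, and language identification without separate models.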

It is worth noting that Whisper's performance varies considerably by language. English is the best-understood language, and it has four English-only versions which, like the models for other languages, trade off speed against accuracy.
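The relationship between sizes and English-only versions can be written down as a small helper. The size names and the `.en` suffix follow the naming used in the project's README as recalled here and are assumptions relative to this article; `checkpoint_name` is a hypothetical helper.

```python
SIZES = ("tiny", "base", "small", "medium", "large")
# Assumption: only the four smaller sizes have an English-only version.
ENGLISH_ONLY_SIZES = ("tiny", "base", "small", "medium")


def checkpoint_name(size, english_only=False):
    """Return the model name for a size, using the '.en' suffix for English-only models."""
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    if not english_only:
        return size
    if size not in ENGLISH_ONLY_SIZES:
        raise ValueError(f"no English-only version of size {size!r}")
    return f"{size}.en"


if __name__ == "__main__":
    print(checkpoint_name("small"), checkpoint_name("small", english_only=True))
```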

Finally, if you would like to learn more, you can consult the original announcement, and if you are interested in the source code and the trained models, you can find them in the project repository.

The reference implementation, based on the PyTorch framework, and a set of pre-trained models are openly available and ready to use. The code is open source under the MIT license, and it is worth mentioning that the ffmpeg library is required.
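As a rough illustration of what using the reference implementation looks like, here is a minimal sketch. It assumes the `whisper` Python package from the project is installed along with PyTorch and ffmpeg; `transcribe_file` and the file name in the comment are hypothetical, and the `task` option mirrors the transcribe/translate distinction described above.

```python
def transcribe_file(path, model_name="base", translate_to_english=False):
    """Transcribe one audio file, or translate it into English, and return the text."""
    # Imported lazily so this module loads even where the package is not installed.
    import whisper

    model = whisper.load_model(model_name)  # fetches the weights on first use
    task = "translate" if translate_to_english else "transcribe"
    result = model.transcribe(path, task=task)
    return result["text"]


# Example call (needs a real audio file, the whisper package, and ffmpeg):
#     print(transcribe_file("audio.mp3"))
```

Smaller model names load faster and need less video memory, so they are a reasonable starting point before moving to a larger model for accuracy.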

