Mozilla Introduces the DeepSpeech 0.9 Speech Recognition Engine

Mozilla has published the release of the DeepSpeech 0.9 speech recognition engine, which implements the speech recognition architecture of the same name proposed by Baidu researchers.

The implementation is written in Python using the TensorFlow machine learning platform and is distributed under the free MPL 2.0 license.

About DeepSpeech

DeepSpeech consists of two subsystems: an acoustic model and a decoder. The acoustic model uses deep machine learning techniques to calculate the probability that particular characters are present in the input audio.

The decoder uses a beam search algorithm to convert the character probability data into a text representation. DeepSpeech is considerably simpler than traditional systems and at the same time delivers higher recognition quality in the presence of extraneous noise.
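To make the decoding step concrete, here is a minimal sketch of beam search over per-frame character probabilities. It is not DeepSpeech's decoder: it assumes a CTC-style blank symbol, uses no language model, and does not merge equivalent prefixes as a production decoder would. The function name, the toy alphabet and the `blank` marker are hypothetical and chosen only for illustration.

```python
import math


def beam_search_decode(probs, alphabet, beam_width=3, blank="_"):
    """Decode per-frame symbol probabilities with a plain beam search.

    probs    -- list of frames; each frame lists one probability per symbol
    alphabet -- symbols aligned with the columns of each frame
    Returns the decoded text and the log probability of the best path.
    """
    beams = [((), 0.0)]  # each beam: (symbol sequence, log probability)
    for frame in probs:
        candidates = []
        for seq, score in beams:
            for symbol, p in zip(alphabet, frame):
                if p > 0.0:  # skip impossible symbols to avoid log(0)
                    candidates.append((seq + (symbol,), score + math.log(p)))
        # Keep only the beam_width most probable partial paths.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beams = candidates[:beam_width]

    best_seq, best_score = beams[0]
    # CTC-style post-processing: collapse repeated symbols, then drop blanks.
    text = []
    previous = None
    for symbol in best_seq:
        if symbol != previous and symbol != blank:
            text.append(symbol)
        previous = symbol
    return "".join(text), best_score


if __name__ == "__main__":
    toy_alphabet = ["_", "a", "b"]
    toy_probs = [
        [0.1, 0.8, 0.1],
        [0.1, 0.7, 0.2],
        [0.8, 0.1, 0.1],
        [0.1, 0.1, 0.8],
    ]
    print(beam_search_decode(toy_probs, toy_alphabet))
```

A wider beam only pays off once an external score, such as a language model, is added to each candidate; with independent frames alone, as in this sketch, the best path is simply the most probable symbol per frame.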

The project does not use traditional acoustic models or the concept of phonemes. Instead, it relies on a well-optimized machine learning system based on a neural network, which removes the need to develop separate components for modelling deviations such as noise, echo, and speech characteristics.

The kit provides trained models, sample audio files, and command-line recognition tools.

A ready-made model is supplied only for English and Chinese. For other languages, you can train the system yourself by following the accompanying instructions and using the voice data collected by the Common Voice project.

With the ready-to-use English model offered for download, the recognition error rate of DeepSpeech is 7.06% when evaluated on the LibriSpeech test set.

For comparison, the human recognition error rate is estimated at 5.83%.
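The article does not say how the error rate is computed. Assuming it is the word error rate customary for LibriSpeech evaluations, the metric can be sketched as the word-level edit distance between a reference transcript and the recognized text, divided by the number of reference words. The function name is hypothetical.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Levenshtein distance over words, keeping one row of the table at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1      # reference word missing in output
            insertion = current[j - 1] + 1  # extra word in output
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Under this definition, a rate of 7.06% corresponds to roughly seven word errors per hundred reference words.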

With the proposed model, the best recognition results are obtained from a clean recording of a male voice with an American accent in an environment free of extraneous noise.

According to the author of the Vosk continuous speech recognition library, the shortcomings of the Common Voice dataset are the one-sidedness of the speech material (mostly men aged 20 to 30, with little material from women, children, and the elderly), the lack of vocabulary variety (the same phrases are repeated), and the distribution of recordings in MP3, a format prone to distortion.

The drawbacks of DeepSpeech include poor performance and high memory consumption in the decoder, as well as the substantial resources needed to train the model (Mozilla uses a system with 8 Quadro RTX 6000 GPUs, each with 24 GB of VRAM).

A further downside of this approach is that, for high-quality recognition and neural network training, the DeepSpeech engine needs a large amount of heterogeneous data dictated in real conditions by different voices and in the presence of natural noise.

This data is collected by the Common Voice project created at Mozilla, which provides a validated dataset with 1469 hours in English, 692 hours in German, 554 hours in French, 105 hours in Russian, and 22 hours in Ukrainian.

When training the final English model for DeepSpeech, data from the LibriSpeech, Fisher, and Switchboard projects is used in addition to Common Voice, along with approximately 1700 hours of recordings of broadcast radio programs.

Among the changes in the new branch, the ability to boost the weight of selected words during decoding stands out.

Also notable are support for the Electron 9.2 platform and an optional implementation of layer normalization (Layer Norm) when training the neural network.
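For readers unfamiliar with the term, layer normalization rescales the activations of a single layer to zero mean and unit variance before applying a learned gain and bias. The sketch below shows the generic computation only. How DeepSpeech wires it into its network is not described here, and the scalar `gain` and `bias` are a simplification (real implementations typically learn one value per feature).

```python
import math


def layer_norm(values, gain=1.0, bias=0.0, eps=1e-5):
    """Normalize one layer's activations to zero mean and unit variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    scale = 1.0 / math.sqrt(variance + eps)  # eps guards against division by zero
    return [gain * (v - mean) * scale + bias for v in values]


if __name__ == "__main__":
    print(layer_norm([1.0, 2.0, 3.0, 4.0]))
```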

Download and get it

Performance is sufficient to run the engine on LePotato, Raspberry Pi 3, and Raspberry Pi 4 boards, as well as on Google Pixel 2, Sony Xperia Z Premium, and Nokia 1.3 smartphones.

Ready-to-use modules are offered for Python, NodeJS, C++, and .NET to integrate speech recognition into your own programs (third-party developers have separately prepared modules for Rust, Go, and V).
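As a rough illustration of the Python module, the sketch below transcribes a WAV file. It assumes the `deepspeech` and `numpy` packages are installed and that a model file (and optionally a scorer file) has been downloaded separately. The helper names, the file paths and the boost value are placeholders, not values taken from the project. The optional `hot_words` argument uses `addHotWord`, which to the best of my knowledge is how the 0.9 word-weighting feature is exposed and which is expected to take effect only when an external scorer is enabled.

```python
import wave


def load_pcm16(wav_path):
    """Return (sample_rate, raw_bytes) for a 16-bit mono WAV file."""
    with wave.open(wav_path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("expected 16-bit mono audio")
        return wav.getframerate(), wav.readframes(wav.getnframes())


def transcribe(model_path, wav_path, scorer_path=None, hot_words=None):
    """Run speech-to-text on one file; hot_words maps word -> boost."""
    # Imported lazily so load_pcm16 stays usable without these packages.
    import numpy as np
    from deepspeech import Model

    model = Model(model_path)
    if scorer_path is not None:
        model.enableExternalScorer(scorer_path)
    for word, boost in (hot_words or {}).items():
        model.addHotWord(word, boost)

    sample_rate, frames = load_pcm16(wav_path)
    if sample_rate != model.sampleRate():
        raise ValueError("audio sample rate does not match the model")
    return model.stt(np.frombuffer(frames, dtype=np.int16))


if __name__ == "__main__":
    # Placeholder file names; substitute the files you actually downloaded.
    print(transcribe("model.pbmm", "audio.wav",
                     scorer_path="model.scorer",
                     hot_words={"mozilla": 5.0}))
```

The sample rate check is there because the sketch does no resampling; audio recorded at a different rate would need to be converted first.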

