NLLB, a Facebook AI for direct text translation

Facebook recently published the latest developments from its NLLB (No Language Left Behind) project, whose goal is to build a universal machine learning model that translates text directly from one language to another, bypassing intermediate translation into English.

The proposed model covers more than 200 languages, including rare African and Australian languages. The project's main goal is to give everyone a means of communication, regardless of the language they speak.

To help people connect better today and be part of the metaverse of tomorrow, Meta AI researchers created No Language Left Behind (NLLB), an effort to develop high-quality machine translation capabilities for most of the world's languages.

Today we are announcing a major advance in NLLB: we have built a single AI model called NLLB-200, which translates 200 different languages with state-of-the-art results. Many of these languages, such as Kamba and Lao, were not supported even by the best translation tools available today.

To make it easier to build projects on top of the proposed model, the project has also released the code of the applications used to test and evaluate model quality (FLORES-200, NLLB-MD, Toxicity-200), the code for training models, and encoders based on the LASER3 library (Language-Agnostic SEntence Representations). The final model is offered in two versions: full and reduced. The reduced version requires fewer resources and is suitable for testing and for use in research projects.

Fewer than 25 African languages are currently supported by widely used translation tools, many of which are of poor quality. In contrast, NLLB-200 supports 55 African languages with high-quality results. In total, this single model can provide high-quality translations for languages spoken by billions of people around the world. Overall, NLLB-200's BLEU scores improve on the previous state of the art by an average of 44 percent across all 10k directions of the FLORES-101 benchmark. For some African and Indian languages, the improvement is more than 70 percent over recent translation systems.
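The "10k directions" figure follows from counting ordered language pairs: a benchmark with n languages has n(n-1) translation directions. Below is a minimal Python sketch of that arithmetic. It assumes the 101 languages that FLORES-101 is named for and the round figure of 200 languages quoted in this article. The function name is illustrative and does not come from the NLLB code.

```python
def translation_directions(num_languages: int) -> int:
    """Count ordered (source, target) pairs where source != target."""
    if num_languages < 2:
        return 0
    return num_languages * (num_languages - 1)


# FLORES-101: 101 languages, roughly the "10k directions" cited above.
print(translation_directions(101))  # 10100
# The round figure of 200 languages used in this article.
print(translation_directions(200))  # 39800
```

The same count is the number of models a system built from separate per-direction models would need, which is one reason a single multilingual model is attractive.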

Unlike other machine learning translation systems, Facebook's solution stands out by offering one model that covers all 200 languages, with no need for a separate model per language.

Translation is performed directly from the source language to the target language, without an intermediate translation into English. To build universal translation systems, an additional LID (Language IDentification) model is also proposed, which determines the language being used. In other words, the system can automatically recognize the language in which information is provided and translate it into the user's language.
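The flow described above (identify the language, then translate in one direct hop) can be sketched roughly as follows. This is only an illustration under stated assumptions: `translate_for_user`, `toy_identify`, `toy_translate`, and `TOY_TABLE` are hypothetical names, and the toy functions are hard-coded stand-ins for trained LID and translation models, not the NLLB API. The language codes follow the FLORES-200 style but are used here only as labels.

```python
from typing import Callable


def translate_for_user(
    text: str,
    user_lang: str,
    identify: Callable[[str], str],
    translate: Callable[[str, str, str], str],
) -> str:
    """Detect the source language, then translate directly into user_lang."""
    source_lang = identify(text)
    if source_lang == user_lang:
        return text  # already in the user's language, nothing to do
    # One direct hop from source to target: no intermediate English step.
    return translate(text, source_lang, user_lang)


# Hypothetical stand-ins so the sketch runs; real systems would call models here.
def toy_identify(text: str) -> str:
    return "zul_Latn" if "sawubona" in text.lower() else "eng_Latn"


TOY_TABLE = {("zul_Latn", "fra_Latn", "Sawubona"): "Bonjour"}


def toy_translate(text: str, src: str, tgt: str) -> str:
    # Fall back to a tagged echo when the toy table has no entry.
    return TOY_TABLE.get((src, tgt, text), f"[{src}->{tgt}] {text}")


print(translate_for_user("Sawubona", "fra_Latn", toy_identify, toy_translate))
```

Passing the identifier and translator in as callables keeps the routing logic independent of any particular model, so the same function works whether the backends are toys or real models.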

Translation is supported in any direction, between any of the 200 supported languages. To verify translation quality between any pair of languages, the FLORES-200 benchmark test set was prepared. It showed that the NLLB-200 model is, on average, 44% better in translation quality than previously proposed machine learning research systems, as measured by BLEU metrics, which compare machine translation against a reference human translation. For rare African languages and Indian dialects, the quality advantage reaches 70%. You can judge the translation quality for yourself on a specially prepared demo site.
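For readers unfamiliar with BLEU, the sketch below shows the general idea: clipped n-gram precision against a reference translation, combined with a brevity penalty. It is a simplified, unsmoothed, single-reference, sentence-level version written for illustration. Published results are normally computed at corpus level with standard tooling, and this is not the evaluation code used for NLLB-200. The function names are illustrative, and reading the 44% figure as a relative gain is an assumption.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed sentence-level BLEU against one whitespace-tokenized reference."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # unsmoothed: any empty precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty discourages candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


def relative_gain(new_score: float, old_score: float) -> float:
    """Relative improvement of new_score over old_score, e.g. 0.44 for 44%."""
    return (new_score - old_score) / old_score


print(simple_bleu("the cat sat on the mat", "the cat sat on the mat"))
# Made-up scores purely to illustrate the arithmetic, not measured results.
print(relative_gain(14.4, 10.0))
```

A candidate identical to the reference scores 1.0, while a candidate sharing no words with the reference scores 0.0.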

For those interested in the project, the model is available under a Creative Commons BY-NC 4.0 license, which permits copying, distribution, inclusion in your own projects, and the creation of derivative works, provided that attribution is given, the license is preserved, and use is non-commercial only. The modeling tooling is licensed under the MIT license. To encourage development based on the NLLB model, it was decided to allocate $200,000 in grants for researchers.

Finally, if you would like to learn more about this announcement, you can consult the original post at the following link.

