NLLB, Facebook AI's model for translating text directly

Facebook recently announced the development of the NLLB (No Language Left Behind) project. Its aim is to create a universal machine learning model that translates text directly from one language to another, bypassing intermediate translation into English.

The proposed model covers more than 200 languages, including rare languages of Africa and Australia. The project's main goal is to give everyone a means of communication, regardless of the language they speak.

To help people connect better today and take part in tomorrow's metaverse, Meta AI researchers created No Language Left Behind (NLLB), an effort to develop high-quality machine translation for most of the world's languages.

In Meta's own words: "Today we are announcing a major breakthrough in NLLB: we have built a single AI model, called NLLB-200, that translates 200 different languages with state-of-the-art results. Many of these languages, such as Kamba and Lao, were not supported even by the best translation tools available today."

To make it easier to build projects on top of the proposed model, the project has also published the tools used to test and evaluate model quality (FLORES-200, NLLB-MD, Toxicity-200), the model training code, and encoders based on the LASER3 library (Language-Agnostic SEntence Representations). The final model is offered in two versions: full and reduced (distilled). The reduced version requires fewer resources and is suitable for testing and for use in research projects.
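
To give a rough sense of why the reduced version is easier to run, the sketch below estimates how much memory is needed just to hold the model weights. The parameter counts are assumptions, not figures from this article: roughly 54.5 billion for the full mixture-of-experts model and 600 million for the smallest distilled checkpoint, as listed in Meta's release (check the official model cards). Activations and framework overhead are ignored.

```python
# Rough, weight-only memory estimate for two NLLB-200 variants.
# ASSUMPTION: ~54.5B parameters for the full mixture-of-experts model and
# ~600M for the smallest distilled checkpoint; verify against the model cards.
# Activations, caches and framework overhead are NOT included.

BYTES_PER_PARAM = {"float32": 4, "float16": 2, "int8": 1}

ASSUMED_PARAMS = {
    "full_moe": 54.5e9,
    "distilled_600m": 600e6,
}


def weight_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Memory needed just to store the weights, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    for name, params in ASSUMED_PARAMS.items():
        print(f"{name}: ~{weight_memory_gb(params):.1f} GB of float16 weights")
```

Under these assumptions the full model needs on the order of 109 GB for float16 weights alone, while the distilled 600M checkpoint needs about 1.2 GB, which is why the reduced version fits research and testing setups.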

Fewer than 25 African languages are currently supported by widely used translation tools, and many of those translations are of poor quality. By contrast, NLLB-200 supports 55 African languages with high-quality results. Overall, this single model can provide high-quality translations for languages spoken by millions upon millions of people around the world. In total, NLLB-200's BLEU scores improve on the previous state of the art by an average of 44 percent across all of the roughly 10,000 translation directions of the FLORES-101 benchmark. For some African and Indian languages, the gain exceeds 70 percent over recent translation systems.
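
Two numbers in the paragraph above can be unpacked with a little arithmetic. A "direction" is an ordered (source, target) pair, so 101 languages give 101 × 100 = 10,100 directions, the "roughly 10,000" figure. The sketch also shows one plausible reading of "average improvement": the mean of per-direction relative BLEU gains. That reading, and the scores used, are assumptions for illustration; the scores are made up and are not NLLB results.

```python
from itertools import permutations
from statistics import mean


def translation_directions(languages):
    """Every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


def mean_relative_gain(baseline: dict, candidate: dict) -> float:
    """Average per-direction relative BLEU gain, in percent.

    ASSUMPTION: "average improvement" is read as the mean of
    (candidate - baseline) / baseline over all directions.
    """
    gains = [(candidate[d] - baseline[d]) / baseline[d] * 100 for d in baseline]
    return mean(gains)


# 101 languages -> 101 * 100 ordered directions.
print(len(translation_directions(range(101))))  # 10100

# Hypothetical, made-up BLEU scores for two directions.
baseline = {("xho", "fra"): 10.0, ("fra", "xho"): 20.0}
candidate = {("xho", "fra"): 15.0, ("fra", "xho"): 26.0}
print(mean_relative_gain(baseline, candidate))  # 40.0
```

Averaging per-direction gains is not the same as comparing average scores: with the same made-up numbers, the mean score rises from 15.0 to 20.5, a gain of about 36.7%, not 40%. How the average is taken changes the headline figure.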

Unlike other machine-learning translation systems, Facebook's solution stands out by offering a single shared model for all 200 languages: it covers every language and does not require a separate model for each one.
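
The difference between one shared model that translates directly and a system that pivots through English can be sketched with a hypothetical interface: a callable that takes the text plus source and target language codes. The FLORES-200-style codes (`xho_Latn`, `zul_Latn`, `eng_Latn`) are used as labels, and `RecordingModel` is a toy stand-in that only logs calls; none of this is NLLB's actual API.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: one multilingual model that receives the text
# plus source and target language codes (FLORES-200 style labels).
Model = Callable[[str, str, str], str]


def translate_direct(model: Model, text: str, src: str, tgt: str) -> str:
    """One pass: the shared model goes straight from src to tgt."""
    return model(text, src, tgt)


def translate_via_pivot(model: Model, text: str, src: str, tgt: str,
                        pivot: str = "eng_Latn") -> str:
    """Two passes through an intermediate language (what NLLB avoids)."""
    if pivot in (src, tgt):
        return model(text, src, tgt)
    intermediate = model(text, src, pivot)
    return model(intermediate, pivot, tgt)


class RecordingModel:
    """Toy stand-in: prefixes the target code and logs every call."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str]] = []

    def __call__(self, text: str, src: str, tgt: str) -> str:
        self.calls.append((src, tgt))
        return f"[{tgt}] {text}"


model = RecordingModel()
translate_direct(model, "Molo", "xho_Latn", "zul_Latn")
print(model.calls)  # [('xho_Latn', 'zul_Latn')]

model = RecordingModel()
translate_via_pivot(model, "Molo", "xho_Latn", "zul_Latn")
print(model.calls)  # [('xho_Latn', 'eng_Latn'), ('eng_Latn', 'zul_Latn')]
```

Pivoting doubles the number of model passes for any pair that does not involve English, and each extra pass is another chance for errors to creep in; a single model conditioned on the target language handles every pair in one pass.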

Translation is performed directly from the source language to the target language, with no intermediate translation into English. To support universal translation systems, an additional LID (Language Identification) model is also offered, which determines the language of the input. In other words, the system can automatically recognize the language of the text it receives and translate it into the user's language.
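
The "identify, then translate" flow described above can be sketched as follows. NLLB's LID is a trained classifier covering its full language set; here it is swapped for a toy that picks the language whose sample text shares the most character trigrams with the input. The class and function names, the sample sentences and the request dictionary are all hypothetical.

```python
from __future__ import annotations


def char_trigrams(text: str) -> set[str]:
    """Space-padded character trigrams of each lowercased word."""
    grams: set[str] = set()
    for word in text.lower().split():
        padded = f" {word} "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


class ToyLanguageIdentifier:
    """Toy LID: the language whose sample shares most trigrams wins.

    Illustration only; NOT NLLB's trained language identification model.
    """

    def __init__(self, samples: dict[str, str]) -> None:
        self.profiles = {lang: char_trigrams(s) for lang, s in samples.items()}

    def identify(self, text: str) -> str:
        grams = char_trigrams(text)
        return max(self.profiles, key=lambda lang: len(grams & self.profiles[lang]))


def route(text: str, user_lang: str, lid: ToyLanguageIdentifier) -> dict:
    """Detect the source language, then describe the translation request."""
    src = lid.identify(text)
    return {"src_lang": src, "tgt_lang": user_lang, "text": text,
            "needs_translation": src != user_lang}


lid = ToyLanguageIdentifier({
    "eng_Latn": "the quick brown fox jumps over the lazy dog",
    "xho_Latn": "molweni unjani ndiyabulela enkosi kakhulu ndiyaphila",
})
# Detected as xho_Latn, so it must be translated for an English reader.
print(route("enkosi kakhulu", "eng_Latn", lid))
```

A real system would pass `src_lang` and `tgt_lang` on to the translation model; when the detected language already matches the user's language, translation can be skipped entirely.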

Translation works in any direction between any of the 200 supported languages. To verify translation quality between any pair of languages, the FLORES-200 benchmark test set was prepared. It showed that NLLB-200 delivers, on average, 44% higher translation quality than previously proposed machine-learning research systems, as measured with the BLEU metric, which compares machine translation output with reference human translations. For rare African languages and languages of India, the quality advantage exceeds 70%. You can check the translation quality yourself on a dedicated demo site.
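
BLEU, the metric mentioned above, scores a machine translation by how many of its word n-grams (up to 4-grams) also occur in a human reference, with a penalty for output that is too short. The sketch below is a minimal, unsmoothed, single-sentence, single-reference version on whitespace tokens. Published results are computed at corpus level with standardized tooling such as sacreBLEU and are usually reported on a 0 to 100 scale, so these numbers are not comparable to NLLB's figures.

```python
from __future__ import annotations

import math
from collections import Counter


def ngram_counts(tokens: list[str], n: int) -> Counter:
    """Count every contiguous n-gram in the token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis: str, reference: str, max_n: int = 4) -> float:
    """Unsmoothed single-reference BLEU on whitespace tokens, in [0, 1]."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts = ngram_counts(hyp, n)
        ref_counts = ngram_counts(ref, n)
        total = sum(hyp_counts.values())
        # Clip each n-gram by how often it appears in the reference.
        clipped = sum(min(c, ref_counts[g]) for g, c in hyp_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precision_sum += math.log(clipped / total)
    # Brevity penalty: punish hypotheses shorter than the reference.
    bp = 1.0 if len(hyp) > len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(log_precision_sum / max_n)


print(sentence_bleu("the cat sat on the mat", "the cat sat on the mat"))  # 1.0
print(sentence_bleu("the cat sat on", "the cat sat on the mat"))  # ≈ 0.607
```

The second example has perfect n-gram precision but is only 4 tokens long against a 6-token reference, so the brevity penalty exp(1 - 6/4) = exp(-0.5) pulls the score down; without that penalty, very short outputs could score deceptively well.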

For those interested in the project, note that the model is available under the Creative Commons BY-NC 4.0 license, which allows copying, redistribution, inclusion in your own projects, and creation of derivative works, provided that you give attribution, keep the license, and use it for non-commercial purposes only. The build tooling is licensed under the MIT license. To encourage development based on the NLLB model, it was decided to allocate $200,000 in grants to researchers.

Finally, if you want to learn more about it, you can read the original post at the following link.
