Stable Diffusion 2.0, an AI that can synthesize and modify images

Image generated with Stable Diffusion 2.0

Stability AI recently announced, in a blog post, the second edition of its Stable Diffusion machine learning system, which can synthesize and modify images based on a suggested template or a natural-language text description.

Stable Diffusion is a machine learning model developed by Stability AI to generate high-quality digital images from natural-language descriptions. The model can be used for a range of tasks, such as text-guided image-to-image translation and image enhancement.

Unlike competing models such as DALL-E, Stable Diffusion is open source and does not artificially restrict the images it produces. Critics have raised concerns about AI ethics, arguing that the model can be used to create deepfakes.

The dynamic team of Robin Rombach (Stability AI) and Patrick Esser (Runway ML) from the CompVis Group at LMU Munich, headed by Prof. Dr. Björn Ommer, led the original Stable Diffusion V1 release. They built on the lab's earlier work with latent diffusion models and received critical support from LAION and Eleuther AI. You can read more about the original Stable Diffusion V1 release in our earlier blog post. Robin is now leading the effort with Katherine Crowson at Stability AI to create the next generation of media models with our broader team.

Stable Diffusion 2.0 offers a number of notable improvements and features compared to the original V1 release.

Main new features of Stable Diffusion 2.0

This new version introduces a new text-to-image synthesis model, "SD2.0-v", which supports generating images at a resolution of 768×768. The new model was trained on the LAION-5B collection of 5.85 billion images with text descriptions.

The model uses the same set of parameters as the Stable Diffusion 1.5 model, but differs in that it switches to a fundamentally different encoder, OpenCLIP-ViT/H, which made it possible to improve the quality of the resulting images significantly.

A simplified version, SD2.0-base, has also been prepared. It was trained on 256×256 images using the classical noise-prediction model and supports generating images at a resolution of 512×512.
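As a point of reference, the "classical noise-prediction" training mentioned above is usually written as follows in the diffusion literature. This is the standard textbook formulation, not a formula quoted from the announcement; the symbols ($x_0$ for the clean sample or its latent, $\bar\alpha_t$ for the noise schedule, $c$ for the text conditioning, $\epsilon_\theta$ for the network) are assumptions introduced for illustration.

```latex
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon, \qquad \epsilon \sim \mathcal{N}(0, I)

\mathcal{L}(\theta) = \mathbb{E}_{x_0,\, c,\, \epsilon,\, t}\left[ \lVert \epsilon - \epsilon_\theta(x_t, t, c) \rVert_2^2 \right]
```

In this setup the network is trained to recover the noise $\epsilon$ that was mixed into the sample at step $t$.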

In addition, the release highlights the option of using supersampling (Super Resolution) technology to increase the resolution of the original image without reducing quality, using spatial scaling and detail-reconstruction algorithms.

Other changes that stand out in this new version:

  • The provided image processing model (SD20-upscaler) supports 4x upscaling, which makes it possible to generate images at a resolution of 2048×2048.
  • Stable Diffusion 2.0 also includes an Upscaler Diffusion model that increases image resolution by a factor of 4.
  • A new SD2.0-depth2img model is offered, which takes into account the depth and spatial arrangement of objects. The MiDaS system is used for monocular depth estimation.
  • The depth2img model lets you synthesize new images using another image as a template. The result may differ radically from the original while retaining its overall composition and depth. For example, you can use the pose of a person in a photo to create a different character in the same pose.
  • A new text-guided inpainting model, fine-tuned on the new Stable Diffusion 2.0 text-to-image base.
  • An updated image-editing model, SD 2.0-inpainting, which lets you use text prompts to replace and change parts of an image.
  • The models are optimized for use on ordinary systems with a GPU.
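The 4x upscaling figure in the list above can be checked with a little arithmetic: a 2048×2048 output corresponds to a 512×512 input. The snippet below is only an illustrative sketch of that calculation; the helper name `upscaled_size` is hypothetical and is not part of any Stable Diffusion code, and the 512×512 input is an assumption based on the SD2.0-base resolution mentioned earlier.

```python
from typing import Tuple


def upscaled_size(width: int, height: int, factor: int = 4) -> Tuple[int, int]:
    """Return the output resolution after scaling each side by `factor`.

    Hypothetical helper for illustration only; it does no image processing.
    """
    if width <= 0 or height <= 0 or factor <= 0:
        raise ValueError("width, height and factor must be positive")
    return width * factor, height * factor


# Assumed input: a 512x512 image, the resolution supported by SD2.0-base.
base = (512, 512)
result = upscaled_size(*base)
print(f"{base[0]}x{base[1]} -> {result[0]}x{result[1]}")

# Each side grows by `factor`, so the pixel count grows by factor squared.
pixel_ratio = (result[0] * result[1]) // (base[0] * base[1])
print(f"pixel count multiplied by {pixel_ratio}")
```

Note that scaling each side by 4 multiplies the number of pixels by 16, which is why upscaling is handled by a dedicated model rather than by generating at the target resolution directly.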

Finally, if you are interested in learning more, note that the code for the neural-network training and inference tools is written in Python using the PyTorch framework and is released under the MIT license.

The pretrained models are released under the permissive Creative ML OpenRAIL-M license, which allows commercial use.

Source: https://stability.ai

