Stability AI recently announced, in a blog post, the second version of its Stable Diffusion machine learning system, which can synthesize and modify images based on a supplied template or a natural-language text description.
Stable Diffusion is a machine learning model developed by Stability AI to generate high-quality digital images from natural-language descriptions. The model can be used for a variety of tasks, such as text-guided image-to-image translation and image enhancement.
Unlike competing models such as DALL-E, Stable Diffusion is open source and does not artificially limit the images it produces. Critics have raised concerns about AI ethics, arguing that the model can be used to create deepfakes.
The dynamic team of Robin Rombach (Stability AI) and Patrick Esser (Runway ML), from the CompVis Group at LMU Munich headed by Prof. Dr. Björn Ommer, led the original Stable Diffusion V1 release. They built on their lab's earlier work on latent diffusion models and received critical support from LAION and Eleuther AI. You can read more about the original Stable Diffusion V1 release in our earlier blog post. Robin is now leading the effort with Katherine Crowson at Stability AI to create the next generation of media models with our broader team.
Stable Diffusion 2.0 delivers a number of notable improvements and new features compared with the original V1 release.
Main new features of Stable Diffusion 2.0
This release introduces a new text-to-image synthesis model, "SD2.0-v", which supports generating images at a resolution of 768×768. The new model was trained on the LAION-5B collection of 5.85 billion images with text descriptions.
The model uses the same set of parameters as the Stable Diffusion 1.5 model, but differs in switching to a fundamentally different encoder, OpenCLIP-ViT/H, which made it possible to significantly improve the quality of the resulting images.
A simplified version, SD2.0-base, has also been prepared. It was trained on 256×256 images using the classic noise-prediction model and supports generating images at a resolution of 512×512.
In addition, the announcement highlights that supersampling (Super Resolution) technology can be used to increase the resolution of the original image without reducing quality, using spatial scaling and detail reconstruction algorithms.
Other changes that stand out in this new version:
- The supplied image processing model (SD20-upscaler) supports 4x upscaling, which allows images with a resolution of 2048×2048 to be generated.
- Stable Diffusion 2.0 also includes an Upscaler Diffusion model that increases image resolution by a factor of 4.
- The SD2.0-depth2img model is introduced, which takes into account the depth and spatial arrangement of objects. The MiDaS system is used to estimate monocular depth.
- A new text-guided inpainting model, fine-tuned on the new Stable Diffusion 2.0 text-to-image base.
- The depth-guided model (SD2.0-depth2img) lets you synthesize new images using another image as a template. The result can differ radically from the original while keeping its overall composition and depth. For example, you can use the pose of a person in a photo to create a different character in the same pose.
- An updated image modification model, SD 2.0-inpainting, which allows text prompts to be used to replace and change parts of an image.
- The models have been optimized for use on conventional systems with one GPU.
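The resolution figures quoted for the upscaler can be checked with a little arithmetic. The sketch below is only an illustration of that arithmetic, not the upscaler itself: the helper names `upscaled_size` and `pixel_growth` are hypothetical, and it assumes the 2048×2048 figure refers to a 512×512 input (the SD2.0-base output size mentioned above) scaled by 4 on each side.

```python
from typing import Tuple


def upscaled_size(size: Tuple[int, int], factor: int = 4) -> Tuple[int, int]:
    """Return (width, height) after scaling each side by `factor`."""
    width, height = size
    return width * factor, height * factor


def pixel_growth(factor: int = 4) -> int:
    """Scaling both sides by `factor` multiplies the pixel count by factor squared."""
    return factor ** 2


# Assumption: a 512x512 input, the output size quoted for SD2.0-base.
base_size = (512, 512)

print(upscaled_size(base_size))  # (2048, 2048)
print(pixel_growth())            # 16
```

In other words, a 4x upscale of a 512×512 image yields 2048×2048, which means the model has to produce 16 times as many pixels as it was given.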
Finally, if you are interested in learning more, note that the code for training the neural network and the image generation tools is written in Python using the PyTorch framework and is released under the MIT license.
The pre-trained models are released under the permissive Creative ML OpenRAIL-M license, which allows commercial use.
Source: https://stability.ai