Stable Diffusion 2.0, an AI capable of synthesizing and modifying images

Image generated with Stable Diffusion 2.0

Stability AI recently unveiled, in a blog post, the second version of its machine learning system Stable Diffusion, which can synthesize and modify images based on a supplied template or a natural-language text description.

Stable Diffusion is a machine learning model developed by Stability AI to generate high-quality digital images from natural-language descriptions. The model can be used for a range of tasks, such as text-guided image-to-image translation and image enhancement.

Unlike competing models such as DALL-E, Stable Diffusion is open source and does not artificially limit the images it produces. Critics have raised concerns about AI ethics, arguing that the model can be used to create deepfakes.

The dynamic team of Robin Rombach (Stability AI) and Patrick Esser (Runway ML), from the professor-led CompVis Group at LMU Munich, built on their lab's prior work with latent diffusion models and received critical support from LAION and Eleuther AI. You can read more about the original Stable Diffusion V1 release in our earlier blog post. Robin is now leading the effort with Katherine Crowson at Stability AI to create the next generation of media models with our broader team.

Stable Diffusion 2.0 delivers a number of major improvements and new features compared to the original V1 release.

Main new features of Stable Diffusion 2.0

This new version introduces a new text-to-image model, "SD2.0-v", which supports generating images at a resolution of 768 × 768. The new model was trained on the LAION-5B collection of 5.85 billion images with text descriptions.
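As a rough illustration of generating a 768 × 768 image from a text description, the following sketch uses the Hugging Face `diffusers` library. Neither that library nor the model identifier `stabilityai/stable-diffusion-2` is named in the article, so both are assumptions; the helper names `generation_kwargs` and `generate` are hypothetical. The heavy imports are deferred so that the settings helper can be inspected without a GPU.

```python
# Assumption: the 768 x 768 "SD2.0-v" checkpoint is published under this
# Hugging Face model id; the article itself does not name it.
MODEL_ID = "stabilityai/stable-diffusion-2"
NATIVE_SIZE = 768  # resolution stated in the article for SD2.0-v


def generation_kwargs(prompt, size=NATIVE_SIZE):
    """Build the keyword arguments passed to the pipeline call."""
    return {"prompt": prompt, "height": size, "width": size}


def generate(prompt, device="cuda"):
    """Download the checkpoint and render one image.

    Requires torch, diffusers and a GPU, so the imports are done lazily.
    """
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(MODEL_ID, torch_dtype=torch.float16)
    pipe = pipe.to(device)
    return pipe(**generation_kwargs(prompt)).images[0]


if __name__ == "__main__":
    # Only prints the settings; call generate(...) on a machine with a GPU.
    print(generation_kwargs("a lighthouse at dusk, oil painting"))
```

Calling `generate("a lighthouse at dusk, oil painting").save("lighthouse.png")` would write the result to disk, since the pipeline returns a PIL image.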

The model uses the same set of parameters as the Stable Diffusion 1.5 model, but differs in that it switches to a different text encoder, OpenCLIP-ViT/H, which made it possible to significantly improve the quality of the resulting images.

A simplified version, SD2.0-base, has also been prepared. It was trained on 256 × 256 images using the classical noise-prediction model and supports generating images at a resolution of 512 × 512.
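To make the "classical noise-prediction" objective more concrete, here is a minimal sketch of the training targets in pure Python. It assumes, which the article does not state, that the "v" in SD2.0-v refers to the v-prediction objective, where the network predicts a mix of noise and signal instead of the noise alone. The function names and the toy four-value "image" are illustrative only.

```python
import math
import random


def add_noise(x0, eps, alpha_bar):
    """Forward diffusion: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * e for x, e in zip(x0, eps)]


def epsilon_target(x0, eps, alpha_bar):
    """Classical objective: the network is trained to predict the noise itself."""
    return list(eps)


def v_target(x0, eps, alpha_bar):
    """v-prediction objective: v = sqrt(alpha_bar) * eps - sqrt(1 - alpha_bar) * x0."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * e - s * x for x, e in zip(x0, eps)]


def x0_from_v(x_t, v, alpha_bar):
    """Recover the clean sample from a noisy sample and a (perfect) v prediction."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * xt - s * vi for xt, vi in zip(x_t, v)]


rng = random.Random(0)
x0 = [rng.uniform(-1.0, 1.0) for _ in range(4)]   # toy "image" with 4 values
eps = [rng.gauss(0.0, 1.0) for _ in range(4)]     # Gaussian noise
alpha_bar = 0.7                                   # cumulative signal level at step t

x_t = add_noise(x0, eps, alpha_bar)
recovered = x0_from_v(x_t, v_target(x0, eps, alpha_bar), alpha_bar)
```

Both targets describe the same noisy sample; given a perfect prediction, either one is enough to reconstruct the clean image, as the `recovered` list shows for the v target.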

In addition, the release makes it possible to use supersampling technology (Super Resolution) to increase the resolution of the original image without reducing quality, using spatial scaling and detail reconstruction algorithms.

Other notable changes in this new version:

  • Stable Diffusion 2.0 includes an upscaler diffusion model (SD20-upscaler) that increases image resolution by a factor of 4, allowing images with a resolution of 2048 × 2048 to be generated.
  • A depth-guided model, SD2.0-depth2img, is introduced. It takes the depth and spatial arrangement of objects into account, using the MiDaS system for monocular depth estimation. The model lets you synthesize new images using another image as a template; the result can differ radically from the original while keeping its overall composition and depth. For example, you can use the pose of a person in a photo to create a different character in the same pose.
  • An updated text-guided inpainting model, SD 2.0-inpainting, fine-tuned on the new Stable Diffusion 2.0 text-to-image base, lets you use text prompts to replace and change parts of an image.
  • The models are optimized for use on conventional systems with a GPU.
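The 4x upscaling figure in the list above can be checked with simple arithmetic. The sketch below assumes that the factor applies to each side of the image, so a 2048 × 2048 result corresponds to a 512 × 512 input; the function names are hypothetical and nothing here calls the actual upscaler model.

```python
def upscaled_size(width, height, factor=4):
    """Return the output size when each side is multiplied by `factor`."""
    if factor < 1:
        raise ValueError("factor must be at least 1")
    return width * factor, height * factor


def source_size_for(target_width, target_height, factor=4):
    """Return the input size that yields the target size after upscaling."""
    if target_width % factor or target_height % factor:
        raise ValueError("target size must be divisible by the factor")
    return target_width // factor, target_height // factor


if __name__ == "__main__":
    out_w, out_h = upscaled_size(512, 512)
    # Pixel count grows by factor squared: 4x per side means 16x the pixels.
    ratio = (out_w * out_h) // (512 * 512)
    print(out_w, out_h, ratio)
```

Under this assumption, an image produced by SD2.0-base at 512 × 512 is exactly the input size needed to reach 2048 × 2048.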

Finally, if you are interested in learning more, you should know that the code for the neural network training and inference tools is written in Python using the PyTorch framework and is released under the MIT license.

The pre-trained models are released under the permissive Creative ML OpenRAIL-M license, which allows commercial use.

Source: https://stability.ai

