LDM3D, the model for 3D image synthesis from Intel and Blockade

LDM3D, the industry's first diffusion model to offer depth mapping to create 3D images with 360-degree views that are vivid and immersive.

Intel and Blockade Labs have announced in a blog post their joint development of a machine learning model called "LDM3D" (Latent Diffusion Model for 3D), which generates images and their associated depth maps from natural language text descriptions.

The model was trained on the open LAION-400M data set, prepared by the LAION (Large-scale Artificial Intelligence Open Network) community, which develops tools, models, and data collections for building free machine learning systems. The LAION-400M collection contains 400 million images with text descriptions.

In addition to the images and their text descriptions, training the LDM3D model also uses depth maps. These are generated for each image with the DPT (Dense Prediction Transformer) machine learning system, which predicts the relative depth of each pixel in a flat image.
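Relative depth is only meaningful up to an unknown scale and offset, so such maps are commonly rescaled to a fixed range before being stored or displayed. The following is a minimal sketch of that rescaling, assuming the depth map is available as a nested list of floats; `normalize_depth` is a hypothetical helper for illustration and is not part of DPT or LDM3D.

```python
def normalize_depth(depth_rows):
    """Rescale a relative depth map (list of rows of floats) to the range [0, 1]."""
    flat = [d for row in depth_rows for d in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no depth ordering; return all zeros.
        return [[0.0 for _ in row] for row in depth_rows]
    return [[(d - lo) / span for d in row] for row in depth_rows]


# Illustrative values only, not real DPT output.
depth = [[2.0, 4.0], [6.0, 10.0]]
normalized = normalize_depth(depth)
```

After rescaling, the smallest value in the map becomes 0 and the largest becomes 1, while the ordering of pixels by depth is preserved.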

Intel Labs, in collaboration with Blockade Labs, has introduced the Latent Diffusion Model for 3D (LDM3D), the industry's first diffusion model to offer depth mapping for creating vivid, immersive 3D images with 360-degree views.

LDM3D has the potential to revolutionize content creation, metaverse applications, and digital experiences, transforming a wide range of industries, from entertainment and gaming to architecture and design.

Compared to approaches that estimate depth in post-processing, LDM3D, which is trained with depth data from the start, provides more accurate depth information at the generation stage. Another advantage is that it generates depth data without increasing the number of parameters: LDM3D has approximately the same number of parameters as the latest Stable Diffusion model.

To demonstrate the model's capabilities, an application called DepthFusion has been prepared, which creates interactive environments for 360-degree viewing from two-dimensional RGB images and depth maps.

LDM3D allows users to generate an image and a depth map from a given text prompt using almost the same number of parameters as Stable Diffusion.

DepthFusion is built with TouchDesigner, a visual programming language suited to creating interactive multimedia content in real time. The LDM3D model can also be used to generate and modify images based on a proposed template, project the result onto a sphere to create an environment, generate images from different observer positions, and generate video from virtual camera movement.
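To illustrate what projecting an image and its depth map onto a sphere can involve, here is a minimal sketch that turns one pixel into a 3D point. It assumes an equirectangular panorama layout, a y-up coordinate system, and depth interpreted as distance from the viewer; none of these details are confirmed by the announcement, and `pixel_to_point` is a hypothetical helper, not DepthFusion code.

```python
import math


def pixel_to_point(col, row, width, height, depth):
    """Map an equirectangular pixel and its depth to an (x, y, z) point."""
    # Longitude runs from -pi (left edge) to +pi (right edge).
    lon = (col + 0.5) / width * 2.0 * math.pi - math.pi
    # Latitude runs from +pi/2 (top edge) to -pi/2 (bottom edge).
    lat = math.pi / 2.0 - (row + 0.5) / height * math.pi
    x = depth * math.cos(lat) * math.sin(lon)
    y = depth * math.sin(lat)
    z = depth * math.cos(lat) * math.cos(lon)
    return (x, y, z)


# A pixel in the top-left corner of an 8x4 panorama, 2 units away.
corner = pixel_to_point(0, 0, 8, 4, 2.0)
corner_distance = math.sqrt(sum(c * c for c in corner))
```

Applying this to every pixel yields a point cloud around the viewer. Rendering that cloud from a slightly shifted origin is one way to approximate a different observer position.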

The proposed technology is expected to have great potential for creating new methods of user interaction, which may be in demand in various industries, from entertainment and gaming to architecture and design. For example, LDM3D can be used to create interactive museums and virtual reality environments that generate detailed scenes from natural language descriptions.

The model resembles the Stable Diffusion image synthesis system, but it can also produce three-dimensional visual content, such as spherical panoramic images that can be viewed in 360-degree mode. On the practical side, the model can be used in games and virtual reality systems to build three-dimensional environments interactively.

The LDM3D model was trained on an Intel AI supercomputer with Intel® Xeon® processors and Intel® Habana Gaudi® AI accelerators.

For those interested in the project, a ready-to-use model is available as a free download. It can be used with PyTorch and with code designed to generate images using models from the Stable Diffusion project.
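As a rough sketch of what using the downloaded model from PyTorch might look like, the example below relies on the Hugging Face `diffusers` library and its `StableDiffusionLDM3DPipeline` class. The announcement does not name this library, the model identifier `Intel/ldm3d` is assumed to be the published checkpoint, and `generate_rgbd` is a hypothetical wrapper written for illustration.

```python
def generate_rgbd(prompt, model_id="Intel/ldm3d", device="cuda"):
    """Generate one RGB image and its depth map from a text prompt."""
    # Imported lazily so this module loads even without diffusers installed.
    from diffusers import StableDiffusionLDM3DPipeline

    # Assumption: model_id points at an LDM3D checkpoint on the Hugging Face Hub.
    pipe = StableDiffusionLDM3DPipeline.from_pretrained(model_id)
    pipe = pipe.to(device)

    output = pipe(prompt)
    # The pipeline output exposes parallel lists of RGB and depth images.
    return output.rgb[0], output.depth[0]
```

Calling `generate_rgbd("a picture of some lemons on a table")` would download the weights on first use and return two images, each of which can be written to disk with its `save` method. On a machine without a GPU, passing `device="cpu"` should work but will be much slower.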

It is worth mentioning that the model is distributed under the permissive CreativeML OpenRAIL-M license, which allows commercial use. Distribution under an open license lets interested researchers and developers improve the model according to their needs and optimize it for highly specialized applications.

Finally, if you are interested in knowing more about it, you can consult the details at the following link.

DesdeLinux
