OpenAI's newer models already draw and recognize objects more efficiently

OpenAI researchers have developed two neural networks: one can draw objects as directed by the user in natural language, and the other can describe images with a high degree of precision.

The projects, which became known a few days ago, broaden the range of tasks that artificial intelligence can be applied to and further advance the AI research community's goal of creating more versatile models that require fewer manual adjustments by engineers to produce accurate results.

DALL·E, the first of the new neural networks, is a scaled-down version of the GPT-3 natural language processing model that OpenAI debuted in 2020. GPT-3, one of the most complex neural networks created to date, can generate text and even software code from simple descriptions. DALL·E applies the same ability to drawing images as directed by the user.

The model's standout capability is that it can produce images even in response to descriptions it encounters for the first time, descriptions that would normally be difficult for an AI to interpret.

During testing, OpenAI researchers were able to demonstrate that the model successfully generates drawings in response to such descriptions; in addition, the model is capable of rendering images in several different styles.

The researchers decided to test exactly how versatile the AI is by having it tackle several additional tasks of varying difficulty.

In a series of experiments, the model proved highly capable, generating the same image from multiple angles and at different levels of resolution.

Another test also showed that the model is sophisticated enough to customize individual details of the images it is asked to generate.

"Simultaneous control of multiple objects, their attributes, and their spatial relationships presents a new challenge," the OpenAI researchers wrote in a blog post. "For example, consider the phrase" a hedgehog in a red hat, yellow gloves, a blue shirt, and green pants. " To correctly interpret this sentence, DALL · E must not only correctly compose each garment with the animal, but also form the associations (hat, red), (gloves, yellow), (shirt, blue) and (pants, green) without mixing them «.

The other neural network that OpenAI recently detailed, CLIP, focuses on recognizing objects in existing images instead of drawing new ones.

And while there are already computer vision models that classify images this way, most of them can only identify the small set of objects they were specifically trained on.

An AI that classifies animals in wildlife photos, for example, has to be trained on a large number of wildlife photos to produce accurate results. What distinguishes OpenAI's CLIP is that it is capable of describing an object it has not encountered before.
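
As a rough illustration of that zero-shot behavior, the following sketch uses the open-source clip package that OpenAI released alongside the model (github.com/openai/CLIP); the image file name and the candidate labels are illustrative assumptions, not details from the article:

```python
# Minimal zero-shot classification sketch with OpenAI's open-source CLIP
# package. "photo.jpg" and the label list are placeholder assumptions.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Any image and any free-form text labels can be supplied at inference time;
# no task-specific retraining is needed.
image = preprocess(Image.open("photo.jpg")).unsqueeze(0).to(device)
labels = ["a photo of a fox", "a photo of a badger", "a photo of a raccoon"]
text = clip.tokenize(labels).to(device)

with torch.no_grad():
    # Similarity scores between the image and each candidate description.
    logits_per_image, _ = model(image, text)
    probs = logits_per_image.softmax(dim=-1).squeeze(0)

for label, p in zip(labels, probs.tolist()):
    print(f"{label}: {p:.3f}")
```

Because the labels are just text, they can be swapped for entirely different descriptions without retraining the model, which is what the zero-shot claim amounts to in practice.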

CLIP's versatility is the result of a new training approach that the lab developed to build the model.

For the training process, OpenAI did not use a manually labeled image data set, but pictures obtained from the public web along with their accompanying text captions. The captions allowed CLIP to build a broad lexicon of words associated with different types of objects, associations it could then use to describe objects it hadn't seen before.

"Deep learning requires a large amount of data, and vision models have traditionally been trained on manually labeled data sets that are expensive to build and only provide oversight for a limited number of predetermined visual concepts," detailed the researchers behind Clip. "Rather, CLIP learns from the text and image pairs that are already publicly available on the Internet."

Finally, if you want to know more about the OpenAI models, you can check the details at the following link.

