SEED RL, a Google Open Source Framework for Artificial Intelligence Models

Google researchers have announced a new framework that scales the training of artificial intelligence models to thousands of machines. It is called SEED RL (scalable, efficient deep reinforcement learning).

This is a promising development because it should enable artificial intelligence algorithms to be trained at millions of frames per second and reduce the cost of this training by up to 80%, Google said in a research paper.

This kind of cost reduction could help level the playing field for startups that until now have not been able to compete with major players like Google in the field of AI. The cost of training sophisticated machine learning models in the cloud is surprisingly high. By open-sourcing the SEED RL code, Google is releasing a project aimed at optimizing the cost/performance ratio of reinforcement learning.

Reinforcement learning is an approach, suited to specific use cases, in which agents learn about their environment through exploration and optimize their actions to obtain the greatest reward.
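
To make the idea of exploring and then optimizing for reward more concrete, here is a minimal sketch of an epsilon-greedy agent on a toy multi-armed bandit. It is not taken from SEED RL; the function name `run_bandit`, the reward values, and the parameters are all assumptions chosen for illustration.

```python
import random


def run_bandit(true_rewards, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a toy multi-armed bandit (illustrative only)."""
    rng = random.Random(seed)
    n_arms = len(true_rewards)
    estimates = [0.0] * n_arms  # running estimate of each action's reward
    counts = [0] * n_arms       # how often each action has been tried
    total_reward = 0.0
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(n_arms)
        else:
            action = max(range(n_arms), key=lambda a: estimates[a])
        # The environment answers with a noisy reward for the chosen action.
        reward = true_rewards[action] + rng.gauss(0.0, 0.1)
        counts[action] += 1
        # Incremental mean update of the estimate for this action.
        estimates[action] += (reward - estimates[action]) / counts[action]
        total_reward += reward
    return estimates, counts, total_reward


estimates, counts, total = run_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("preferred action:", best_action)
```

The agent starts with no knowledge of the rewards, tries actions at random a fraction of the time, and gradually concentrates on the action whose estimated reward is highest. Deep reinforcement learning systems such as SEED RL replace the table of estimates with a neural network and the bandit with a much richer environment.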

In the paper "SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference", the researchers introduce an RL agent that scales to thousands of machines, enabling training at millions of frames per second and significantly improving computational efficiency. This is achieved with a novel architecture that takes advantage of accelerators (GPUs or TPUs) at scale by centralizing model inference and introducing a fast communication layer.

They demonstrate SEED RL's performance on popular RL benchmarks such as Google Research Football, Arcade Learning Environment, and DeepMind Lab, and show that data efficiency can be increased by using larger models. The code has been released on GitHub along with examples for running it on Google Cloud with GPUs.

SEED RL is based on the TensorFlow 2.0 framework and uses graphics processing units (GPUs) and tensor processing units (TPUs) to centralize model inference. Inference is performed centrally by the learner, the component that also trains the model.

The model's variables and state information are kept local to the learner, and the actors send their observations to the learner at every environment step. To minimize latency, SEED RL also uses a network library based on the open source gRPC framework.
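
The division of labor described above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: threads and in-process queues stand in for separate machines and the RPC layer, `toy_policy` stands in for a neural network running on an accelerator, and every name here (`learner`, `actor`, `run`, `toy_policy`) is hypothetical rather than part of the SEED RL code base.

```python
import queue
import threading


def toy_policy(batch, weights):
    # Stand-in for a batched forward pass of a neural network.
    return [1 if obs * weights["w"] > 0 else 0 for obs in batch]


def learner(requests, n_actors, max_batch=4):
    weights = {"w": 1.0}  # model variables exist only on the learner
    finished = 0
    batches = 0
    while finished < n_actors:
        # Block for one request, then drain whatever else is waiting.
        pending = [requests.get()]
        while len(pending) < max_batch:
            try:
                pending.append(requests.get_nowait())
            except queue.Empty:
                break
        work = []
        for item in pending:
            if item is None:  # sentinel: one actor has finished
                finished += 1
            else:
                work.append(item)
        if work:
            # One central inference call serves several actors at once.
            actions = toy_policy([obs for obs, _ in work], weights)
            for (_, reply), action in zip(work, actions):
                reply.put(action)
            batches += 1
    return batches


def actor(actor_id, requests, steps, log):
    reply = queue.Queue(maxsize=1)
    obs = float(actor_id) - 1.5  # toy initial observation
    for _ in range(steps):
        requests.put((obs, reply))  # send the observation to the learner
        action = reply.get()        # wait for the centrally computed action
        log.append((actor_id, obs, action))
        obs = -obs                  # toy environment transition
    requests.put(None)


def run(n_actors=4, steps=5):
    requests = queue.Queue()
    log = []
    threads = [
        threading.Thread(target=actor, args=(i, requests, steps, log))
        for i in range(n_actors)
    ]
    for t in threads:
        t.start()
    batches = learner(requests, n_actors)
    for t in threads:
        t.join()
    return log, batches


log, batches = run()
print(len(log), "environment steps served in", batches, "inference batches")
```

The point of the sketch is that the actors hold no copy of the model: they only step their environment and exchange small messages with the learner, which can then batch requests from many actors into a single inference call.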

The Google researchers say that SEED RL's learner can be scaled to thousands of cores, while the number of actors, which step through the environment and obtain the next action from inference on the model, can be scaled to thousands of machines.

Google evaluated SEED RL on the popular Arcade Learning Environment, the Google Research Football environment, and several DeepMind Lab environments. According to the results, the researchers managed to solve a Google Research Football task while training the model at 2.4 million frames per second on 64 Cloud TPU chips.

That is about 80 times faster than previous frameworks, Google said.

"This translates into a significant speed-up in wall-clock time and, because accelerators are much cheaper per operation than CPUs, the cost of experiments is drastically reduced. We believe SEED RL and the results presented show that reinforcement learning has once again caught up with the rest of deep learning in terms of accelerator usage," writes Lasse Espeholt, research engineer at Google Research.

With an architecture optimized for use in modern accelerators, it is natural to increase the size of the model in an attempt to increase data efficiency.

Google said that the SEED RL code has been open-sourced and is available on GitHub, along with examples showing how to run it on Google Cloud with GPUs.

Finally, those interested in this new framework can find more information at the source below.

Source: https://ai.googleblog.com/

