Google released the source code of its AI "TAPAS"

Google has announced the release of the source code of "TAPAS" (TAble PArSing), a neural network (artificial intelligence) developed internally that takes a question posed in natural language and retrieves the answer from a relational database or spreadsheet.

To obtain the best results from TAPAS, the project's developers trained the neural network on 6.2 million table-text pairs taken from Wikipedia. To verify the training, the network had to restore missing words in both tables and texts on which it had not been trained. The recovery accuracy was 71.4%, and a benchmark test showed that the neural network gives answers that are as accurate as, or more accurate than, those of rival algorithms on all three data sets.
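The "restore the missing words" check can be pictured with a minimal sketch. This is not the TAPAS training code: the `[MASK]` token, the 15% default mask rate, and the helper names `mask_tokens` and `accuracy` are assumptions made for illustration, and whitespace-separated words stand in for real tokens.

```python
import random

MASK = "[MASK]"  # placeholder token; an assumption for this sketch


def mask_tokens(tokens, mask_rate=0.15, seed=0):
    """Replace a random subset of tokens with MASK.

    Returns (masked, targets), where `targets` maps each masked position
    to the original token that a model would have to restore.
    """
    rng = random.Random(seed)
    masked, targets = list(tokens), {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_rate:
            masked[i] = MASK
            targets[i] = tok
    return masked, targets


def accuracy(predictions, targets):
    """Fraction of masked positions whose predicted token equals the original."""
    if not targets:
        return 0.0
    correct = sum(predictions.get(i) == tok for i, tok in targets.items())
    return correct / len(targets)


if __name__ == "__main__":
    words = "North 2400 South 1800 sales by region".split()
    masked, targets = mask_tokens(words, mask_rate=0.3, seed=1)
    print(masked)
    print(targets)
```

A figure such as the 71.4% quoted above would correspond to `accuracy` computed over held-out tables and texts, with the predictions coming from the trained network rather than from this toy code.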

About TAPAS

Essentially, the aim of this project is to look up, process and display the information that matches a query the user writes in natural language, making it much easier to obtain information at scale.

A basic use case for TAPAS is a user who wants to evaluate sales data, income, requests and similar figures. TAPAS is not limited to retrieving information from a database; it can also perform calculations. The algorithm looks for the answer in the table cells, either directly or by applying addition, averaging and other operators, and it can also search for the answer across several tables at the same time.
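The idea of answering from cells "directly or by means of operators" can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `SALES` table, the operator names and the `answer` function are hypothetical, and here the column, filter and operator are passed in by hand, whereas in TAPAS the neural network chooses them from the question.

```python
from statistics import mean

# Hypothetical sales table: one dict per row.
SALES = [
    {"region": "North", "units": 120, "revenue": 2400.0},
    {"region": "South", "units": 80, "revenue": 1800.0},
    {"region": "West", "units": 100, "revenue": 2100.0},
]

# Operators applied to the selected cells (names are assumptions).
OPERATORS = {
    "NONE": lambda cells: cells,  # return the selected cells directly
    "SUM": sum,
    "AVERAGE": mean,
    "COUNT": len,
}


def answer(table, column, operator="NONE", where=None):
    """Select cells from `column` (optionally filtered by `where`) and aggregate them."""
    rows = [row for row in table if where is None or where(row)]
    cells = [row[column] for row in rows]
    return OPERATORS[operator](cells)


if __name__ == "__main__":
    # "What is the total revenue?"
    print(answer(SALES, "revenue", "SUM"))
    # "Which regions sold more than 90 units?"
    print(answer(SALES, "region", "NONE", where=lambda r: r["units"] > 90))
```

Separating cell selection from the aggregation step keeps a direct lookup ("NONE") and a calculation ("SUM", "AVERAGE") on the same code path.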

Google says that TAPAS outperforms or matches the top three open source algorithms for analyzing relational data. TAPAS's ability to extract specific items from large data repositories could also lend itself to improving question-answering capabilities.

Under the hood, TAPAS employs a variation of BERT, the natural language processing technique used in searches carried out by the Google engine.

BERT provides greater precision than traditional approaches because it lets a model evaluate a text sequence not only from left to right or from right to left, as is the usual practice, but in both directions at the same time.

The version that Google implemented for TAPAS allows the model to consider not only the question posed by users and the data they want to query, but also the structure of the relational tables in which the data is stored.
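One way to picture how table structure can reach a BERT-style model is to flatten the question and the table into a single token sequence and tag every token with its position in the table. The sketch below assumes that scheme for illustration: the `flatten` function, the id layout and the whitespace tokenization are not taken from the TAPAS source code.

```python
def flatten(question, header, rows):
    """Turn a question and a table into one token sequence with structure ids.

    Each item is (token, segment_id, column_id, row_id). Question tokens get
    segment 0 and column/row 0. Table tokens get segment 1, a 1-based column
    id, and row 0 for the header or a 1-based row id for data cells.
    """
    sequence = [(tok, 0, 0, 0) for tok in question.lower().split()]
    for col_idx, name in enumerate(header, start=1):
        sequence += [(tok, 1, col_idx, 0) for tok in str(name).lower().split()]
    for row_idx, row in enumerate(rows, start=1):
        for col_idx, cell in enumerate(row, start=1):
            sequence += [(tok, 1, col_idx, row_idx) for tok in str(cell).lower().split()]
    return sequence


if __name__ == "__main__":
    for item in flatten("Total revenue ?", ["Region", "Revenue"], [["North", 2400]]):
        print(item)
```

With ids like these, two tokens from the same column or the same row share a value the model can pick up on, which plain text order alone would not convey.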

How to install TAPAS on Linux?

TAPAS is essentially a BERT model and therefore has the same requirements. This means that training a large model with a sequence length of 512 will require a TPU.

To install TAPAS on Linux we need the protocol buffer compiler (protoc), which is available in most Linux distributions.

In Debian, Ubuntu and derivatives of these, we can install the compiler with the following command:

sudo apt-get install protobuf-compiler

In the case of Arch Linux, Manjaro, ArcoLinux or any other derivative of Arch Linux, we install it with:

sudo pacman -S protobuf
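Before continuing, it can help to confirm that the compiler is actually on the PATH. The following is a small sketch using only the Python standard library; it assumes the executable is named `protoc`, and the helper names `find_protoc` and `protoc_version` are made up for this example.

```python
import shutil
import subprocess


def find_protoc():
    """Return the path to the protoc executable, or None if it is not on PATH."""
    return shutil.which("protoc")


def protoc_version(path):
    """Return the text printed by `protoc --version` for the given executable."""
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


if __name__ == "__main__":
    path = find_protoc()
    if path is None:
        print("protoc not found; install it with your package manager first")
    else:
        print(f"{path}: {protoc_version(path)}")
```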

Now, to install TAPAS, we only have to obtain the source code and install it with pip using the following commands:

git clone https://github.com/google-research/tapas
cd tapas
pip install -e .

To run the test suite we use the tox tool, which can be installed and run with:

pip install tox
tox

From here the AI will have to be trained in the area of interest, although some trained models are offered in the GitHub repository.

In addition, you can use different configuration options, such as the option max_seq_length to create shorter sequences. This will reduce accuracy but will also make the model GPU-trainable. Another option is to reduce the batch size (train_batch_size), but this will likely affect accuracy as well.
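To get a feel for why these two options matter, the sketch below counts only the self-attention score values in a BERT-large-sized model (24 layers, 16 heads). It is a rough back-of-the-envelope estimate of one memory component, not a measurement of TAPAS, and the batch size of 32 is an assumed baseline chosen for illustration.

```python
def attention_scores(batch_size, seq_length, num_layers=24, num_heads=16):
    """Count attention-score values: one seq_length x seq_length matrix
    per head, per layer, per example in the batch."""
    return batch_size * num_layers * num_heads * seq_length ** 2


# Assumed baseline: batch of 32 at the full sequence length of 512.
base = attention_scores(batch_size=32, seq_length=512)
shorter = attention_scores(batch_size=32, seq_length=256)
smaller_batch = attention_scores(batch_size=16, seq_length=512)

print(shorter / base)        # 0.25: halving the sequence length quarters this cost
print(smaller_batch / base)  # 0.5: halving the batch size halves it
```

Because this component grows with the square of the sequence length but only linearly with the batch size, shortening sequences reduces it faster than shrinking the batch does.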

Finally, if you want to know more about this AI, you can check the details of use, execution and other information at the following link.

