In May 2020, OpenAI, the AI company co-founded by Elon Musk and Sam Altman, released GPT-3, then presented as the most impressive neural network of the moment. A state-of-the-art language model, GPT-3 has 175 billion parameters, compared to the 1.5 billion parameters of its predecessor GPT-2.
GPT-3 surpassed Microsoft's Turing NLG (Turing Natural Language Generation) model, whose 17 billion parameters had previously held the record for the largest neural network. The language model has been marveled at, criticized and closely scrutinized; it has also found new and interesting applications.
Now rumors are circulating that GPT-4, the next version of the OpenAI language model, could be released soon.
Although no release date has been announced yet, OpenAI has given some indications about the characteristics of GPT-3's successor. Contrary to what many might expect, GPT-4 should not be much larger than GPT-3 but should use more computational resources, a design that, according to the reports, would limit its environmental impact.
During the session, Altman hinted that, contrary to popular belief, GPT-4 will not be the largest language model. The model will undoubtedly be larger than previous generations of neural networks, but size will not be its hallmark.
First, companies have realized that increasing model size is neither the only nor the best way to improve performance. In 2020, Jared Kaplan and colleagues at OpenAI reportedly concluded that performance improves most when increases in compute budget are primarily allocated to increasing the number of parameters, following a power-law relationship. Google, Nvidia, Microsoft, OpenAI, DeepMind, and other companies that develop language models have taken these guidelines at face value.
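To make the idea of a power-law relationship concrete, here is a minimal Python sketch. The function name `power_law_loss` and the constants `n_c` and `alpha` are illustrative assumptions, not values taken from this article; only the functional form, a loss that falls as a power of the parameter count, reflects the relationship described above.

```python
def power_law_loss(n_params: float, n_c: float = 8.8e13, alpha: float = 0.076) -> float:
    """Loss predicted by a pure power law in the parameter count.

    n_c and alpha are illustrative placeholder constants (assumptions),
    not figures reported in the article.
    """
    return (n_c / n_params) ** alpha


# Parameter counts quoted in the article for GPT-2, Turing NLG and GPT-3.
for name, n_params in [("GPT-2", 1.5e9), ("Turing NLG", 17e9), ("GPT-3", 175e9)]:
    print(f"{name:>10}: predicted loss {power_law_loss(n_params):.3f}")
```

Under such a law, each doubling of the parameter count multiplies the predicted loss by the same factor, `2 ** -alpha`, so steady gains require exponentially larger models.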
But MT-NLG (Megatron-Turing NLG, a neural network built by Nvidia and Microsoft last year with 530 billion parameters), large as it is, isn't the best when it comes to performance. In fact, it is not rated the best in any benchmark category. Smaller models like Gopher or Chinchilla (70 billion parameters), just a fraction of its size, reportedly outperform MT-NLG across tasks. Thus, it became clear that the size of the model is not the only factor that leads to a better understanding of language.
According to Altman, language models suffer from a critical limitation when it comes to optimization. Training is so expensive that companies have to compromise between accuracy and cost. This often results in models being poorly optimized.
The CEO reported that GPT-3 was trained only once, despite some errors that in other cases would have led to retraining. OpenAI reportedly decided against retraining because of the unaffordable cost, which prevented the researchers from finding the best set of hyperparameters for the model.
Another consequence of high training costs is that analyses of model behavior are restricted. According to one report, when AI researchers concluded that model size was the most relevant variable for improving performance, they did not consider the number of training tokens, that is, the amount of data provided to the models. Doing so would have required extraordinary amounts of computing resources. Tech companies reportedly followed the researchers' findings because they were the best guidance available.
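The interplay between parameters and training tokens can be illustrated with a rough calculation. The sketch below assumes the common rule of thumb that training a dense transformer costs about 6 floating-point operations per parameter per token. The function names and the token counts (approximate publicly reported figures, not taken from this article) are assumptions for illustration only.

```python
def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute, assuming ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens


def tokens_per_param(n_params: float, n_tokens: float) -> float:
    """How many training tokens the model sees for each of its parameters."""
    return n_tokens / n_params


# (parameters, training tokens); the token counts are approximate assumptions.
MODELS = {
    "GPT-3": (175e9, 300e9),
    "Chinchilla": (70e9, 1.4e12),
}

for name, (n_params, n_tokens) in MODELS.items():
    print(
        f"{name:>10}: {training_flops(n_params, n_tokens):.2e} FLOPs, "
        f"{tokens_per_param(n_params, n_tokens):.1f} tokens per parameter"
    )
```

With these figures, the smaller model sees roughly 20 tokens per parameter against fewer than 2 for the larger one, which shows how a compute budget can be spent on data rather than on size.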
Altman said that GPT-4 will use far more compute than its predecessor. OpenAI is expected to implement optimization-related ideas in GPT-4, although to what extent cannot be predicted, as its budget is unknown.
However, Altman's statements suggest that OpenAI will focus on optimizing variables other than model size. Finding the best set of hyperparameters and the optimal model size, measured in number of parameters, could lead to incredible improvements across all benchmarks.
According to analysts, all predictions for language models would be upended if these approaches were combined into a single model. Altman also said that people wouldn't believe how much better models can be without necessarily being bigger. He may be suggesting that scaling efforts are over for now.
OpenAI has reportedly put a lot of effort into solving the AI alignment problem: how to make language models follow human intentions and adhere to human values.
Analysts say that this is not only a difficult mathematical problem (how do we make the AI understand exactly what we want?), but also a philosophical one (there is no universal way to align AI with humans, since human values vary widely from group to group and often conflict).
Finally, if you are interested in knowing more about it, you can refer to the original post at the following link.