How is ChatGPT so smart? How does it understand our questions, generate answers, and even write program code? In this article, we briefly review the technical background of ChatGPT.
Natural Language Processing, or Computational Linguistics, is a branch of computer science that aims to make computers understand language. To achieve this goal, researchers have studied many core technologies, one of which is the language model. A language model can be thought of as a program that predicts what text comes next, given some preceding text. For example, suppose we have the text “I want to eat Korean”. Since ‘eat’ needs an object and ‘Korean’ here reads as a word describing that object, one can easily predict that a likely next word is ‘food’.
To create such a program, researchers focused on mimicking the language patterns found in existing texts such as books. A language model is made to read a large amount of text and then, for a given context, choose the word most likely to come next based on what it has read. A model that predicts the next word in this way is called a probabilistic language model. That is, given the previous text (“I want to eat Korean”), the model calculates the probability of each possible next word. The probability of ‘food’ being the next word will be much higher than that of ‘book’, because ‘food’ appeared far more often after “I want to eat Korean” in the text the model has read. A limitation of such occurrence-based language models is that they cannot calculate probabilities for text they have never read. If the sentence “I hope to eat Korean food” does not appear in the text the model has read, it cannot predict that the word following “I hope to eat Korean” is ‘food’. People, however, know that ‘want’ and ‘hope’ have similar meanings, so they can choose ‘food’ as the next word without difficulty. Moreover, the length of the preceding text, that is, the context, is a major limitation of the probabilistic language model: the longer the context, the less likely it is that exactly that sequence of words ever appeared in the text the model has read.
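The counting idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not how any production model is built: the corpus `CORPUS` is a tiny made-up example, and the helper names `train` and `next_word_probs` are hypothetical. Each probability is simply the fraction of times a word followed the exact preceding text, so an unseen context such as “I hope to eat Korean” gets no prediction at all.

```python
from collections import Counter, defaultdict

# A tiny, made-up corpus standing in for "already-written texts such as books".
CORPUS = [
    "I want to eat Korean food",
    "I want to eat Korean food tonight",
    "I want to eat Korean barbecue",
    "I want to read a book",
]

def train(corpus):
    """Count which word follows each exact preceding text (context)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for i in range(1, len(words)):
            context = " ".join(words[:i])
            counts[context][words[i]] += 1
    return counts

def next_word_probs(counts, context):
    """Relative frequency of each word seen after `context`; empty if never seen."""
    seen = counts.get(context)
    if not seen:
        return {}  # unread context: no probability can be calculated
    total = sum(seen.values())
    return {word: n / total for word, n in seen.items()}

counts = train(CORPUS)
print(next_word_probs(counts, "I want to eat Korean"))  # 'food' about 0.67, 'barbecue' about 0.33
print(next_word_probs(counts, "I hope to eat Korean"))  # {}
```

Practical count-based models (n-gram models) usually condition only on the last few words and add smoothing so that unseen text does not get zero probability; this sketch keeps the whole preceding text as the context to mirror the example above.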
These problems have been addressed by neural networks, structures that mimic nerve cells, together with deep learning techniques for training them. In this approach, the meaning of each word is expressed as a sequence of numbers, that is, a vector. Words with similar meanings are located close together in the vector space, and words with different meanings are located far apart. Our language model program now reads “I hope to eat Korean”, converts the words into their corresponding vectors, calculates the relationships between them, and computes the probability of the next word. Such methods are called neural language models. Because vectors are used, the language model can predict ‘food’ by recognizing that ‘hope’ and ‘want’ are semantically similar, even if it has never read “I hope to eat Korean”. Furthermore, neural network structures can process much longer context text.
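The vector idea can be illustrated with a toy example in Python. It is a sketch under loud assumptions: the 3-dimensional vectors in `EMBED` and `OUTPUT` are hypothetical, hand-picked values (a real neural language model learns its vectors from data), and `next_word_probs` is a deliberately simple scorer that sums the context vectors, compares the sum with each candidate word's vector by dot product, and turns the scores into probabilities with a softmax. Because ‘want’ and ‘hope’ have nearly the same vector, both contexts end up favoring ‘food’ without any count of the sentence “I hope to eat Korean food”.

```python
import math

# Hypothetical, hand-picked 3-dimensional word vectors. Real models learn
# vectors with hundreds or thousands of dimensions from data.
EMBED = {
    "I": [0.1, 0.0, 0.0],
    "want": [0.9, 0.1, 0.0],
    "hope": [0.8, 0.2, 0.0],  # deliberately close to "want"
    "to": [0.0, 0.0, 0.0],
    "eat": [0.0, 1.0, 0.0],
    "Korean": [0.0, 0.8, 0.2],
    "read": [0.0, 0.0, 1.0],
}
# Hypothetical output vectors for the candidate next words.
OUTPUT = {
    "food": [0.0, 1.0, 0.0],
    "book": [0.0, 0.0, 1.0],
}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    """Cosine similarity: near 1 for vectors pointing the same way, 0 for unrelated ones."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def next_word_probs(context):
    """Toy scorer: sum the context word vectors, score each candidate by dot product, apply softmax."""
    ctx = [sum(dims) for dims in zip(*(EMBED[w] for w in context.split()))]
    scores = {w: dot(ctx, vec) for w, vec in OUTPUT.items()}
    total = sum(math.exp(s) for s in scores.values())
    return {w: math.exp(s) / total for w, s in scores.items()}

print(round(cosine(EMBED["want"], EMBED["hope"]), 2))  # 0.99
print(round(cosine(EMBED["want"], EMBED["read"]), 2))  # 0.0
print(next_word_probs("I want to eat Korean"))  # 'food' about 0.85
print(next_word_probs("I hope to eat Korean"))  # 'food' about 0.86
```

Real models replace the simple summation with a deep network (GPT, for example, uses a Transformer) so that word order and long contexts are taken into account.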
Over the last ten years, along with the development of deep learning technology, various neural language models have been proposed. One of them is GPT (Generative Pre-trained Transformer), the predecessor of ChatGPT. GPT implements a language model using the Transformer architecture and learns word probabilities by reading an enormous amount of text on the web with a large number of GPUs (graphics processing units). By reading this text, its neural network can represent the meaning of words in much finer detail. ChatGPT is the GPT language model further tuned for conversation.
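A language model generates text by predicting one word at a time and appending it to the text so far. The loop below is a minimal, hypothetical sketch of that idea using greedy decoding (always taking the most probable word). The probability table `NEXT`, the `<end>` marker, and the `generate` function are illustrative assumptions; a real GPT computes these probabilities with a Transformer over the whole preceding text rather than looking up only the previous word.

```python
# Hypothetical next-word probabilities, keyed by the previous word only.
# A real GPT conditions on the whole preceding text with a Transformer network.
NEXT = {
    "I": {"want": 0.6, "hope": 0.4},
    "want": {"to": 0.9, "a": 0.1},
    "hope": {"to": 1.0},
    "to": {"eat": 0.7, "read": 0.3},
    "eat": {"Korean": 0.8, "lunch": 0.2},
    "Korean": {"food": 0.9, "barbecue": 0.1},
    "food": {"<end>": 1.0},
}

def generate(prompt, max_words=10):
    """Greedy decoding: repeatedly append the most probable next word."""
    words = prompt.split()
    for _ in range(max_words):
        probs = NEXT.get(words[-1])
        if not probs:
            break  # no known continuation for this word
        best = max(probs, key=probs.get)
        if best == "<end>":
            break  # the model predicts the text is finished
        words.append(best)
    return " ".join(words)

print(generate("I"))       # I want to eat Korean food
print(generate("I hope"))  # I hope to eat Korean food
```

Deployed systems often sample from the probabilities instead of always taking the maximum, which makes the output more varied.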
Language models thus continue to be refined and to evolve, but the principle remains the same: a language model reads text and learns to calculate the probability of the next word. Based on this principle, we can also think about the strengths and limitations of ChatGPT. ChatGPT answers questions requiring specialized knowledge well and can generate program code. This may be because the huge amount of web documents it has read contains a great deal of specialized information and program code. ChatGPT may also generate false information. This may be because the text it has read already contains false information, or because the relevant information appears so rarely in that web text that the language model cannot estimate its probability well. Still, language models will likely continue to evolve as they have so far, and in the future we may see a version of ChatGPT that is very close to perfect.
Park Kun-woo
[Assistant Professor, School of AI Convergence]