December 2022 will be remembered as the time when people realized that we humans are no longer the only ones speaking and understanding human language. Enter ChatGPT.
ChatGPT is a neural network trained to guess the next word of a text, and yet through this simple-looking task it learns internal representations of texts that allow it to communicate in human language.
Neural networks learn like young kids do, not by rules but by examples.
ChatGPT's training examples include much of the text on the web: literature, newspapers, scientific articles, blogs, Wikipedia, and much, much more.
Amazingly, it turns out that all the texts humanity has produced are not just communication tools for us. They can also be used as a mold to reconstruct the structure of language itself, and for that matter our models of the world and of ourselves as expressed through civilization. And so from this mold, we have constructed a copy of a mechanism that speaks and understands language.
To understand what the opportunities and problems might be, it is useful to know a bit more about what ChatGPT actually is and how it works.
To start with, it is a neural network. A neural net is not like a usual program which executes a series of logical commands written by the programmer in order to complete a task.
Instead, a neural net starts with a large number of undefined parameters and is given only a list of inputs and desired outputs. Learning consists of gradually adjusting these parameters until the desired outputs are produced.
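The adjust-until-correct loop described above can be sketched in Python. This is a deliberately tiny toy with a single parameter and an invented task (learn to multiply by 3), not anything resembling ChatGPT's actual training code:

```python
# Toy illustration of "learning by examples": nudge a parameter
# until the outputs match the desired outputs.
# (Hypothetical one-parameter toy; real networks have billions.)

inputs = [1.0, 2.0, 3.0, 4.0]
desired = [3.0, 6.0, 9.0, 12.0]   # the "right answers" (here: 3 * x)

w = 0.0                            # start with an unadjusted parameter
lr = 0.01                          # how much to nudge at each step

for step in range(1000):
    for x, y in zip(inputs, desired):
        guess = w * x              # the net's current output
        error = guess - y          # how far off it is
        w -= lr * error * x        # nudge the parameter to shrink the error

# After training, w has settled very close to 3.0.
print(round(w, 3))
```

The key point the toy shares with the real thing: nobody writes a rule saying "multiply by 3"; the rule emerges from repeated small corrections driven by examples.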
Neural networks are structures inspired by the neurons in our brains.
Brain neurons use electrical signals to interact and artificial neurons use numbers.
The neural net consists of many layers of simple neurons, each of which receives some numbers from the previous layer, performs a simple computation such as additions and multiplications, and hands the result over to the neurons in the next layer.
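One such layer can be sketched in a few lines of Python. The weights and inputs here are invented for illustration, and the ReLU nonlinearity is just one common choice, not a claim about ChatGPT's exact architecture:

```python
def layer(values, weights, biases):
    """One layer of simple neurons: each neuron takes the numbers
    from the previous layer, forms a weighted sum, applies a simple
    nonlinearity, and passes the result on. (Illustrative sketch.)"""
    out = []
    for neuron_weights, b in zip(weights, biases):
        s = sum(w * v for w, v in zip(neuron_weights, values)) + b
        out.append(max(0.0, s))    # ReLU: keep positive values, zero out the rest
    return out

# Two neurons, each receiving three numbers from the previous layer:
print(layer([1.0, 2.0, 3.0],
            [[0.1, 0.2, 0.3], [-1.0, 0.0, 0.5]],
            [0.0, 0.1]))
```

Stacking many such layers, each feeding the next, is what makes the network "deep".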
ChatGPT has about 100 layers of simple neurons. It has about 200 billion parameters, and the learning process consists of gradually modifying these parameters to perform the task of guessing the next word of a text. How are words transformed into numbers, though?
Well, each word initially corresponds to a random list of numbers, and these lists are updated as they move through the layers of the neural network. These numbers are part of the parameters already mentioned. The remaining parameters are weights that determine how strong or weak the connection between two neurons in consecutive layers is. In the end, the final list of numbers is used to assign probabilities to all possible next words for the input text.
If the most probable next word according to this procedure turns out to be the same as the word in the training text, nothing needs to change; otherwise, all 200 billion numbers are slightly adjusted to nudge the final list towards the right probabilities.
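The step that turns the final list of numbers into probabilities is typically a softmax: exponentiate each number, then divide by the total so everything sums to one. A sketch with a toy three-word vocabulary and invented scores (real vocabularies have tens of thousands of entries):

```python
import math

vocab = ["cat", "sat", "mat"]          # toy vocabulary (invented)
scores = [2.0, 1.0, 0.1]               # the net's final list of numbers

# Softmax: exponentiate and normalize, turning raw scores into probabilities.
exps = [math.exp(s) for s in scores]
total = sum(exps)
probs = [e / total for e in exps]

best = vocab[probs.index(max(probs))]  # the most probable next word
print(best)                            # prints "cat"
```

During training, it is this probability list that gets compared with the actual next word in the text, and the mismatch drives the adjustment of the parameters.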
When this training procedure is over and all the parameters are fixed, then, given a prompt text, the net produces the next word, just as it was trained to do. The new word is added to the original text to produce a new text one word longer. The net then guesses the next word to that, and so on, and this is what we see as its response.
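That word-by-word loop can be sketched as follows. Here `guess_next_word` is a hypothetical stand-in for the trained network, using a canned lookup table invented purely for illustration:

```python
def guess_next_word(text):
    # Stand-in for the trained net: a canned lookup, not a real model.
    canned = {"Hello": "world", "Hello world": "!"}
    return canned.get(text, "<end>")

def respond(prompt, max_words=10):
    """Generate a response one word at a time, feeding each new
    word back in as part of the input for the next guess."""
    text = prompt
    for _ in range(max_words):
        word = guess_next_word(text)   # guess the next word...
        if word == "<end>":
            break
        text = text + " " + word       # ...append it, and repeat
    return text

print(respond("Hello"))                # prints "Hello world !"
```

The loop structure is the point: the model only ever predicts one word, and the conversation emerges from feeding its own output back in.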
Fluent language emerges once the network and its set of training examples reach a certain size.
It’s a bit like in the movies: once we reach 24 frames per second, all of a sudden we see a continuous image rather than a sequence of photographs.
It’s important though to understand that words or concepts are not stored in a single place. Memory does not work as in computers, where data are stored at specific locations. Instead, memory and processing are integrated: information is stored in the overall interactions of the neurons. Therefore we can’t just go to a specific location and erase something if we want to get rid of it. Knowledge is distributed and interconnected.
This means that if the net behaves inappropriately, it is difficult to understand what goes wrong and how to correct it. This is one of the main areas of current research.
The machine exhibits imagination and creativity, and it also makes logical deductions. It is prone, though, to making things up (so-called hallucinations). It doesn’t for the moment have a perfect grasp of what is true and what is not, and more work is needed to understand how to separate truth from fiction. Therefore, if we ask it questions to which we don’t know the answer, we cannot be absolutely certain of the veracity of its replies.
On the other hand, for other tasks, such as asking it to re-express information that we supply in the first place, it generally works well.
Obviously, such technology will be applied practically everywhere. Office work, code writing, legal counsel, and medical diagnoses just to name a few areas.
Guidelines and best practices will have to be determined for what is only the beginning of a new technological revolution.
For the final word though let me hand the mic over to our new friend: