While large language models (LLMs) are conquering the tech world, artificial intelligence (AI) researchers still know surprisingly little about what they do under the hood. OpenAI openly admits this, stating in the first sentence of its published article, “Language models have become more capable and are more widely used, but we don’t understand how they work.”
We don’t know why they work
The state of not knowing exactly how the individual neurons of a neural network work together to produce an output has a well-known name: the black box. We give an AI system a prompt and it gives us an answer, but what happens in between, inside the black box, remains a mystery.
To look inside the black box, researchers at OpenAI used the GPT-4 language model to generate and evaluate natural-language descriptions of the behavior of individual neurons in a much simpler language model, GPT-2. In theory, an interpretable AI model could help ensure these systems behave as intended.
If their inner workings can be understood, their deficiencies can be addressed.
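Conceptually, the approach has three steps: observe a neuron’s real activations, ask a stronger model to explain them in plain language, then check how well that explanation predicts the neuron’s behavior. Below is a minimal sketch of that explain–simulate–score loop. The helper functions, their names, and their signatures are illustrative placeholders rather than OpenAI’s actual code or API; only the overall structure of the loop follows the published method.

```python
from statistics import fmean
from typing import List, Tuple

def get_neuron_activations(layer: int, neuron: int, tokens: List[str]) -> List[float]:
    """Hypothetical helper: return GPT-2's activation for one neuron on each token
    (would require running GPT-2 with activation hooks)."""
    raise NotImplementedError

def ask_explainer(tokens: List[str], activations: List[float]) -> str:
    """Hypothetical helper: show a stronger model (e.g. GPT-4) the (token, activation)
    pairs and ask for a short natural-language description of what the neuron responds to."""
    raise NotImplementedError

def simulate_activations(explanation: str, tokens: List[str]) -> List[float]:
    """Hypothetical helper: ask the explainer model to predict, from the explanation
    alone, how strongly the neuron should fire on each token."""
    raise NotImplementedError

def correlation_score(real: List[float], simulated: List[float]) -> float:
    """Score the explanation by how closely the simulated activations track the real
    ones (a correlation-style score, as in the paper)."""
    mean_r, mean_s = fmean(real), fmean(simulated)
    cov = sum((r - mean_r) * (s - mean_s) for r, s in zip(real, simulated))
    var_r = sum((r - mean_r) ** 2 for r in real)
    var_s = sum((s - mean_s) ** 2 for s in simulated)
    return cov / ((var_r * var_s) ** 0.5) if var_r and var_s else 0.0

def explain_neuron(layer: int, neuron: int, excerpts: List[List[str]]) -> Tuple[str, float]:
    """Run the loop for one neuron over a set of tokenized text excerpts."""
    tokens = [t for excerpt in excerpts for t in excerpt]
    real = get_neuron_activations(layer, neuron, tokens)   # step 1: observe
    explanation = ask_explainer(tokens, real)               # step 2: explain
    simulated = simulate_activations(explanation, tokens)   # step 3: simulate
    return explanation, correlation_score(real, simulated)  # step 4: score
```

In practice this loop would be repeated over hundreds of thousands of neurons, with the score indicating which explanations can be trusted.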
OpenAI researchers hope that as AI models become more advanced, the quality of the generated explanations will improve, offering better insight into the inner workings of these complex systems. OpenAI published the research on an interactive website with sample transcripts of each step, showing highlighted parts of the text and the specific neurons they correspond to. The company says it will continue this work.
If the desired success in “interpretability” is achieved, we could understand why ChatGPT and similar models make things up, and this critical issue could be resolved. The process is much like treating a person who falls ill: when we have a problem, we go to the hospital, where we are examined and diagnosed, and once a diagnosis is made, the appropriate medication is prescribed. For now, we cannot “examine” AI models with that kind of accuracy.