ChatGPT: a big step towards true AI, or autocomplete on steroids?

It can demonstrate some breathtaking abilities, but the most human-like chatbot yet certainly divides opinion – and is ringing alarm bells in some parts of the AI community.


Chat programs have come a long way since ELIZA, arguably the original chatbot, was created at MIT in the mid-1960s. Today’s much-talked-about equivalent is ChatGPT, the most capable and human-like chatbot yet. The internet is filled with examples of its work, from essay assignments to short stories and whimsical song lyrics.

 

ChatGPT won’t be doing writers, such as this one, out of a job just yet. But although online chatter abounds about the shortcomings of its mathematical, songwriting and other skills, you can’t help but think that the history of artificial intelligence, for better or worse, has reached some sort of turning point.

 

It’s certainly good at producing work that often sounds as if it could have been written by a real person. But while the internet’s humans excitedly test the skills of this shiny new toy, experts are finding it hard to pin down how ChatGPT actually does it. AI researcher and podcaster Lex Fridman, for example, has admitted in an interview that he can only guess why the AI works so well.

 

Key ingredients

OpenAI’s ChatGPT is the result of the exponential advancement of large language models, or LLMs. An LLM is a deep-learning model, trained on massive data sets, that can synthesise, predict, translate and generate content. The average size of LLMs has increased roughly tenfold a year over the past couple of years.

 

ChatGPT’s giant brain, which comprises 175 billion “ML parameters” – or connections between the nodes of its neural network – was originally part of the less impressive GPT-3 model. (To put things into perspective, the largest natural-language generation model before GPT-3 had only around 10 billion ML parameters.)

 

GPT stands for Generative Pre-trained Transformer, and the next ingredients in the ChatGPT sauce lie in the second and third letters of that abbreviation.

 

GPT-3.5 follows a two-stage training procedure. In the first stage, the system is pre-trained on vast amounts of unlabelled text, learning to predict the next word in a sequence – a self-supervised step that makes it more flexible than other models when generating text or other sequences.
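
Self-supervision means the raw text provides its own training signal. The toy Python sketch below shows how unlabelled text can be turned into context-and-next-word training pairs; the tiny corpus and the whitespace “tokeniser” are stand-ins for illustration, not how GPT is actually implemented.

# Minimal sketch of self-supervised next-word prediction: raw, unlabelled text
# is turned into (context, next word) pairs, so no human labelling is needed.
# The corpus and the whitespace "tokeniser" below are toy assumptions.
corpus = "the cat sat on the mat the cat slept on the mat"
tokens = corpus.split()

def make_training_pairs(tokens, context_size=3):
    """Slide a window over the token stream: the model sees `context`
    and is trained to predict `target`, the very next token."""
    pairs = []
    for i in range(len(tokens) - context_size):
        pairs.append((tokens[i:i + context_size], tokens[i + context_size]))
    return pairs

for context, target in make_training_pairs(tokens)[:3]:
    print(context, "->", target)
# ['the', 'cat', 'sat'] -> on   (and so on, one pair per position in the corpus)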

 

This stage is followed by reinforcement learning from human feedback, or RLHF, which many see as the key to ChatGPT’s phenomenal success. This is the point where – as Lex Fridman explains in the interview – the model is tweaked to sound more “human”.

 

Using small datasets, human labellers teach the model what the desired output should be for certain prompts. The algorithm is asked for several outputs, which the labellers then rank from best to worst. This teaches the AI which answers make the most sense and most authentically reflect how humans communicate.

 

Finally, the system is given new prompts sampled from fresh datasets, and the model is rewarded for the best outputs, which makes it more likely to produce similar outputs in the future.
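
One way to picture that ranking step is as a reward model that learns to give higher scores to the answers humans preferred. The Python sketch below is purely illustrative – the candidate answers, scores and pairwise loss are toy stand-ins, not OpenAI’s implementation – but it shows the kind of ranking loss such a step can use.

import numpy as np

# Illustrative reward-model ranking step: human labellers have ranked three
# candidate answers to one prompt as A > B > C. The reward model assigns each
# a score, and the pairwise loss is small only when the scores respect that
# ranking. (All numbers here are hypothetical.)

def pairwise_ranking_loss(score_preferred, score_rejected):
    """-log(sigmoid(preferred - rejected)): near zero when the preferred
    answer already out-scores the rejected one, large otherwise."""
    return -np.log(1.0 / (1.0 + np.exp(-(score_preferred - score_rejected))))

scores = {"A": 1.2, "B": 0.4, "C": -0.8}   # reward-model scores for the candidates
ranked = ["A", "B", "C"]                   # human ranking, best to worst

loss = sum(
    pairwise_ranking_loss(scores[better], scores[worse])
    for i, better in enumerate(ranked)
    for worse in ranked[i + 1:]
)
print(f"ranking loss: {loss:.3f}")  # lower means the scores already match the ranking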

 

The T in GPT stands for Transformer, a revolutionary type of neural network that outperforms earlier models. Such networks can learn context – and thus meaning – by tracking relationships within sequential data, such as the words in a sentence, the elements of a piece of code, or the amino acids in a protein.

 

Having read the amino-acid sequences of millions of proteins, Nvidia’s transformer model, for example, can deliver blueprints for proteins with the functions pharmaceutical researchers are targeting.

 

Even more relevant to ChatGPT’s human-like performance is the transformer’s ability to handle long-range dependencies with ease – for example, statistical connections between words that are as far as 500 words apart.
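
The mechanism that makes this possible is self-attention: every word computes a weight for every other word in the sequence, so a word 500 positions back is as directly reachable as the previous one. The Python sketch below shows a single attention head with random weights and made-up toy dimensions – an illustration of the mechanism, not a production transformer.

import numpy as np

# Toy single-head self-attention: six token embeddings attend to one another.
# The weight matrices are random here; in a real transformer they are learned.
rng = np.random.default_rng(0)
seq_len, d_model = 6, 8
x = rng.normal(size=(seq_len, d_model))        # token embeddings

W_q = rng.normal(size=(d_model, d_model))      # query projection
W_k = rng.normal(size=(d_model, d_model))      # key projection
W_v = rng.normal(size=(d_model, d_model))      # value projection
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# Scaled dot-product attention: each token scores its relationship to every
# token, regardless of how far apart they sit in the sequence.
scores = Q @ K.T / np.sqrt(d_model)
scores -= scores.max(axis=-1, keepdims=True)   # for numerical stability
weights = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)  # row-wise softmax
output = weights @ V

print(weights.shape)          # (6, 6): one row of attention weights per token
print(weights[0].round(2))    # how strongly token 0 attends to every position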

 

Although Sam Altman, CEO of OpenAI, the company behind ChatGPT, has advised against using it for critical tasks at this stage of its development, it is generally agreed that the model has huge potential. But not everyone agrees on how to define what it actually is.

 

How do the two schools of AI research relate to ChatGPT?

AI and machine-learning researchers have formed two camps. Symbolic AI sees its mission as getting machines to reason the way humans do: it aims to embed human knowledge and behavioural rules into computer programs.

 

The deductive method – in which an indefinite number of new statements can be generated from a small number of statements by applying general rules – lies at the core of this approach. It relies heavily on logic and, thanks to the symbols it uses (from alphabetical and numerical characters to road signs and musical notation), it is readable by humans.
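
As a purely illustrative aside (the example is hypothetical, not taken from the article’s sources), a few lines of Python capture the flavour of this deductive style: explicit facts plus one general rule yield new, human-readable statements.

# Hypothetical symbolic-AI miniature: hand-written facts and a general rule
# deduce a new statement, and every step is readable by a human.
facts = {("parent", "alice", "bob"), ("parent", "bob", "carol")}

def grandparent_rule(facts):
    """If X is a parent of Y and Y is a parent of Z, deduce X is a grandparent of Z."""
    parents = [f for f in facts if f[0] == "parent"]
    derived = set()
    for (_, x, y1) in parents:
        for (_, y2, z) in parents:
            if y1 == y2:
                derived.add(("grandparent", x, z))
    return derived

print(grandparent_rule(facts))  # {('grandparent', 'alice', 'carol')}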

 

Members of the symbolic AI camp tend to see ChatGPT as a correlation engine, or a next-word predictor of remarkable ability – but a correlation engine or next-word predictor nonetheless. John Gapper in the Financial Times has called ChatGPT “an eerily impressive, urbane, overconfident version of Wikipedia or Google search.”

 

Other experts, however, place more trust in the new LLM and claim to detect signs of actual reasoning in how it works. Fridman explains how GPT-3.5 acquired its faculty of reasoning through additional reams of data and training on a neural network fine-tuned for coding.

 

Paul Pallaghy, Australian physicist and data scientist, meanwhile, talks about LLMs acquiring “genuine understanding without being specifically asked by us to, simply because it helps next-word prediction.” He sees this faculty of ChatGPT as the by-product of giant-scale next-word prediction.


Is a hybrid model the future for GPT?

Some supporters of symbolic AI, such as Gary Marcus, Professor of Psychology and Neural Science at New York University, tend to see only the dark side of LLMs, flagging up their outstanding capability to fool humans.

 

To mitigate the technology’s intrinsic dangers, Marcus advocates combining deep learning – a subset of machine learning – with old-school, pre-programmed rules to make AI more robust and prevent it from becoming socially harmful.

 

Marcus is not alone in advocating for weaving the two strands of AI research together to hone reliability and performance. According to the Alan Turing Institute, the aim of the integrated neuro-symbolic approach to marrying symbolic AI and machine learning is to “bridge low-level, data intensive perception and high-level, logical reasoning. The integration of the two approaches promises a future generation of AI tools that are not only efficient but transparent, reliable and trustworthy.”
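
What such a marriage could look like in practice is sketched, purely for illustration, below: a statistical component proposes answers (here faked with canned scores standing in for an LLM), while a hand-written symbolic rule layer vets them against explicit knowledge before anything is returned. All names and numbers are hypothetical.

# Hypothetical neuro-symbolic pipeline: a statistical "proposer" suggests answers,
# and a symbolic rule layer checks them against explicit facts before replying.
KNOWN_FACTS = {("capital_of", "france"): "paris"}   # hand-written symbolic knowledge

def neural_propose(question):
    """Stand-in for an LLM: returns candidate answers with confidence scores."""
    return [("paris", 0.62), ("lyon", 0.38)]        # canned output for the demo

def symbolic_check(question, answer):
    """Accept an answer only if it is consistent with the explicit fact base."""
    expected = KNOWN_FACTS.get(("capital_of", "france"))
    return expected is None or answer == expected

def answer(question):
    for candidate, score in sorted(neural_propose(question), key=lambda p: -p[1]):
        if symbolic_check(question, candidate):
            return candidate
    return "I don't know"   # refuse rather than ad-lib a made-up answer

print(answer("What is the capital of France?"))     # paris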

 

By many accounts, ChatGPT can pass the Turing test with flying colours across several of its varied functions – there are numerous reports of it engaging in conversation with a human without being detected as a machine.

 

Amusingly, it can also bluff its way through situations where the available data is too scarce to give a well-founded answer – just as we humans sometimes do. But unless developers address it, this tendency to ad-lib fictional or false answers could undermine the original intention of creating dependable, ethical and unbiased AI. Perhaps ChatGPT is too human after all.
