I’ve spent the last few years studying AI
Like you, I had heard of LLMs, GPT, agents, and ML, but wanted a full picture that connected the dots. Before we dive in, we first need to define intelligence
I like Max Tegmark’s definition: “Intelligence is the ability to achieve complex goals”
This definition matters because every AI system, at its core, exists to achieve a complex goal. And AI has been around for longer than we think
The 1950s were the first big wave, with attempts to model artificial neurons
Alan Turing proposed his famous test for whether a machine could pass as intelligent
Herbert Simon predicted that within 20 years, machines would be capable of doing any work a man can do. It was not to be, as AI hit its first winter in the 1970s
AI pioneers were interested in machine learning. ML involves building a model that creates useful outputs based on inputs, built on “training data”
The goal is to find statistical patterns in a training set that generalise outside of it. ML does this in 3 main ways.
The 3 are: 1) Supervised learning: inputs and outputs are both provided; the model learns to predict outputs for new inputs 2) Unsupervised learning: only inputs are provided, with no categories or labels; the model finds structure in the data on its own 3) Reinforcement learning: no labelled examples at all; the model learns by trial and error, guided by rewards
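To make the first of these concrete, here is a minimal sketch of supervised learning on a toy dataset (the data and one-parameter model are my own illustration, not from any real system): given input/output pairs, fit a line y = w·x, then predict an output for an input the model never saw.

```python
# Toy supervised learning: learn w in y = w * x from labelled examples.
def fit(pairs):
    # Closed-form least squares for a one-parameter linear model
    num = sum(x * y for x, y in pairs)
    den = sum(x * x for x, _ in pairs)
    return num / den

training_data = [(1, 2.1), (2, 3.9), (3, 6.0)]  # roughly y = 2x
w = fit(training_data)
prediction = w * 10  # generalise to an input not in the training set
```

The "learning" is just finding the statistical pattern (w ≈ 2) that generalises beyond the three examples it was given.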
The 80s saw a new type of model. Geoffrey Hinton promoted neural networks. Hinton, today often called the godfather of AI, pioneered the training of feedforward neural networks
Like a human learning from its mistakes, this approach feeds the error back in as an input. This new implementation of ML would be called "deep learning"
A deep learning (DL) algorithm adjusts itself for accuracy through backpropagation, the process of feeding the error back through the network
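The feed-the-error-back idea can be sketched with a single "neuron" holding one weight (a toy illustration I've constructed, not a real DL system): measure the error on each example, push it back to the weight as a gradient, and nudge the weight to shrink the error.

```python
# Toy backpropagation: one weight, trained by feeding the error back.
w = 0.0
lr = 0.1                           # learning rate: size of each adjustment
data = [(1.0, 2.0), (2.0, 4.0)]    # target function is y = 2x
for _ in range(100):
    for x, y in data:
        pred = w * x               # forward pass: make a prediction
        error = pred - y           # how wrong was it?
        grad = error * x           # backward pass: error flows to the weight
        w -= lr * grad             # adjust to reduce the error next time
```

After enough passes, w settles near 2.0: the network has "learned" the pattern purely by correcting its own mistakes.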
DL stacks layers of algorithms and computing units, or neurons, into an artificial neural network. It would inspire an old tech giant to go Deep, literally
The 90s saw the first big win for AI. A computer called Deep Blue beat the best chess player Garry Kasparov in 1997
IBM would become the AI poster child of the 2000s, while Google quietly built an algorithm called PageRank. AI had started to seep in, but people didn't have the computing power
2023 isn’t AI’s first big hype cycle. In the 2010s, research went deep, but nobody built a successful consumer-facing app
Deep in Google Brain, a team of researchers was about to invent a new DL model. Their 2017 paper would be called "Attention Is All You Need"
The model?
“Transformer”
The transformer would use "attention", a mechanism that mimics human attention: it gives higher weight to certain inputs over others
Attention would simplify the neural network, improve its outputs, and reduce training time
Not too far away, a little non-profit called OpenAI would immediately put this into practice. Generative Pre-trained Transformers, or GPT, would be developed by OpenAI in 2018
GPT-1 in 2018 had 117M parameters and was trained on a large corpus of books. It was a big start but lacked coherence
GPT-2 in 2019 had 1.5Bn parameters and got coherence right, but struggled at reasoning. GPT-3 in 2020 had 175Bn parameters and got reasoning right
Models like GPT, trained on large amounts of language, came to be called Large Language Models (LLMs)
OpenAI was onto something
By 2021, two key trends were converging to make AI mainstream
a) Compute costs falling exponentially
b) model parameters increasing exponentially
The two combined to make large AI models both useful and accessible
OpenAI thought of plugging GPT-3 into a chat interface
Just before ChatGPT, a startup called Stability AI would show the impact of neural nets on images. Instead of using transformers, it used a diffusion model to generate images
Diffusion models worked poorly for text, but exceptionally well for images
And like a gas, the output diffused everywhere: it went viral
ChatGPT would drop and explode in Nov '22, reaching 1M users in 5 days. It had no referral loop and no social layer, but its outputs were inherently shareable, and they went viral
26 years after Deep Blue beat Kasparov, the power of AI was now in normal people’s hands
OpenAI put AI on the internet, creating magic
In March '23, OpenAI would release GPT-3.5 as an API. The release resulted in a flood of Twitter threads hawking new tools
Apps would be most useful in professional work involving repetition, code and basic creativity
But most apps were wrappers; the real tech was OpenAI’s APIs
In April 23, an open-source GitHub Repo called “AutoGPT” would go viral
AutoGPT would introduce the concept of "agents". Agents were LLMs with the power to operate autonomously. "Autonomously" may make them sound like all-knowing robots, but they were just smart LLMs
Unlike a "dumb" LLM, an agent could take an instruction and break it into achievable sub-goals. The task could range from as large as "running a company" to as small as "summarise news from Twitter"
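The agent pattern boils down to calling an LLM in a loop: once to break the goal into sub-goals, then once per sub-goal to execute it. Here `call_llm` is a hypothetical stub standing in for a real model API; the control flow, not the stub, is the point.

```python
def call_llm(prompt):
    # Stub: a real agent would call a model API here (hypothetical behaviour)
    if prompt.startswith("Plan:"):
        return ["search for sources", "summarise each source", "write report"]
    return f"done: {prompt}"

def run_agent(goal):
    # Step 1: ask the LLM to decompose the goal into sub-goals
    sub_goals = call_llm(f"Plan: {goal}")
    # Step 2: execute each sub-goal in turn, collecting results
    results = []
    for task in sub_goals:
        results.append(call_llm(task))
    return results
```

Everything "autonomous" about an agent lives in that loop: the LLM supplies the plan, and plain code walks through it without a human in between.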
These agents were a highly evolved version of IBM’s Deep Blue
AI will dramatically change the way we work, but I think the fears are overblown
I have built tools on GPT that are excellent at synthesis but lack human creativity
I see it as a potent tool, but tools cannot beat humans.
Humans with tools beat humans without tools