Theoretical Foundations of Large Language Models
From Neural Networks to ChatGPT
2003: Neural Language Models
2013: Word2Vec
2017: Transformer Architecture
2018: BERT/GPT-1 Pretraining
2020: GPT-3, Scaling Laws
Nov 2022: RLHF + ChatGPT
Roughly 20 years of cumulative theoretical innovation (2003 to 2022) enabled the ChatGPT breakthrough