Theoretical Foundations of Large Language Models: From Neural Networks to ChatGPT

Timeline:
2003: Neural Language Models
2013: Word2Vec
2017: Transformer Architecture
2018: BERT/GPT-1 Pretraining
2020: GPT-3 Scaling Laws
Nov 2022: RLHF + ChatGPT

20 years of cumulative theoretical innovation enabled the ChatGPT breakthrough.