Foundational Papers That Made AI What It Is Today

From early neural networks and optimization methods to transformers, diffusion models, and LLMs.

1. Foundations of Machine Learning

1958
The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain
Frank Rosenblatt
1986
Learning Representations by Back-Propagating Errors
David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams
1989
Multilayer Feedforward Networks are Universal Approximators
Kurt Hornik, Maxwell Stinchcombe, Halbert White
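
Of the ideas in this section, the perceptron is the easiest to show in a few lines. The sketch below uses the modern textbook form of the perceptron learning rule (labels in {-1, +1}, weights updated only on a misclassified example). It is not a reproduction of Rosenblatt's original 1958 formulation, and the function names and the logical-AND data are illustrative choices.

```python
def train_perceptron(samples, labels, lr=1.0, epochs=100):
    """Textbook perceptron rule: on each mistake, move the weights toward the
    misclassified example. Labels must be +1 or -1."""
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:  # wrong side of (or on) the decision boundary
                w = [wi + lr * y * xi for wi, xi in zip(w, x)]
                b += lr * y
                mistakes += 1
        if mistakes == 0:  # a full pass with no errors: the data is separated
            break
    return w, b


def predict(w, b, x):
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else -1


# Logical AND is linearly separable, so the rule converges.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
Y = [-1, -1, -1, 1]
w, b = train_perceptron(X, Y)
print([predict(w, b, x) for x in X])  # [-1, -1, -1, 1]
```

The perceptron convergence theorem guarantees this loop terminates only for linearly separable data; on XOR it would run until the epoch cap, which is the limitation that multilayer networks trained with back-propagation overcome.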

2. Optimization & Training Deep Networks

1998
Gradient-Based Learning Applied to Document Recognition (LeNet)
Yann LeCun et al.
2014
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
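
A minimal sketch of the Adam update for a single scalar parameter, using the default hyperparameters recommended in the Adam paper (lr = 0.001, beta1 = 0.9, beta2 = 0.999, eps = 1e-8). The function name, the quadratic objective, and the larger learning rate used in the example are assumptions made for illustration.

```python
import math


def adam_minimize(grad, x0, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimize a scalar function from its gradient using the Adam update rule."""
    x = x0
    m = 0.0  # exponential moving average of gradients (first moment)
    v = 0.0  # exponential moving average of squared gradients (second raw moment)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        # Bias correction: m and v start at zero, so early estimates are too small.
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x


# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3).
x_star = adam_minimize(lambda x: 2 * (x - 3), x0=0.0, lr=0.1, steps=2000)
```

Because the bias-corrected moments share the gradient's scale, the very first step moves x by almost exactly lr regardless of the gradient's magnitude. The paper discusses this scale-invariant, roughly lr-bounded step size as a design goal.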

3. Deep Learning for Computer Vision

2012
ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)
Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton
2014
Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)
Karen Simonyan, Andrew Zisserman
2015
Deep Residual Learning for Image Recognition (ResNet)
Kaiming He et al.
2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)
Alexey Dosovitskiy et al.
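
The "16x16 words" in the ViT title are image patches. As a rough sketch, assuming an image stored as nested lists in height x width x channel order, the helper below cuts the image into non-overlapping square patches and flattens each one. The learned linear projection, class token, and position embeddings that ViT adds afterwards are omitted. The name patchify and the within-patch flattening order are illustrative assumptions.

```python
def patchify(image, patch):
    """Split an H x W x C image (nested lists) into non-overlapping patch x patch
    tiles, each flattened row by row into a vector of length patch * patch * C."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image height and width must be divisible by the patch size")
    patches = []
    for top in range(0, h, patch):          # patches are emitted in raster order
        for left in range(0, w, patch):
            vec = []
            for r in range(top, top + patch):
                for c in range(left, left + patch):
                    vec.extend(image[r][c])  # append all channel values of this pixel
            patches.append(vec)
    return patches


# Toy 4 x 4 single-channel image whose pixel value is row * 4 + column.
toy = [[[r * 4 + c] for c in range(4)] for r in range(4)]
print(patchify(toy, 2)[0])  # [0, 1, 4, 5]
```

With a 224 x 224 RGB input and 16 x 16 patches, this yields 14 x 14 = 196 patch vectors of length 16 * 16 * 3 = 768, which is the sequence ViT-B/16 sees at that resolution before the class token is prepended.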

4. Representation Learning

2006
Reducing the Dimensionality of Data with Neural Networks
Geoffrey Hinton, Ruslan Salakhutdinov
2013
Distributed Representations of Words and Phrases and their Compositionality (Word2Vec)
Tomas Mikolov et al.
2014
GloVe: Global Vectors for Word Representation
Jeffrey Pennington, Richard Socher, Christopher Manning
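
GloVe fits word vectors to the logarithm of global co-occurrence counts with a weighted least-squares objective, J = Σ f(X_ij) (w_i · w̃_j + b_i + b̃_j − log X_ij)², summed over word pairs with X_ij > 0. The sketch below computes the weighting function f, using x_max = 100 and α = 3/4 as reported in the paper, and a single term of J. Building the co-occurrence matrix and running the optimizer (the paper uses AdaGrad) are left out, and the function names are hypothetical.

```python
import math

X_MAX = 100.0  # cutoff x_max reported in the GloVe paper
ALPHA = 0.75   # exponent alpha reported in the GloVe paper


def glove_weight(x):
    """Weighting f(X_ij): down-weights rare co-occurrences and caps frequent ones at 1."""
    return (x / X_MAX) ** ALPHA if x < X_MAX else 1.0


def glove_pair_loss(w_i, w_tilde_j, b_i, b_tilde_j, x_ij):
    """One term of the GloVe objective for a word pair with co-occurrence count x_ij > 0."""
    dot = sum(a * c for a, c in zip(w_i, w_tilde_j))
    residual = dot + b_i + b_tilde_j - math.log(x_ij)
    return glove_weight(x_ij) * residual ** 2
```

The cap at x_max keeps very frequent pairs (often involving stop words) from dominating the loss, while the α < 1 exponent still gives rare pairs some weight.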

5. Transformers & Large Language Models

2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin et al.
2018
Improving Language Understanding by Generative Pre-Training (GPT-1)
Alec Radford et al.
2019
Language Models are Unsupervised Multitask Learners (GPT-2)
Alec Radford et al.
2020
Language Models are Few-Shot Learners (GPT-3)
Tom B. Brown et al.
2021
Learning Transferable Visual Models From Natural Language Supervision (CLIP)
Alec Radford et al.
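
BERT's masked-language-model objective picks 15% of token positions as prediction targets; of those, 80% are replaced by [MASK], 10% by a random token, and 10% are left unchanged. Below is a minimal sketch of that corruption step over plain string tokens. It assumes each position is selected independently with probability 0.15. Real implementations differ in how they sample and operate on WordPiece IDs. The function name is hypothetical.

```python
import random

MASK_TOKEN = "[MASK]"


def bert_mask(tokens, vocab, select_prob=0.15, rng=None):
    """BERT-style masked-LM corruption. Returns (corrupted, labels), where
    labels[i] holds the original token at positions chosen for prediction
    and None elsewhere."""
    rng = rng or random.Random()
    corrupted = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= select_prob:
            continue  # not selected: no prediction target at this position
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            corrupted[i] = MASK_TOKEN         # 80% of selected: replace with [MASK]
        elif roll < 0.9:
            corrupted[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: leave the original token in place
    return corrupted, labels


sentence = "the cat sat on the mat".split()
corrupted, labels = bert_mask(sentence, vocab=sorted(set(sentence)), rng=random.Random(0))
```

Keeping some selected tokens unchanged or randomized means the model cannot assume that every non-[MASK] token is correct, which narrows the gap between pre-training and fine-tuning inputs (where [MASK] never appears).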

6. Reinforcement Learning Landmarks

1992
Q-Learning
Christopher J. C. H. Watkins, Peter Dayan
2013–2015
Playing Atari with Deep Reinforcement Learning (DQN)
Volodymyr Mnih et al.
2016
Asynchronous Methods for Deep Reinforcement Learning (A3C)
Volodymyr Mnih et al.
2017
Proximal Policy Optimization Algorithms (PPO)
John Schulman et al.
2016–2017
Mastering the Game of Go with Deep Neural Networks and Tree Search (AlphaGo) / Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (AlphaZero)
David Silver et al.
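
The core of Watkins' Q-learning is the update Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)], which learns the values of the greedy policy while the agent acts with an exploratory one. Below is a tabular sketch on a hypothetical five-state corridor that pays reward 1 for reaching the right end. The environment, the ε-greedy behaviour policy, and all hyperparameter values are illustrative assumptions, not settings taken from any paper listed here.

```python
import random

N_STATES = 5      # states 0..4; state 4 is terminal (hypothetical corridor)
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic corridor: reward 1 only when stepping into the terminal state."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q_row, rng):
    best = max(q_row)
    return rng.choice([a for a in ACTIONS if q_row[a] == best])  # break ties randomly


def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap the episode length
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)    # explore
            else:
                action = greedy(q[state], rng)  # exploit
            nxt, reward, done = step(state, action)
            # Off-policy target: bootstrap from the best next action; a terminal
            # state contributes no future value.
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


q_table = q_learning()
```

DQN replaces the table with a neural network and computes the same kind of target from a periodically frozen copy of that network.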

7. Generative Models

2013–2014
Auto-Encoding Variational Bayes (VAE)
Diederik P. Kingma, Max Welling
2014
Generative Adversarial Networks (GAN)
Ian Goodfellow et al.
2020
Denoising Diffusion Probabilistic Models (DDPM)
Jonathan Ho et al.
2022
High-Resolution Image Synthesis with Latent Diffusion Models (Stable Diffusion)
Robin Rombach et al.
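
DDPM's forward process adds Gaussian noise according to a variance schedule β_1, …, β_T and has the closed form x_t = √ᾱ_t · x_0 + √(1 − ᾱ_t) · ε, with ᾱ_t = ∏_{s≤t} (1 − β_s) and ε ~ N(0, I). The sketch applies the paper's linear schedule (T = 1000, β rising from 1e-4 to 0.02) to plain Python lists. The denoising network ε_θ and its simplified training loss ‖ε − ε_θ(x_t, t)‖² are not shown, and the helper name q_sample is an assumption.

```python
import math
import random

T = 1000  # number of diffusion steps used in the DDPM paper

# Linear variance schedule beta_1..beta_T from 1e-4 to 0.02 (list index i is step t = i + 1).
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]

# alpha_bar[i] = product of (1 - beta) up to step i + 1: the signal fraction that remains.
alpha_bar = []
running = 1.0
for beta in betas:
    running *= 1.0 - beta
    alpha_bar.append(running)


def q_sample(x0, i, noise):
    """Draw x_t ~ q(x_t | x_0) in one shot for 0-based step index i, given noise eps."""
    a = alpha_bar[i]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


rng = random.Random(0)
x0 = [0.5, -1.0, 2.0]
eps = [rng.gauss(0.0, 1.0) for _ in x0]
x_t = q_sample(x0, 499, eps)  # halfway through the forward process
```

With this schedule ᾱ_T falls below 1e-4, so x_T is almost pure Gaussian noise, which is why sampling can start from N(0, I). Latent diffusion runs the same process on an autoencoder's latents instead of pixels.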

8. Scaling & Foundation Models

2022
Training Compute-Optimal Large Language Models (Chinchilla)
Jordan Hoffmann et al.
2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron et al.
2023
QLoRA: Efficient Finetuning of Quantized LLMs
Tim Dettmers et al.
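
For the Training Compute-Optimal Large Language Models paper, here is a back-of-the-envelope version of the compute-optimal trade-off. It assumes the standard estimate C ≈ 6·N·D training FLOPs for N parameters and D tokens, together with a rule of thumb often drawn from the Chinchilla results: roughly 20 training tokens per parameter. Both constants are approximations, since the paper fits scaling laws rather than prescribing a fixed ratio, and the function name is hypothetical.

```python
import math

FLOPS_PER_PARAM_TOKEN = 6.0  # standard C ≈ 6·N·D estimate of training FLOPs
TOKENS_PER_PARAM = 20.0      # rough compute-optimal ratio often quoted from Chinchilla


def compute_optimal_allocation(compute_flops, tokens_per_param=TOKENS_PER_PARAM):
    """Split a FLOP budget C into parameters N and tokens D, with D = r·N and C = 6·N·D."""
    n_params = math.sqrt(compute_flops / (FLOPS_PER_PARAM_TOKEN * tokens_per_param))
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens


# Budget implied by Chinchilla's own configuration: 70B parameters, 1.4T tokens.
budget = 6 * 70e9 * 1.4e12
n, d = compute_optimal_allocation(budget)
print(f"{n:.3g} params, {d:.3g} tokens")  # 7e+10 params, 1.4e+12 tokens
```

Under this rule, N and D each grow as √C, so quadrupling the compute budget doubles both the model size and the number of training tokens. This matches the paper's finding that parameters and data should be scaled in roughly equal proportion.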