Foundational Papers That Made AI What It Is Today

From early neural networks and optimization methods to transformers, diffusion models, and LLMs.

1. Foundations of Machine Learning

Early work on perceptrons, backpropagation, and universal approximation laid the groundwork for deep learning.

1958

The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain

Frank Rosenblatt

DOI link

1986

Learning Representations by Back-Propagating Errors

David E. Rumelhart, Geoffrey E. Hinton, Ronald J. Williams

Nature

1989

Multilayer Feedforward Networks are Universal Approximators

Kurt Hornik, Maxwell Stinchcombe, Halbert White

ScienceDirect

2. Optimization & Training Deep Networks

Techniques that made training deep neural networks stable and efficient at scale.

1998

Gradient-Based Learning Applied to Document Recognition (LeNet)

Yann LeCun et al.

Original paper

2014

Adam: A Method for Stochastic Optimization

Diederik P. Kingma, Jimmy Ba

arXiv:1412.6980

2015

Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Sergey Ioffe, Christian Szegedy

arXiv:1502.03167

3. Deep Learning for Computer Vision

Breakthrough architectures, from convolutional networks to vision transformers, that unlocked modern computer vision.

1998

LeNet-5

Yann LeCun et al.

Project page

2012

ImageNet Classification with Deep Convolutional Neural Networks (AlexNet)

Alex Krizhevsky, Ilya Sutskever, Geoffrey Hinton

NeurIPS · arXiv version

2014

Very Deep Convolutional Networks for Large-Scale Image Recognition (VGG)

Karen Simonyan, Andrew Zisserman

arXiv:1409.1556

2015

Deep Residual Learning for Image Recognition (ResNet)

Kaiming He et al.

arXiv:1512.03385

2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale (ViT)

Alexey Dosovitskiy et al.

arXiv:2010.11929

4. Representation Learning

Learning compact, useful representations of data: autoencoders, word embeddings, and more.

2006

Reducing the Dimensionality of Data with Neural Networks

Geoffrey Hinton, Ruslan Salakhutdinov

Science

2013

Distributed Representations of Words and Phrases and their Compositionality (Word2Vec)

Tomas Mikolov et al.

arXiv:1310.4546

2014

GloVe: Global Vectors for Word Representation

Jeffrey Pennington, Richard Socher, Christopher Manning

PDF

5. Transformers & Large Language Models

The architectures and models that defined the transformer era in NLP and beyond.

2017

Attention Is All You Need

Ashish Vaswani et al.

arXiv:1706.03762

2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Jacob Devlin et al.

arXiv:1810.04805

2018

Improving Language Understanding by Generative Pre-Training (GPT-1)

Alec Radford et al.

PDF

2019

Language Models are Unsupervised Multitask Learners (GPT-2)

Alec Radford et al.

PDF

2020

Language Models are Few-Shot Learners (GPT-3)

Tom B. Brown et al.

arXiv:2005.14165

2021

Learning Transferable Visual Models From Natural Language Supervision (CLIP)

Alec Radford et al.

arXiv:2103.00020

6. Reinforcement Learning Landmarks

From tabular Q-learning to deep RL and game-playing agents like AlphaGo and AlphaZero.

1992

Q-Learning

Christopher J. C. H. Watkins, Peter Dayan

Springer

2013–2015

Playing Atari with Deep Reinforcement Learning (DQN)

Volodymyr Mnih et al.

arXiv:1312.5602

2016

Asynchronous Methods for Deep Reinforcement Learning (A3C)

Volodymyr Mnih et al.

arXiv:1602.01783

2017

Proximal Policy Optimization Algorithms (PPO)

John Schulman et al.

arXiv:1707.06347

2016–2017

Mastering the Game of Go with Deep Neural Networks and Tree Search (AlphaGo) / Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm (AlphaZero)

David Silver et al.

Nature

7. Generative Models

GANs, VAEs, and diffusion models that power modern image and media generation.

2013–2014

Auto-Encoding Variational Bayes (VAE)

Diederik P. Kingma, Max Welling

arXiv:1312.6114

2014

Generative Adversarial Networks (GAN)

Ian Goodfellow et al.

arXiv:1406.2661

2020

Denoising Diffusion Probabilistic Models (DDPM)

Jonathan Ho et al.

arXiv:2006.11239

2022

High-Resolution Image Synthesis with Latent Diffusion Models (LDM, the basis of Stable Diffusion)

Robin Rombach et al.

arXiv:2112.10752

8. Scaling & Foundation Models

Work on scaling laws, open large models, and efficient fine-tuning that drives current foundation model development.

2022

Training Compute-Optimal Large Language Models (Chinchilla)

Jordan Hoffmann et al.

arXiv:2203.15556

2023

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron et al.

arXiv:2302.13971

2023

QLoRA: Efficient Finetuning of Quantized LLMs

Tim Dettmers et al.

arXiv:2305.14314