Deep Neural Nets: 33 years ago and 33 years from now (Invited Post)

The Yann LeCun et al. (1989) paper Backpropagation Applied to Handwritten Zip Code Recognition is I believe of some historical significance because it is, to my knowledge, the earliest real-world application of a neural net trained end-to-end with backpropagation. Except for the tiny dataset (7291 16x16 grayscale images of digits) and the tiny neural network used (only 1,000 neurons), this paper reads remarkably modern today, 33 years later - it lays out a dataset, describes the neural net architecture, loss function, optimization, and reports the experimental classification error rates over training and test sets. It’s all very recognizable and type checks as a modern deep learning paper, except it is from 33 years ago. So I set out to reproduce the paper 1) for fun, but 2) to use the exercise as a case study on the nature of progress in deep learning.


A Deeper Look at Zero-Cost Proxies for Lightweight NAS

Imagine you have a brand new dataset, and you are trying to find a neural network that achieves high validation accuracy on this dataset. You choose a neural network, but after 3 hours of training, you find that the validation accuracy is only 85%. After more choices of neural networks — and many GPU-hours — you finally find one that has an accuracy of 93%. Is there an even better neural network? And can this whole process become faster?


Normalization is dead, long live normalization!

Since the advent of Batch Normalization (BN), almost every state-of-the-art (SOTA) method uses some form of normalization. After all, normalization generally speeds up learning and leads to models that generalize better than their unnormalized counterparts. This turns out to be especially useful when using some form of skip connections, which are prominent in Residual Networks (ResNets), for example. However, Brock et al. (2021a) suggest that SOTA performance can also be achieved using ResNets without normalization!


Understanding Few-Shot Multi-Task Representation Learning Theory

Multi-Task Representation Learning (MTR) is a popular paradigm to learn shared representations from multiple related tasks. It has demonstrated its efficiency for solving different problems, ranging from machine translation for natural language processing to object detection in computer vision. On the other hand, Few-Shot Learning is a recent problem that seeks to mimic the human capability to quickly learn how to solve a target task with little supervision. For this topic, researchers have turned to meta-learning that learns to learn a new task by training a model on a lot of small tasks. As meta-learning still suffers from a lack of theoretical understanding for its success in few-shot tasks, an intuitively appealing approach would be to bridge the gap between it and multi-task learning to better understand the former using the results established for the latter. In this post, we dive into a recent ICLR 2021 paper by S. Du, W. Hu, S. Kakade, J. Lee and Q. Lei, that demonstrated novel learning bounds for multi-task learning in the few-shot setting and go beyond it by establishing the connections that allow to better understand the inner workings of meta-learning algorithms as well.

"Auction Learning as a Two Player Game": GANs (?) for Mechanism Design

We discuss a new contribution to the nascent area of deep learning for revenue-maximizing auction design, which uses a GAN-style approach in which two neural networks (one which models strategic behavior by bidders, and one which models an auctioneer) compete with each other.


An Understanding of Learning from Demonstrations for Neural Text Generation

In this blog post, we will go over the ICLR 2021 paper titled Text Generation by Learning from Demonstration. This paper introduces a learning method based on offline, off-policy reinforcement learning (RL) which addresses two key limitations of a training objective used in neural text generation models: Maximum Likelihood Estimate (MLE).


Rethinking ValueDice - Does It Really Improve Performance?

This post rethinks the ValueDice algorithm introduced in the following ICLR publication. We promote several new conclusions and perhaps some of them can provide new insights.


Representation Change in Model-Agnostic Meta-Learning