How would you measure the data efficiency (performance as a function of training set size) of a learning algorithm? It seems natural to:
- Vary the size of a homogeneous data set and measure performance.
- Next, ramp up the variability of the training data.
This is exactly what we do, with a simple set of challenges.
- The performance of different hypotheses is compared on a classification task. The learning curves are plotted as a function of training set size.
- Alternatively, alter the relationship between training and test set distributions; the task ranges from classification to transfer learning.
The challenges differ in how samples are assigned to the probe set P and the target set S during testing.
The algorithm sees a symbol on the left (probe set) and must find the same character on the right (target set). To do so, it extracts features from each image and performs nearest-neighbor classification.
- Challenge 0: P and S samples are from the training set.
- Challenge 1: P and S samples are taken from new samples of characters that were trained on.
- Challenge 2: P and S samples belong to completely unseen characters.
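The probe-to-target matching used in all three challenges can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes Euclidean distance, and the toy 2-D tuples stand in for features that a trained network would produce. The function names `nearest_target` and `matching_accuracy` are hypothetical.

```python
import math


def nearest_target(probe_feature, target_features):
    """Return the index of the target feature closest to the probe.

    Assumption: Euclidean distance; the source does not name the metric.
    """
    distances = [math.dist(probe_feature, t) for t in target_features]
    return distances.index(min(distances))


def matching_accuracy(probe_features, probe_labels, target_features, target_labels):
    """Fraction of probes whose nearest target carries the same character label."""
    correct = 0
    for feature, label in zip(probe_features, probe_labels):
        predicted = target_labels[nearest_target(feature, target_features)]
        correct += predicted == label
    return correct / len(probe_labels)


# Toy 2-D features standing in for the output of a learned feature extractor.
targets = [(0.0, 0.0), (1.0, 1.0), (0.0, 2.0)]
target_labels = ["a", "b", "c"]
probes = [(0.1, -0.1), (0.9, 1.2), (0.8, 0.9)]
probe_labels = ["a", "b", "c"]

accuracy = matching_accuracy(probes, probe_labels, targets, target_labels)
print(f"matching accuracy: {accuracy:.2f}")
```

Only the source of the P and S samples changes between Challenge 0, 1, and 2; the matching step itself stays the same.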
Classification: MNIST, with a varying number of samples per digit.
MNIST: average percentage of correctly classified test-set samples over 100 runs.
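A learning curve of this kind could be produced along the following lines. This is a sketch under stated assumptions: `train_and_score` is a hypothetical callable that trains a model on the given subset and returns its test accuracy, and `toy_score` below is a placeholder, not a real model. The subsampling draws the same number of examples for every class, mirroring "samples per digit".

```python
import random
from collections import defaultdict
from statistics import mean


def subsample_per_class(samples, labels, n_per_class, rng):
    """Draw n_per_class random samples for every label, without replacement."""
    by_label = defaultdict(list)
    for sample, label in zip(samples, labels):
        by_label[label].append(sample)
    subset = []
    for label in sorted(by_label):
        for sample in rng.sample(by_label[label], n_per_class):
            subset.append((sample, label))
    return subset


def learning_curve(samples, labels, sizes, train_and_score, runs=100, seed=0):
    """Map each per-class training size to the score averaged over `runs` draws."""
    rng = random.Random(seed)
    curve = {}
    for n in sizes:
        scores = [
            train_and_score(subsample_per_class(samples, labels, n, rng))
            for _ in range(runs)
        ]
        curve[n] = mean(scores)
    return curve


def toy_score(subset):
    """Placeholder score (hypothetical): grows with subset size, no real training."""
    return len(subset) / 40


# Toy data: 40 samples, two classes of 20 each.
samples = list(range(40))
labels = [i % 2 for i in range(40)]
curve = learning_curve(samples, labels, sizes=[1, 5, 10], train_and_score=toy_score, runs=3)
print(curve)
```

Averaging over many random draws per size matters most at the small end of the curve, where the choice of individual samples has the largest effect on the score.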
Transfer learning: We fix either the number of alphabets or the number of characters per alphabet at 8 and vary the other from 4 to 12.
The average of all the runs, with 16 training samples per character.
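The two sweeps and the resulting training set sizes can be enumerated as below. This is an illustrative sketch: it assumes the varied quantity takes every integer value from 4 to 12 (the step size is not stated), and the function name `transfer_sweeps` is hypothetical. The 16 samples per character come from the text above.

```python
SAMPLES_PER_CHARACTER = 16
FIXED_VALUE = 8
VARIED_VALUES = range(4, 13)  # assumption: every integer from 4 to 12 inclusive


def transfer_sweeps(fixed=FIXED_VALUE, varied=VARIED_VALUES):
    """Return the two sweeps as lists of (alphabets, characters_per_alphabet)."""
    vary_characters = [(fixed, n) for n in varied]
    vary_alphabets = [(n, fixed) for n in varied]
    return vary_characters, vary_alphabets


def training_set_size(alphabets, chars_per_alphabet, samples=SAMPLES_PER_CHARACTER):
    """Total number of training images for one configuration."""
    return alphabets * chars_per_alphabet * samples


vary_characters, vary_alphabets = transfer_sweeps()
for alphabets, chars in vary_characters:
    print(alphabets, chars, training_set_size(alphabets, chars))
```

Both sweeps cover the same range of total training images, so any difference between them reflects how the data is spread across alphabets and characters rather than how much of it there is.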
- Invent more benchmarks for sample or data efficiency.
- Compare a wider variety of methods on increasingly heterogeneous data.
- Instead of comparisons: Define absolute measures of data
Measuring the Data Efficiency of Deep Learning Methods