Measuring the Data Efficiency of Deep Learning Methods

Hlynur Davíð Hlynsson, M.Sc.

Dr.-Ing. Alberto N Escalante B

Prof. Dr. Laurenz Wiskott

Theory of Neural Systems

Main Idea

How would you measure the data efficiency of a learning algorithm, that is, its performance as a function of training set size? It seems natural to:

Vary the size of a homogeneous data set and measure performance.
Next, ramp up the variability of the training data.

This is exactly what we do, with a simple set of challenges.

More Specifically

The performance of different hypotheses is compared on a classification task. The learning curves are plotted as a function of training set size.
Alternatively, we alter the relationship between the training and test set distributions, so that the task ranges from classification to transfer learning.
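A learning curve of this kind can be sketched in a few lines. The example below is only an illustration under stated assumptions: `learning_curve` and `toy_evaluate` are hypothetical names, and the toy task (a 1-D, two-class nearest-mean classifier on synthetic Gaussian data) merely stands in for training and testing a real model on a real data set.

```python
import random
import statistics


def learning_curve(evaluate, sizes, n_runs=10, seed=0):
    """Return {size: mean score}, averaging evaluate(size, rng) over n_runs runs."""
    curve = {}
    for size in sizes:
        scores = [evaluate(size, random.Random(seed + run)) for run in range(n_runs)]
        curve[size] = statistics.mean(scores)
    return curve


def toy_evaluate(n_train, rng):
    """Hypothetical stand-in for 'train on n_train samples per class, then test'."""
    def sample(label, n):
        # Class 0 is centered at 0.0, class 1 at 2.0, both with unit variance.
        return [rng.gauss(2.0 * label, 1.0) for _ in range(n)]

    # "Training": estimate one mean per class from n_train samples.
    means = [statistics.mean(sample(label, n_train)) for label in (0, 1)]
    # "Testing": assign each fresh test sample to the class with the closest mean.
    test = [(x, label) for label in (0, 1) for x in sample(label, 200)]
    correct = sum(
        min((0, 1), key=lambda c: abs(x - means[c])) == label for x, label in test
    )
    return correct / len(test)


curve = learning_curve(toy_evaluate, sizes=[1, 4, 16, 64], n_runs=20)
for size, accuracy in curve.items():
    print(f"{size:3d} training samples per class: accuracy {accuracy:.3f}")
```

Averaging over several runs per training set size smooths out the variance that comes from which samples happen to be drawn, which matters most at the small-sample end of the curve.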

Experimental Protocol

The challenges differ in how samples are placed in the probe set P and the target set S during testing.

The algorithm sees a symbol on the left (probe set) and must find the same character on the right (target set). To do so, it extracts features from each image and performs nearest-neighbor classification.
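The matching step can be illustrated with a minimal sketch. It assumes that features have already been extracted and that Euclidean distance is used; the source does not specify the metric, and the function name `nearest_neighbor_accuracy` and the hand-made 2-D feature vectors are hypothetical.

```python
import math


def nearest_neighbor_accuracy(probe, target):
    """Match each probe item to its closest target item and score the label match.

    probe, target: lists of (feature_vector, label) pairs.
    """
    correct = 0
    for p_features, p_label in probe:
        # Pick the target sample whose features are closest to the probe features.
        _, predicted = min(target, key=lambda item: math.dist(p_features, item[0]))
        correct += predicted == p_label
    return correct / len(probe)


# Hand-made features standing in for the output of a trained feature extractor.
target = [((0.0, 0.0), "a"), ((1.0, 0.0), "b"), ((0.0, 1.0), "c")]
probe = [((0.1, -0.1), "a"), ((0.9, 0.2), "b"), ((0.6, 0.1), "c")]

accuracy = nearest_neighbor_accuracy(probe, target)
print(f"{accuracy:.3f}")  # prints 0.667: the third probe lands closest to "b"
```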

Challenge 0: P and S samples are from the training set.

Challenge 1: P and S samples are new samples of characters that were seen during training.

Challenge 2: P and S samples belong to completely unseen characters.
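The three challenges differ only in the pool that P and S are drawn from, which the following sketch makes explicit. It rests on assumptions not stated in the source: each character contributes exactly one sample to P and one to S, the first `n_train` samples of each trained character form the training set, and the names `build_challenge`, `data`, and `trained` are hypothetical.

```python
import random


def build_challenge(data, trained_chars, n_train, challenge, rng):
    """Return (probe, target) lists of (sample, character) pairs.

    data: dict mapping each character to a list of its samples.
    """
    if challenge == 0:    # samples the model was trained on
        pool = {c: data[c][:n_train] for c in trained_chars}
    elif challenge == 1:  # new samples of trained characters
        pool = {c: data[c][n_train:] for c in trained_chars}
    elif challenge == 2:  # samples of completely unseen characters
        pool = {c: s for c, s in data.items() if c not in trained_chars}
    else:
        raise ValueError(f"unknown challenge: {challenge}")

    probe, target = [], []
    for char, samples in pool.items():
        # Two distinct samples per character: one for P, one for S (an assumption).
        p, s = rng.sample(samples, 2)
        probe.append((p, char))
        target.append((s, char))
    return probe, target


# Toy data: four characters with six sample ids each; "a" and "b" are trained on.
data = {c: [f"{c}{i}" for i in range(6)] for c in "abcd"}
trained = ["a", "b"]

for challenge in (0, 1, 2):
    probe, target = build_challenge(data, trained, 4, challenge, random.Random(0))
    print(challenge, probe, target)
```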

Results

Classification: MNIST, with a varying number of samples per digit.

Average percentage of correctly classified samples on the test set from 100 runs.


Transfer learning: We fix either the number of alphabets, or characters-per-alphabet, to be 8 and vary the other number from 4 to 12.

The average of all the runs, with 16 training samples per character.
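To make the size of each transfer learning configuration concrete, the sketch below enumerates the two sweeps and computes the resulting training set sizes. It assumes a step size of 1 between 4 and 12, which the text does not state; `Config` and `sweeps` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    n_alphabets: int
    chars_per_alphabet: int
    samples_per_char: int = 16  # as in the reported averages

    @property
    def n_characters(self):
        return self.n_alphabets * self.chars_per_alphabet

    @property
    def n_training_samples(self):
        return self.n_characters * self.samples_per_char


def sweeps(fixed=8, low=4, high=12, step=1):
    """Return (vary_alphabets, vary_chars); the step size is an assumption."""
    values = range(low, high + 1, step)
    vary_alphabets = [Config(v, fixed) for v in values]
    vary_chars = [Config(fixed, v) for v in values]
    return vary_alphabets, vary_chars


vary_alphabets, vary_chars = sweeps()
for cfg in vary_chars:
    print(cfg.n_alphabets, cfg.chars_per_alphabet, cfg.n_characters, cfg.n_training_samples)
```

Under these assumptions the training set grows from 8 × 4 × 16 = 512 samples to 8 × 12 × 16 = 1536 samples in either sweep, and the configuration with 8 alphabets and 8 characters per alphabet appears in both.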

Future Work

Invent more benchmarks for sample or data efficiency.
Compare a wider variety of methods on increasingly heterogeneous data.
Instead of relying on comparisons, define absolute measures of data efficiency.

Publications

2019

Measuring the Data Efficiency of Deep Learning Methods
Hlynsson, H., Escalante-B., A., & Wiskott, L.
In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods. SCITEPRESS - Science and Technology Publications.

Measuring the Data Efficiency of Deep Learning Methods Conference poster

@inproceedings{HlynssonEscalante-B.Wiskott2019,
  author = {Hlynsson, Hlynur and Escalante-B., Alberto and Wiskott, Laurenz},
  title = {Measuring the Data Efficiency of Deep Learning Methods},
  booktitle = {Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods},
  publisher = {SCITEPRESS - Science and Technology Publications},
  year = {2019},
  doi = {10.5220/0007456306910698},
}

Hlynsson, H., Escalante-B., A., & Wiskott, L. (2019). Measuring the Data Efficiency of Deep Learning Methods. In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods. SCITEPRESS - Science and Technology Publications. http://doi.org/10.5220/0007456306910698