
We empirically decompose the generalization loss of deep neural networks into bias and variance components on an image classification task by constructing ensembles that combine the sub-network outputs with a geometric mean, and we isolate double descent in the variance component of the loss.
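As a rough illustration of this procedure, the sketch below shows one common way a geometric-mean ensemble and the corresponding bias-variance split can be computed from per-network softmax outputs under log (cross-entropy) loss, where the ensemble predictor is the normalized geometric mean of the predictive distributions and variance is the mean KL divergence from that ensemble to each sub-network. The function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def geometric_mean_ensemble(probs):
    """Normalized geometric mean of per-network predictive distributions.

    probs: array of shape (n_nets, n_examples, n_classes) with softmax outputs.
    Returns an array of shape (n_examples, n_classes).
    """
    log_p = np.log(np.clip(probs, 1e-12, None))
    mean_log_p = log_p.mean(axis=0)               # average log-probabilities over nets
    ens = np.exp(mean_log_p)
    return ens / ens.sum(axis=-1, keepdims=True)  # renormalize to a distribution

def bias_variance_log_loss(probs, labels):
    """Decompose the average per-network log loss into bias and variance,
    using the geometric-mean ensemble as the 'average' predictor.

    labels: integer array of shape (n_examples,) with the true class indices.
    """
    n_nets, n_examples, _ = probs.shape
    ens = geometric_mean_ensemble(probs)
    idx = np.arange(n_examples)
    # Bias term: log loss of the ensemble predictor on the true labels.
    bias = -np.log(ens[idx, labels]).mean()
    # Variance term: mean KL divergence from the ensemble to each sub-network.
    log_ratio = np.log(ens[None, :, :] / np.clip(probs, 1e-12, None))
    variance = (ens[None, :, :] * log_ratio).sum(axis=-1).mean()
    # For this decomposition, mean per-network log loss == bias + variance.
    return bias, variance
```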
Our results show that ensembles of small models outperform single large models while requiring considerably fewer parameters and computational steps. We also find that deep double descent, which depends on the presence of label noise, can be mitigated almost as thoroughly by ensembles of models trained on identical label noise as by ensembles of networks each trained on independently sampled (i.i.d.) label noise.