Large Scale Support Vector Learning

This two-year research project started in November 2013. It is conducted in cooperation with the chair of Computergestützte Statistik (Computational Statistics) at the Technical University of Dortmund and is funded by the Mercator Research Center Ruhr.

Project Abstract

Machine learning is concerned with the algorithmic and statistical aspects of automatically training adaptive predictive models from data. Learning systems have entered industrial applications as well as the analysis of scientific data on a grand scale.

Support Vector Machines (SVMs), and kernel-based methods in general, are well established for their accurate predictions, their statistical properties, and their high flexibility. One of their main disadvantages is their high training complexity, which renders exact modeling of large databases impractical. Resorting to simpler (e.g., linear) models for algorithmic reasons is to be avoided from a statistical point of view; on the contrary, large data sets are an ideal prerequisite for training high-quality non-linear models without restrictive assumptions. Numerous approximate training schemes have been proposed to make non-linear SVMs applicable to large amounts of data. In contrast to exact SVM models, however, these approaches are neither fully developed nor well understood from the point of view of learning theory and complexity theory.
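To make the contrast concrete, the following minimal sketch (illustrative only, not one of the project's methods) compares exact non-linear SVM training with one common approximate scheme, the Nystroem kernel approximation followed by a linear solver. It uses scikit-learn; the synthetic data set and all parameter values are assumptions chosen for demonstration.

    # Illustrative sketch: exact kernel SVM vs. an approximate training scheme.
    from sklearn.datasets import make_classification
    from sklearn.kernel_approximation import Nystroem
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import SVC, LinearSVC

    # Synthetic stand-in for a large data set (parameters are illustrative).
    X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)

    # Exact kernel SVM: solves the full dual problem; training cost grows
    # roughly quadratically to cubically in the number of samples.
    exact = SVC(kernel="rbf", C=1.0, gamma=0.1)
    exact.fit(X, y)

    # Approximate scheme: map the data into a low-rank feature space that
    # approximates the RBF kernel, then train a fast linear SVM there.
    approx = make_pipeline(
        Nystroem(kernel="rbf", gamma=0.1, n_components=300, random_state=0),
        LinearSVC(C=1.0),
    )
    approx.fit(X, y)

The approximate pipeline trades fidelity to the exact kernel solution for training cost that scales roughly linearly in the number of samples, which is precisely the kind of accuracy-versus-time trade-off whose statistical guarantees are not yet well understood.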

The project aims at closing this gap. The project partners will investigate the relationship between training time and statistical guarantees, relying on empirical as well as formal methods. In addition, new methods will be developed and analyzed. The resulting insights and algorithms will be applied to relevant problems in astrophysics, medical imaging, driver assistance systems, and sports analysis.