Title: "Perception in the real world: Bayesian and active approaches to environmental robustness".
Time: Wednesday, October 4th, 12pm
Venue: NB 3/57
Human beings still clearly outperform machines in recognizing speech or environmental sounds in difficult acoustic environments. This talk addresses two potential reasons for this remaining performance gap:
On the one hand, humans are highly effective at integrating multiple sources of uncertain information, and mounting evidence suggests that this integration is near-optimal in a Bayesian sense. Yet the two central tasks of signal enhancement and of speech or sound recognition are performed almost in isolation in many systems, with only estimates of mean values being exchanged between them. The first part of this talk describes concepts for enhancing the interface between these two systems, considering a range of appropriate probabilistic representations. Examples will illustrate how such broader, probabilistic interfaces between signal processing and speech or pattern recognition can improve performance in real-world conditions and more closely approximate the Bayesian ideal of using all sources of information in accordance with their respective reliability.
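The benefit of exchanging uncertainty rather than only mean values can be sketched with a toy example. The snippet below (illustrative only, not the speaker's system) fuses two Gaussian estimates of the same quantity by precision weighting, so that the more reliable source dominates — the Bayesian ideal mentioned above:

```python
# Hypothetical sketch: two modules (e.g. signal enhancement and a recognizer)
# each produce an estimate of the same quantity, modeled as a Gaussian with a
# mean and a variance. If only means were exchanged, reliability information
# would be lost; a probabilistic interface keeps the variances and combines
# the estimates optimally (for Gaussians) by precision weighting.

def fuse_gaussians(mean_a, var_a, mean_b, var_b):
    """Precision-weighted combination of two Gaussian estimates."""
    prec_a, prec_b = 1.0 / var_a, 1.0 / var_b
    fused_var = 1.0 / (prec_a + prec_b)
    fused_mean = fused_var * (prec_a * mean_a + prec_b * mean_b)
    return fused_mean, fused_var

# A reliable estimate (low variance) dominates an unreliable one,
# and the fused variance is smaller than either input variance:
m, v = fuse_gaussians(1.0, 0.1, 3.0, 10.0)
```

Here the fused mean lands close to the reliable estimate (1.0), while naive averaging of the means alone would have returned 2.0 regardless of reliability.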
On the other hand, in contrast to machine listening, humans usually listen actively, i.e. they move their head or body as necessary for the best recognition results: in difficult acoustic conditions, they not only turn their head for better-ear listening, but also move to the spot that affords the best recognition quality, e.g. stepping closer to a speaker they wish to attend to. The second part of the talk therefore focuses on current work in active machine listening, presenting strategies to endow machines with such capabilities and comparing the performance of passive and active approaches in binaural machine listening.
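The contrast between passive and active listening can be illustrated with a minimal sketch (not the speaker's method; the positions and SNR values are hypothetical): a passive system keeps its current pose, while an active one evaluates candidate poses and moves to the one whose better ear promises the highest estimated SNR:

```python
# Illustrative sketch of a trivial active-listening policy. Each candidate
# pose has hypothetical per-ear SNR estimates (in dB); "better-ear" listening
# means the listener effectively benefits from whichever ear is less degraded.

def better_ear_snr(snr_left_db, snr_right_db):
    """Better-ear advantage: the ear with the higher SNR dominates."""
    return max(snr_left_db, snr_right_db)

def choose_pose(candidates):
    """candidates: dict mapping pose name -> (left-ear SNR, right-ear SNR) in dB.
    An active listener picks the pose maximizing the better-ear SNR."""
    return max(candidates, key=lambda pose: better_ear_snr(*candidates[pose]))

# Hypothetical estimates: turning the head helps one ear; approaching the
# speaker raises the SNR at both ears, as in the human behavior described.
poses = {
    "stay": (-3.0, 1.0),           # passive baseline
    "turn_head": (2.0, -1.0),      # better-ear gain from head rotation
    "approach_speaker": (6.0, 4.0) # moving up to the attended speaker
}
best = choose_pose(poses)
```

In this toy setting the active policy selects "approach_speaker", mirroring the human strategy of moving to the spot with the best recognition quality; real systems would of course have to estimate such quantities from binaural signals.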