Multi-Modal User Interaction, Fall 2010

Zheng-Hua Tan

Tel: +45 9940-8686
Email: zt@es.aau.dk

Office: Room A6-319, Niels Jernes Vej 12

Course purpose:

The course will enable the student to understand the principles of multi-modal user interaction, in particular speech-based interfaces, and to extend the methods of HCI/GUI design to analyse, design and synthesise multi-modal user interfaces.

 

Course outline:

Talking to Computers
            Introduction
            Basics about speech – a short introduction
            Template-based approach – DTW (dynamic time warping)
            Statistical-model-based approach – HMM (hidden Markov models)
            Types of speech recognizers
            Applications
Lip-Reading, Pen-Gesture and Speech Input
            Lip-reading
            Pen-gesture
            Perceptual user interface
Eye Tracking and Applications
            Eye tracking
            Tobii eye tracker
            Applications
            Visual focus of attention
Multimodal Fusion and Design
            Multimodal interaction design
            Multimodal fusion
            Decision-level fusion and combining classifiers
            Design guidelines

Literature:

·         Mainly research papers; see the readings listed under each lecture below.


Calendar

Lecture notes:

·         Lecture 1 Slides (Introduction and Talking to Computers) 

o    Readings

O'Shaughnessy, D., "Interacting with computers by voice: automatic speech recognition and synthesis," Proceedings of the IEEE, 91(9), pp. 1272–1305, September 2003.

Sakoe, H. and Chiba, S., "Dynamic programming algorithm optimization for spoken word recognition," IEEE Transactions on Acoustics, Speech, and Signal Processing, 26(1), pp. 43–49, 1978.

o    Assignment 1, Assignment 2: use Matlab to implement dynamic time warping (DTW) to compare speech signals (optional). A minimal DTW sketch is given below.
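
For orientation, here is a minimal DTW sketch in Matlab. It is an illustrative fragment under stated assumptions, not the assignment solution: the function name dtw_dist and the use of per-frame feature matrices (e.g. MFCCs) are assumptions.

    % Minimal DTW sketch (dtw_dist is a hypothetical name, not a toolbox function).
    % x and y are feature matrices with one column per frame, e.g. MFCCs.
    function d = dtw_dist(x, y)
        n = size(x, 2);                % number of frames in x
        m = size(y, 2);                % number of frames in y
        D = inf(n + 1, m + 1);         % accumulated cost, padded with Inf boundaries
        D(1, 1) = 0;
        for i = 1:n
            for j = 1:m
                cost = norm(x(:, i) - y(:, j));   % local Euclidean distance
                % classic step pattern: match, insertion, deletion
                D(i + 1, j + 1) = cost + min([D(i, j), D(i, j + 1), D(i + 1, j)]);
            end
        end
        d = D(n + 1, m + 1) / (n + m); % length-normalised warping cost
    end

Comparing a test utterance against each stored template and choosing the smallest dtw_dist yields the template-based recogniser outlined in Lecture 1.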

·         Lecture 2 Slides  (Talking to Computers, Lip-Reading and Its Combination with Speech)

o    Readings

Willie Walker, Paul Lamere, Philip Kwok, et al., "Sphinx-4: A Flexible Open Source Framework for Speech Recognition," Technical Report, Sun Microsystems, Inc.

http://htk.eng.cam.ac.uk/

Ma, W. J., Zhou, X., Ross, L. A., Foxe, J. J. and Parra, L. C., "Lip-reading aids word recognition most in moderate noise: a Bayesian explanation using high-dimensional feature space," PLoS ONE, 4, e4638, 2009.

o    Assignment: 1) Test Dragon NaturallySpeaking 10; 2) install and try out Sphinx-4, a speech recognizer written entirely in the Java™ programming language. Some key steps.

·         Lecture 3 Slides (Eye Tracking and Applications)

o    Readings

Special issue on eye detection and tracking, Computer Vision and Image Understanding, 98(1), April 2005.

http://www.tobii.com/

Sileye O. Ba and Jean-Marc Odobez, "Recognizing Visual Focus of Attention From Head Pose in Natural Meetings," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 39(1), February 2009.

·         Lecture 4 Slides (Multiple Modalities)

o    Readings

http://www.reactable.com/ & http://www.chrisharrison.net/projects/research.html

Sharon Oviatt, et al., "Designing the user interface for multimodal speech and pen-based gesture applications: state-of-the-art systems and future research directions," Human-Computer Interaction, 15(4), December 2000.

Byron Reeves, Clifford Nass, The Media Equation: How People Treat Computers, Television, and New Media Like Real People and Places, Center for the Study of Language and Information, 2003.

·         Lecture 5 Slides (Multimodal Fusion and Design)

o    Readings

Alejandro Jaimes, Nicu Sebe, "Multimodal Human Computer Interaction: A Survey," Computer Vision and Image Understanding, 108(1–2), pp. 116–134, 2007.

Pantic, M. and Rothkrantz, L.J.M., "Toward an affect-sensitive multimodal human-computer interaction," Proceedings of the IEEE, 91(9), pp. 1371–1390, 2003.

Josef Kittler, Mohamad Hatef, Robert P.W. Duin, and Jiri Matas, "On Combining Classifiers," IEEE Transactions on Pattern Analysis and Machine Intelligence, 20(3), March 1998. (A toy fusion sketch is given after this reading list.)

Sharon Oviatt, Philip Cohen, "Perceptual user interfaces: multimodal interfaces that process what comes naturally," Communications of the ACM, 43(3), March 2000.

Leah M. Reeves, et al., "Guidelines for multimodal user interface design," Communications of the ACM, 47(1), January 2004.
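
To make the combination rules in Kittler et al. concrete, here is a toy Matlab sketch of decision-level fusion; the posterior values are invented purely for illustration.

    % Toy decision-level fusion of per-modality class posteriors.
    % Each row of P is one classifier's posterior distribution over three classes;
    % the numbers are made up for illustration only.
    P = [0.6 0.3 0.1;    % e.g. an audio-only classifier
         0.4 0.5 0.1];   % e.g. a visual (lip-reading) classifier

    [~, sumClass]  = max(sum(P, 1));   % sum rule: add (or average) the posteriors
    [~, prodClass] = max(prod(P, 1));  % product rule: assumes conditional independence

Here both rules pick class 1; they can diverge when one modality is confidently wrong, and Kittler et al. show that the sum rule is generally more robust to such estimation errors.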

Link to the first part of the course (Multimodal Interaction Design and Perception, MIDP), taught by Ann Morrison. Registration is required to access the MIDP course content.