Course description:
The course will enable the student to understand and design advanced multi-modal user interfaces, including speech-based interaction, which is one of the primary goals of the VGIS programme.
Speech is the most natural means of human-human communication. As computing machines become more capable and widespread, there is an increasing demand to include speech as a key component of human-machine interfaces. This course provides students with a basic understanding of the methods and models applied in speech-based and multi-modal systems.
Course outline:
- Automatic speech recognition and synthesis
- Integration of information from e.g. speech and visual modalities into advanced multi-modal interfaces
- Multi-modal interface design and evaluation methods
- Architectures and platforms of multi-modal systems
Literature:
Lecture notes:
- Lecture 1 Slides (Introduction and speech synthesis)
- Lecture 2 Slides (Speech recognition I)
- Install and try out Sphinx-4, a speech recognizer written entirely in the Java programming language. Some key steps.
- Willie Walker, Paul Lamere, Philip Kwok, et al., "Sphinx-4: A Flexible Open Source Framework for Speech Recognition," Technical Report, Sun Microsystems, Inc.
- Paul Lamere, Philip Kwok, William Walker, Evandro Gouvêa, Rita Singh, Bhiksha Raj, Peter Wolf, "Design of the CMU Sphinx-4 decoder," EUROSPEECH 2003.
- Lecture 3 Slides (Speech recognition II)
- Lecture 4 Slides (Language modeling)
- Lecture 5 Slides Part 1 & Part 2 by Rita Singh (System design and applications)
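As a starting point for the Sphinx-4 exercise above, a minimal recognition loop might look like the sketch below. It uses the high-level `edu.cmu.sphinx.api` classes from the current sphinx4-5prealpha distribution, which is newer than the original release described in the reports above (earlier versions were configured via XML); it assumes the sphinx4-core and sphinx4-data jars are on the classpath, and the file name "speech.wav" is a placeholder for your own 16 kHz, 16-bit mono recording.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class SphinxTryout {
    public static void main(String[] args) throws Exception {
        // Point the recognizer at the US-English models bundled in sphinx4-data
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        // Decode an audio file and print the best hypothesis for each utterance
        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        try (InputStream stream = new FileInputStream("speech.wav")) {  // placeholder file name
            recognizer.startRecognition(stream);
            SpeechResult result;
            while ((result = recognizer.getResult()) != null) {
                System.out.println(result.getHypothesis());
            }
            recognizer.stopRecognition();
        }
    }
}
```

For live microphone input, the same configuration can be passed to `LiveSpeechRecognizer` instead of `StreamSpeechRecognizer`.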
Link to Lecture 6-10 by Lars Bo Larsen.