Course description:
The course will enable the student to understand and design advanced multi-modal user interfaces, including speech-based interaction, which is one of the primary goals of the VGIS programme.
Speech is the most natural means of human-human communication. As computing machines become more capable and widespread, there is an increasing demand to include speech as a key component of human-machine interfaces. This course provides students with a basic understanding of the methods and models applied in speech-based and multi-modal systems.
Course outline:
- Automatic speech recognition and synthesis
- Integration of information from e.g. speech and visual modalities into advanced multi-modal interfaces
- Multi-modal interface design and evaluation methods
- Architectures and platforms of multi-modal systems
Literature:
Lecture notes:
- Lecture 1 Slides (Introduction and speech synthesis)
- Lecture 2 Slides (Speech recognition I)
- Install and try out Sphinx-4, a speech recognizer written entirely in the Java programming language. Some key steps.
- Willie Walker, Paul Lamere, Philip Kwok, et al., "Sphinx-4: A Flexible Open Source Framework for Speech Recognition," Technical Report, Sun Microsystems, Inc.
- Paul Lamere, Philip Kwok, William Walker, Evandro Gouvêa, Rita Singh, Bhiksha Raj, Peter Wolf, "Design of the CMU Sphinx-4 decoder," EUROSPEECH 2003.
- Lecture 3 Slides (Speech recognition II)
- Lecture 4 Slides (Language modeling)
- Lecture 5 Slides Part 1 & Part 2 by Rita Singh (System design and applications)
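As a starting point for the Sphinx-4 exercise above, a minimal recognition loop might look like the sketch below. It uses the high-level `edu.cmu.sphinx.api` classes from the current sphinx4-5prealpha distribution, which is newer than the original release described in the reports above (earlier versions were configured via XML); it assumes the sphinx4-core and sphinx4-data jars are on the classpath, and the file name "speech.wav" is a placeholder for your own 16 kHz, 16-bit mono recording.

```java
import java.io.FileInputStream;
import java.io.InputStream;

import edu.cmu.sphinx.api.Configuration;
import edu.cmu.sphinx.api.SpeechResult;
import edu.cmu.sphinx.api.StreamSpeechRecognizer;

public class SphinxTryout {
    public static void main(String[] args) throws Exception {
        // Point the recognizer at the US-English models bundled in sphinx4-data
        Configuration configuration = new Configuration();
        configuration.setAcousticModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us");
        configuration.setDictionaryPath("resource:/edu/cmu/sphinx/models/en-us/cmudict-en-us.dict");
        configuration.setLanguageModelPath("resource:/edu/cmu/sphinx/models/en-us/en-us.lm.bin");

        // Decode an audio file and print the best hypothesis for each utterance
        StreamSpeechRecognizer recognizer = new StreamSpeechRecognizer(configuration);
        try (InputStream stream = new FileInputStream("speech.wav")) {  // placeholder file name
            recognizer.startRecognition(stream);
            SpeechResult result;
            while ((result = recognizer.getResult()) != null) {
                System.out.println(result.getHypothesis());
            }
            recognizer.stopRecognition();
        }
    }
}
```

For live microphone input, the same configuration can be passed to `LiveSpeechRecognizer` instead of `StreamSpeechRecognizer`.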
Link to Lecture 6-10 by Lars Bo Larsen.