Workload: 5 ECTS
Prerequisites: Basic knowledge of probability theory, linear algebra and
computer programming.
Description:
Digital processing plays an increasingly important part in modern multimedia
applications, as faster processors and high-bandwidth networks enable many
new applications. Most multimedia systems require reliable and efficient
methods for extracting model parameters, for example for compression,
enhancement or classification.
Understanding the methods for such parameter estimation and classification,
and their limits, is therefore crucial for both the design and the evaluation
of the entire multimedia system.
The purpose of the theme study is to estimate or extract relevant parameters
or information from a multimedia signal, which can subsequently be used for
automated classification or analysis. Examples of such multimedia signals
include biometrics, images and video, audio and speech signals. Examples of
the classification or analysis process include identity verification, speech
recognition, and music information retrieval.
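The two-step idea above (extract features, then classify) can be illustrated with a minimal sketch. It is not taken from the course material: the features (short-time energy and zero-crossing rate), the synthetic "tone" and "noise" signals, and all function names are assumptions chosen only for illustration.

```python
import math
import random

def extract_features(signal):
    """Return (mean energy, zero-crossing rate) for one signal frame."""
    n = len(signal)
    energy = sum(x * x for x in signal) / n
    # Count sign changes between consecutive samples.
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0))
    return (energy, crossings / (n - 1))

def nearest_neighbor(train, query):
    """Label `query` with the label of the closest training feature vector."""
    return min(train, key=lambda item: math.dist(item[0], query))[1]

def make_tone(freq_hz, rate_hz=8000, n=400, phase=0.0):
    """Hypothetical test signal: a pure sine tone."""
    return [math.sin(2 * math.pi * freq_hz * t / rate_hz + phase) for t in range(n)]

def make_noise(n=400, seed=0):
    """Hypothetical test signal: uniform white noise in [-1, 1]."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]

# Tiny labelled training set of (feature vector, label) pairs.
train = [
    (extract_features(make_tone(200)), "tone"),
    (extract_features(make_tone(300)), "tone"),
    (extract_features(make_noise(seed=1)), "noise"),
    (extract_features(make_noise(seed=2)), "noise"),
]

# Classify two unseen signals from their extracted features.
label_tone = nearest_neighbor(train, extract_features(make_tone(250, phase=0.3)))
label_noise = nearest_neighbor(train, extract_features(make_noise(seed=3)))
print(label_tone, label_noise)
```

A real system would use richer features and a larger training set; the sketch only shows how the extracted parameters become the input of the classifier.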
At the end of the study, the students will carry out joint project work with
the support of supervisors. The projects involve different methods for
feature extraction, classification and analysis of multimedia data.
Prototypes of systems such as speaker identification, music classification
and visual signature verification will be implemented on PCs or smartphones.
Topics covered include:
· Acquisition and representation of multimedia signals
· Feature extraction from speech, music, images, etc.
· Bayes decision theory: Bayes rule, loss function
· Supervised learning (of classification and regression functions): K-nearest neighbors, decision trees, linear regression, linear discriminant analysis
· Unsupervised learning (for clustering, density estimation and dimensionality reduction): K-means, Gaussian mixture model, principal component analysis
· Model selection: bias and variance, boosting and cross-validation
· Applications
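As an illustration of the Bayes decision theory topic, the following minimal sketch applies Bayes rule and then picks the action with the smallest expected loss. It is not taken from the course slides: the class names, priors, likelihood values and loss tables are hypothetical numbers for an identity-verification setting.

```python
def posteriors(priors, likelihoods):
    """Bayes rule: P(class | x) is proportional to p(x | class) * P(class)."""
    joint = {c: priors[c] * likelihoods[c] for c in priors}
    evidence = sum(joint.values())
    return {c: joint[c] / evidence for c in joint}

def min_risk_action(post, loss):
    """Return the action with the smallest expected loss, plus all risks."""
    risk = {a: sum(loss[a][c] * post[c] for c in post) for a in loss}
    return min(risk, key=risk.get), risk

# Hypothetical numbers: prior class probabilities and the likelihood
# p(x | class) evaluated at one observed feature value x.
priors = {"genuine": 0.9, "impostor": 0.1}
likelihoods = {"genuine": 0.2, "impostor": 0.6}
post = posteriors(priors, likelihoods)

# loss[action][true class]: zero-one loss versus a costly false accept.
zero_one = {
    "accept": {"genuine": 0, "impostor": 1},
    "reject": {"genuine": 1, "impostor": 0},
}
costly_false_accept = {
    "accept": {"genuine": 0, "impostor": 10},
    "reject": {"genuine": 1, "impostor": 0},
}

action_a, _ = min_risk_action(post, zero_one)
action_b, _ = min_risk_action(post, costly_false_accept)
print(post, action_a, action_b)
```

With the same posterior, changing only the loss function changes the decision, which is the reason the loss function is listed alongside Bayes rule.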
Texts
Extensive course slides will be made available prior to the course.
Additional readings:
[1] F. Camastra and A. Vinciarelli, Machine Learning for Audio, Image and Video Analysis: Theory and Applications. Springer, 2008.
[2] R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, Second Edition. Wiley Interscience, 2001.
[3] S. V. Vaseghi, Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications. Wiley, 2007.
Note: The schedule is indicative and subject to change, and reading is optional.
Lecture 1-2: Introduction (slides)
Readings: Reference [1] Chapters 1 & 4.
Project presentation (slides).
Lecture 3: Acquisition, representation and feature extraction of multimedia signals (slides)
Readings: Reference [1] Chapters 2 & 3.
Lecture 4: Decision tree and random forest (slides)
Readings: Reference [2] Chapter 8.
Lecture 5: Clustering and Gaussian mixture model (slides)
Readings: Reference [2] Chapter 10.
Lecture 6: Bayesian decision theory (slides)
Readings: Reference [1] Chapter 5 or Reference [2] Chapter 2.
Lecture 7: Parametric and nonparametric methods (slides)
Readings: Reference [2] Chapters 3 and 4.
Lecture 8: Supervised learning (slides)
Readings: Reference [2] Chapters 5 and 6.
Lecture 9: Unsupervised learning (slides)
Readings: Reference [2] Chapter 10.
Lecture 10: Model selection and applications (slides)
Readings: Reference [1] Chapter 7 and Part III.