Multimedia Information and Signal Processing
Workload
5 ECTS
Prerequisites
Basic knowledge in probability theory, linear algebra and computer programming.
Description
Digital processing plays an increasingly important part in modern multimedia applications, as faster processors and high-bandwidth networks enable many new applications. Most multimedia systems require reliable and efficient methods for extracting model parameters, for example for compression, enhancement or classification.
Understanding the different methods for such parameter estimation and classification, and their limits, is therefore crucial for both the design and the evaluation of the entire multimedia system.
The purpose of the theme study is to estimate or extract relevant parameters or information from a multimedia signal, which can subsequently be used for automated classification or analysis. Examples of such multimedia signals include biometrics, images and video, and audio and speech signals; examples of the classification or analysis process include identity verification, speech recognition and music information retrieval.
Prototypes of systems such as speaker identification, music classification and visual signature verification will be implemented on PCs or smartphones.
Topics covered include:
- Acquisition and representation of multimedia signals
- Feature extraction from speech, music, images, etc.
- Bayes decision theory: Bayes rule, loss function
- Supervised learning (of classification and regression functions): K-nearest neighbors, decision trees, linear regression, linear discriminant analysis
- Unsupervised learning (for clustering, density estimation and dimensionality reduction): K-means, Gaussian mixture model, principal component analysis
- Model selection: bias and variance, boosting and cross-validation
- Applications
Texts

F. Camastra and A. Vinciarelli, Machine Learning for Audio, Image and Video Analysis: Theory and Applications. Springer, 2008.
R. O. Duda, P. E. Hart and D. G. Stork, Pattern Classification, 2nd ed. Wiley-Interscience, 2001.
S.V. Vaseghi, Multimedia Signal Processing: Theory and Applications in Speech, Music and Communications. Wiley, 2007.
Lecturers
Zheng-Hua Tan is an Associate Professor in the Department of Electronic Systems at Aalborg University, Denmark.
He received the B.S. and M.S. degrees in electrical engineering from Hunan University, China, in 1990 and 1996, respectively, and the Ph.D. degree in electronic engineering from Shanghai Jiao Tong University, China, in 1999. He was a postdoctoral fellow in the Department of Computer Science at KAIST, Korea, and an Associate Professor in the Department of Electronic Engineering at Shanghai Jiao Tong University, China.
His research interests include speech recognition, noise robust speech processing, multimedia signal and information processing, multimodal human-computer interaction, and machine learning.
He has long experience in teaching, with a focus on these areas. He edited the book Automatic Speech Recognition on Mobile Devices and over Communication Networks (Springer, 2008), and is the Lead Guest Editor of the Special Issue on Speech Processing for Natural Interaction with Intelligent Environments for the IEEE Journal of Selected Topics in Signal Processing. He serves as an Editorial Board Member for Elsevier Computer Speech and Language and for the International Journal of Data Mining, Modelling and Management, and as an Associate Editor for the IEEE Journal of Selected Topics in Signal Processing. He is a Senior Member of the IEEE.

Dr. Jensen was an Associate Editor for the IEEE Transactions on Signal Processing and is currently a member of the Editorial Board of Elsevier Signal Processing and the EURASIP Journal on Advances in Signal Processing. He is a recipient of a European Community Marie Curie Fellowship and a former Chairman of the IEEE Denmark Section and of the IEEE Denmark Section’s Signal Processing Chapter.
