Speech recognition
J. H. L. Hansen, J. G. Proakis and J. R. Deller, Jr., "Discrete-Time Processing of Speech Signals", Prentice-Hall, 2000.
Rabiner, L.R., "A tutorial on hidden Markov models and selected applications in speech recognition", Proceedings of the IEEE, 77 (2), 1989, pp. 257-286.
S. J. Young, et al., “HTK: Hidden Markov Model Toolkit V3.2.1, Reference Manual,” Cambridge Univ. Speech Group, Mar. 2004. Alternative: Steve Young, "A review of large-vocabulary continuous-speech recognition", IEEE Signal Processing Magazine, Sep. 1996.
Robust speech recognition
Y Gong, "Speech Recognition in Noisy Environments: A Survey," Speech Communication 16, pp. 261-291, 1995.
H Xu, Z-H Tan, P Dalsgaard and B Lindberg, “Robust Speech Recognition from Noise-Type Based Feature Compensation and Model Interpolation in a Multiple Model Framework,” ICASSP 2006, Toulouse, France, May 2006.
Speech recognition over communication networks and on mobile devices
Z.-H. Tan, P. Dalsgaard and B. Lindberg, "Exploiting Temporal Correlation of Speech for Error-Robust and Bandwidth-Flexible Distributed Speech Recognition," IEEE Transactions on Audio, Speech and Language Processing, 15(4), pp. 1391-1403, May 2007.
H Xu, Z-H Tan, P Dalsgaard, R Mattethat and B Lindberg, “A Configurable Distributed Speech Recognition System”, H. Abut, J.H.L. Hansen, K. Takeda (Editors), Digital Signal Processing for In-Vehicle and Mobile Systems 2, Springer Science, New York, NY, 2006.
B. Zhou et al., "A hand-held speech-to-speech translation system," IEEE ASRU 2003.
Speaker recognition and segmentation
S.E. Tranter and D.A. Reynolds, "An overview of automatic speaker diarization systems," IEEE Transactions on Audio, Speech and Language Processing, 14(5), September 2006.
Speech data mining and document retrieval
Gilbert, M., Moore, R.K., Zweig, G. (2005). Introduction to the Special Issue on Data Mining of Speech, Audio, and Dialog. IEEE Trans. on Speech and Audio Processing, 13(5), 633-634.
Hansen, J.H.L., Huang, R., Zhou, B. et al. (2005). SpeechFind: advances in spoken document retrieval for a national gallery of the spoken word. IEEE Trans. on Speech and Audio Processing, 13(5), 712-730.