

Online Resources (source code and data sets)


(Click the titles to get access to the resources)

rVAD
Noise-robust voice activity detection (rVAD) - source code, reference VAD for Aurora 2, based on the following paper:
Z.-H. Tan and B. Lindberg, "Low-complexity variable frame rate analysis for speech recognition and voice activity detection." IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 5, pp. 798-807, 2010.

Aurora 2 VAD
The VAD labels for the Aurora 2 database, generated by forced alignment, as presented in the following paper:
Z.-H. Tan and B. Lindberg, "Low-complexity variable frame rate analysis for speech recognition and voice activity detection." IEEE Journal of Selected Topics in Signal Processing, vol. 4, no. 5, pp. 798-807, 2010.

Audio adversarial examples
Source code and datasets for generation and detection of attacks on deep speech recognition systems.

Keyword spotting robust to external speakers (KWSExternalSpeaker_Code.zip, Python code 500 KB; the single-user hearing aid speech database HADataset.tar.gz 8.8G)
Iván López-Espejo, Zheng-Hua Tan, and Jesper Jensen, "Keyword Spotting for Hearing Assistive Devices Robust to External Speakers," Interspeech 2019, September 15-19, 2019, Graz, Austria. (PDF)

iSocioBot
The source code for iSocioBot, presented in the following paper:
Zheng-Hua Tan, Nicolai Bæk Thomsen, Xiaodong Duan, Evgenios Vlachos, Sven Ewan Shepstone, Morten H. Rasmussen and Jesper Lisby Højvang, "iSocioBot - A Multimodal Interactive Social Robot," accepted by International Journal of Social Robotics. (Springer). PDF from Springer Nature Sharing.

Contextual TV Dataset
Miklas S. Kristoffersen, Sven E. Shepstone, and Zheng-Hua Tan. The Importance of Context When Recommending TV Content: Dataset and Algorithms. arXiv:1808.00337 [cs.IR].

Subjective annotations of attention
Andrea Coifman, Péter Rohoska, Miklas S. Kristoffersen, Sven E. Shepstone, and Zheng-Hua Tan. Subjective Annotations for Vision-Based Attention Level Estimation. VISAPP '19: 14th International Conference on Computer Vision Theory and Applications.

Feature learning for face recognition (FeatureLearning_FaceRec_PRL.zip, Python code, 30 KB)
Xiaodong Duan and Zheng-Hua Tan, “A Spatial Self-Similarity Based Feature Learning Method for Face Recognition under Varying Poses,” Pattern Recognition Letters, vol. 111, pp. 109-116, August 2018.

Filter bank neural networks (FBNN.zip, 55 MB)
The source code for the filter bank neural network (FBNN), presented in the following paper:
Hong Yu, Zheng-Hua Tan, Yiming Zhang, Zhanyu Ma, and Jun Guo, “DNN Filter Bank Cepstral Coefficients for Spoofing Detection,” accepted by IEEE Access. PDF from IEEE Xplore.

3D sensing
Three-Dimensional Adaptive Sensing of People - Code and Supplementary Video Examples

Crowd analysis
Crowd Analysis - Supplementary Video Examples for the following paper:
F. Santoro, S. Pedro, Z.-H. Tan and T.B. Moeslund, "Crowd Analysis by Using Optical Flow and Density Based Clustering," EUSIPCO 2010 – the 18th European Signal Processing Conference, Aalborg, Denmark, August 2010.