Mismatch between training data and test data, caused by channel differences and similar factors, can seriously degrade language recognition performance. In practical applications, usually only a small amount of labeled data that matches the test data (target-domain data) is available, together with a large amount of labeled data that does not match the test data (source-domain data). This paper takes a transfer learning approach: using unsupervised transfer component analysis (UTCA), it makes effective use of both kinds of data to find a low-dimensional subspace in which the distribution difference between the source and target data is minimized while the properties of the data that are useful for classification are preserved, thereby improving recognition performance. Experiments show that, relative to the baseline system, the algorithm improves recognition performance on 30 s and 10 s speech by 24.7% and 8%, respectively.
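The subspace search described in the abstract can be illustrated with a minimal sketch. The abstract does not give the UTCA objective, so the code below assumes the commonly used transfer component analysis formulation instead: a linear kernel, a maximum mean discrepancy (MMD) term for the source/target distribution gap, and a variance term for the structure to preserve. The function name `tca_linear`, the parameters `dim` and `mu`, and the synthetic data are all hypothetical and are not taken from the paper.

```python
import numpy as np

def tca_linear(Xs, Xt, dim=3, mu=1.0):
    """Hypothetical helper: standard TCA with a linear kernel (an assumption,
    not necessarily the paper's exact UTCA formulation)."""
    ns, nt = Xs.shape[0], Xt.shape[0]
    n = ns + nt
    X = np.vstack([Xs, Xt])
    K = X @ X.T                                  # linear kernel over all samples
    # MMD coefficients: +1/ns for source rows, -1/nt for target rows.
    e = np.concatenate([np.full(ns, 1.0 / ns), np.full(nt, -1.0 / nt)])
    g = K @ e
    # A = K L K + mu * I with L = e e^T: domain gap plus a regularizer.
    A = np.outer(g, g) + mu * np.eye(n)
    H = np.eye(n) - np.full((n, n), 1.0 / n)     # centering matrix
    B = K @ H @ K                                # data variance to preserve
    # Solve the generalized problem B w = lambda A w by whitening with
    # the Cholesky factor of A (A is symmetric positive definite).
    C = np.linalg.cholesky(A)
    C_inv = np.linalg.inv(C)
    S = C_inv @ B @ C_inv.T
    S = (S + S.T) / 2.0                          # remove rounding asymmetry
    _, U = np.linalg.eigh(S)                     # eigenvalues in ascending order
    W = C_inv.T @ U[:, ::-1][:, :dim]            # components with largest eigenvalues
    Z = K @ W                                    # embed all samples
    return Z[:ns], Z[ns:]

# Synthetic stand-ins for source and target features (not real speech data).
rng = np.random.default_rng(0)
Xs = rng.normal(size=(200, 10))                  # plentiful source-domain data
Xt = rng.normal(size=(20, 10)) + 2.0             # scarce target-domain data, shifted
Zs, Zt = tca_linear(Xs, Xt, dim=3)
print(Zs.shape, Zt.shape)
```

The projection itself uses no labels. In a setup like this one, a classifier would then be trained on the embedded source data `Zs` with its labels and applied to embedded target-domain data.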