The limited availability of labeled sample data in the field of advanced security threat detection seriously restricts the effective development of research work. Learning sample labels from labeled and unlabeled data has received a lot of research attention, and various universal labeling methods have been proposed. However, the task of labeling malicious communication samples associated with advanced threats faces two practical challenges: the difficulty of extracting effective features in advance and the complexity of the actual sample types. To address these problems, we propose a sample labeling method for malicious communication based on a semi-supervised deep neural network. This method supports continuous learning and optimization of the feature representation while labeling samples, and it can handle uncertain samples that fall outside the concerned sample types. According to the experimental results, our proposed deep neural network can automatically learn an effective feature representation, and the validity of the learned features is close to or even higher than that of features extracted based on expert knowledge. Furthermore, our proposed method achieves a labeling accuracy of 97.64% to 98.50%, which is more accurate than the train-then-detect, kNN and LPA methods under every labeled-sample proportion condition. The problem of insufficient labeled samples arises in many network attack detection scenarios, and our proposed work can serve as a reference for sample labeling tasks in similar real-world scenarios.
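To make the labeling setting concrete, the following is a minimal sketch of the kNN baseline named above, extended with a simple rejection rule for samples that lie far from every labeled sample. It is not the proposed semi-supervised deep neural network. The function name `knn_label`, the `UNCERTAIN` marker, the distance threshold `max_dist` and the toy feature vectors are all assumptions made for illustration, and the rejection rule is only one plausible way to flag samples outside the concerned types.

```python
import numpy as np

UNCERTAIN = -1  # hypothetical marker for samples outside the known types


def knn_label(X_labeled, y_labeled, X_unlabeled, k=3, max_dist=1.0):
    """Label unlabeled samples by majority vote of their k nearest labeled
    neighbours; mark a sample UNCERTAIN if those neighbours are too far away.

    Class labels in y_labeled are assumed to be non-negative integers.
    """
    X_labeled = np.asarray(X_labeled, dtype=float)
    y_labeled = np.asarray(y_labeled, dtype=int)
    X_unlabeled = np.asarray(X_unlabeled, dtype=float)

    # Pairwise Euclidean distances, shape (n_unlabeled, n_labeled).
    diffs = X_unlabeled[:, None, :] - X_labeled[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))

    # Indices of the k closest labeled samples for each unlabeled sample.
    nearest = np.argsort(dists, axis=1)[:, :k]

    labels = np.empty(len(X_unlabeled), dtype=int)
    for i, idx in enumerate(nearest):
        if dists[i, idx].mean() > max_dist:
            # Far from all known samples: do not force a known label.
            labels[i] = UNCERTAIN
        else:
            labels[i] = np.bincount(y_labeled[idx]).argmax()
    return labels


# Toy data: two well-separated clusters standing in for two sample types.
X_l = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
y_l = [0, 0, 0, 1, 1, 1]
X_u = [[0.2, 0.2], [10.5, 10.5], [5, 5]]
print(knn_label(X_l, y_l, X_u, k=3, max_dist=2.0))
```

In this toy run the first two unlabeled points fall inside a cluster and receive that cluster's label, while the point midway between the clusters is marked uncertain. A fixed distance threshold depends heavily on hand-chosen features, which is the limitation that motivates learning the feature representation jointly with the labeling.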