Messages of application-layer protocols typically have long byte sequences and complex structures, which poses serious challenges for protocol reverse analysis. This paper proposes a message structure extraction method based on recursive clustering. First, the method recursively clusters the sample set at the basic-block level using progressive multiple sequence alignment, which separates messages of different formats while reducing the scale of each alignment. Second, on the aligned messages, it identifies field boundaries from the rate of change of the values at each aligned byte position. Third, a recursive backtracking strategy for protocol structure analysis extracts the hierarchical relationships between fields by identifying format distinguisher fields. Experiments on several public protocols show that the method derives message formats in BNF form and improves the accuracy of field identification while reducing time overhead, which indicates good practical value.
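The field boundary step can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not define the change rate or the boundary rule, so the sketch assumes that the change rate of a column is the fraction of distinct values among the aligned messages, and that a boundary lies wherever the rate jumps by more than a threshold between adjacent columns. The names `change_rates`, `field_boundaries`, `GAP`, and the default `threshold` are hypothetical.

```python
from typing import List, Optional, Sequence

# Assumption: alignment gaps are represented as None and count as one more value.
GAP = None


def change_rates(aligned: Sequence[Sequence[Optional[int]]]) -> List[float]:
    """Return, for each aligned column, distinct values / number of messages."""
    if not aligned:
        return []
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("aligned messages must all have the same length")
    count = len(aligned)
    return [len({row[col] for row in aligned}) / count for col in range(width)]


def field_boundaries(rates: Sequence[float], threshold: float = 0.3) -> List[int]:
    """Return column indices where a new field is assumed to start.

    Assumed rule: a boundary is placed before column i when the change rate
    differs from that of column i - 1 by more than `threshold`.
    """
    return [
        col
        for col in range(1, len(rates))
        if abs(rates[col] - rates[col - 1]) > threshold
    ]


if __name__ == "__main__":
    # Four already-aligned toy messages: two constant bytes, then two varying bytes.
    messages = [
        [0x01, 0x00, 0x10, 0xAA],
        [0x01, 0x00, 0x22, 0xBB],
        [0x01, 0x00, 0x37, 0xCC],
        [0x01, 0x00, 0x4F, 0xDD],
    ]
    rates = change_rates(messages)
    print(rates)                    # [0.25, 0.25, 1.0, 1.0]
    print(field_boundaries(rates))  # [2]
```

In this toy input the first two columns are constant and the last two vary in every message, so the sketch places a single boundary before column 2. A real analysis would first produce the aligned columns by multiple sequence alignment, which this sketch takes as given.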