Decompilation aims to analyze and transform low-level programming language (PL) code, such as binary or assembly code, into an equivalent high-level PL. Decompilation plays a vital role in cyberspace security fields such as software vulnerability discovery and analysis and malicious code detection and analysis, as well as in software engineering fields such as source code analysis, optimization, and cross-language, cross-operating-system migration. Unfortunately, existing decompilers mainly rely on experts to write rules, which leads to bottlenecks such as low scalability, development difficulty, and long development cycles. The generated high-level PL code often violates code writing conventions, and its readability remains relatively low. These problems hinder the efficiency of advanced applications (e.g., vulnerability discovery) built on decompiled high-level PL code. In this paper, we propose a decompilation approach based on attention-based neural machine translation (NMT), which converts low-level PL into high-level PL that is legible and functionally similar to the original. To compensate for the information asymmetry between low-level and high-level PL, we design a translation method based on the basic operations of the low-level PL. This method improves the generalization of the NMT model and captures the translation rules between PLs more accurately and efficiently. In addition, we implement a neural decompilation framework called Neutron. Evaluation on two practical applications shows that Neutron achieves an average program accuracy of 96.96%, outperforming the traditional NMT model.
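To make the attention-based NMT formulation concrete, the following PyTorch sketch shows a minimal encoder-decoder with dot-product attention that maps a tokenized low-level PL sequence (e.g., assembly) to a high-level PL sequence. This is an illustrative sketch only, not the paper's Neutron implementation; the class name `AttentionDecompiler`, the hyperparameters, and the vocabulary sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class AttentionDecompiler(nn.Module):
    """Illustrative attention-based encoder-decoder for low-level -> high-level
    PL translation. Hyperparameters and vocabularies are placeholders, not the
    paper's actual configuration."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hid_dim=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim * 2, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the low-level (assembly) token sequence.
        enc_out, state = self.encoder(self.src_emb(src_ids))      # (B, S, H)
        # Decode the high-level token sequence with teacher forcing.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)   # (B, T, H)
        # Dot-product attention: each decoder step attends over encoder states.
        scores = torch.bmm(dec_out, enc_out.transpose(1, 2))      # (B, T, S)
        weights = torch.softmax(scores, dim=-1)
        context = torch.bmm(weights, enc_out)                     # (B, T, H)
        # Combine attention context and decoder state to predict target tokens.
        return self.out(torch.cat([dec_out, context], dim=-1))    # (B, T, V)

# Example usage with random token ids; the model would be trained with
# cross-entropy loss against the high-level PL token sequence.
model = AttentionDecompiler(src_vocab=8000, tgt_vocab=8000)
logits = model(torch.randint(0, 8000, (4, 120)), torch.randint(0, 8000, (4, 60)))
```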