Semantic Modelling of Android Malware for Malware Comprehension, Detection, and Classification


Malware has posed a major threat to the Android ecosystem. Existing malware detection tools mainly rely on signature- or feature- based approaches, failing to provide detailed information beyond the mere detection. In this work, we propose a precise semantic model of Android malware based on Deterministic Symbolic Automaton (DSA) for the purpose of malware comprehension, detection and classification. It shows that DSA can capture the common malicious behaviors of a malware family, as well as the malware variants. Based on DSA, we develop an automatic analysis framework, named SMART, which learns DSA by detecting and summarizing semantic clones from malware families, and then extracts semantic features from the learned DSA to classify malware according to the attack patterns. We conduct the experiments in both malware benchmark and 223,170 real-world apps. The results show that SMART builds meaningful semantic models and outperforms both state-of-the-art approaches and anti-virus tools in malware detection. SMART identifies 4583 new malware in real-world apps that are missed by most anti-virus tools. The classification step further identifies new malware variants and unknown families.