Researchers from the North China Electric Power University have recently published a paper titled, ‘A Review on The Use of Deep Learning in Android Malware Detection’. Researchers highlight the fact that Android applications can not only be used by application developers, but also by malware developers with criminal intention to design and spread malicious applications that can affect the normal work of Android phones and tablets, steal personal information and credential data, or even worse lock the phone and ask for ransom.
In this paper, they have explained how deep learning methods can be used as a countermeasure in Android malware detection to fight back malware.
Android Malware Detection Techniques
Researchers have said that one critical point of mobile phones is that they are a sensor-based event system, which permits malware to respond to approaching SMS, position changes and so forth, increasing the sophistication of automated malware-analysis techniques. Moreover, the apps can use services and activities and integrate varied programming languages (e.g. Java and C++) in one application. Each application is analyzed in the following stages:
Static Analysis
The static analysis screens parts of the application without really executing them. This analysis incorporates Signature-based, Permission-based and Component-based analysis. The Signature-based strategy draws features and makes distinctive signs to identify specific malware. Hence, it falls short to recognize the variation or unidentified malware. The Permission-based strategy recognizes permission requests to distinguish malware. The Component-based techniques decompile the APP to draw and inspect the definition and byte code connections of significant components (i.e. activities, services, etc.), to identify the exposures. The principal drawbacks of static analysis are the lack of real execution paths and suitable execution conditions.
Dynamic Analysis
This technique includes the execution of the application on either a virtual machine or a physical device. This analysis results in a less abstract perspective of application than static analysis. The code paths executed during runtime are a subset of every single accessible path. The principal objective of the analysis is to achieve high code inclusion since every feasible event ought to be activated to watch any possible malicious behavior
Hybrid Analysis
The hybrid analysis technique includes consolidating static and dynamic features gathered from examining the application and drawing data while the application is running, separately. Nevertheless, it would boost the accuracy of the identification. The principal drawback of hybrid analysis is that it consumes the Android system resources and takes a long time to perform the analysis.
Use of deep learning in Android malware detection
Currently available machine learning has several weaknesses and some open issues related to the use of DL in Android malware detection include:
- Deep learning lacks transparency to provide an interpretation of the decision created by its methods. Malware analysts need to understand how the decision was made.
- There is no assurance that classification models built based on deep learning will perform in different conditions with new data that would not match previous training data.
- Deep learning studies complex correlations within input and output feature with no innate depiction of causality.
- Deep learning models are not autonomous and need continual retraining and rigorous parameters adjustments.
The DL models in the training phase were subjected to data poisoning attacks, which are merely implemented by manipulating the training and instilling data that make a deep learning model to commit errors. In the testing phase, the models were exposed to several attack types including:
- Adversarial Attacks are where the DL model inputs are the ones that an adversary has invented deliberately to cause the model to make mistakes
- Evasion attack: Here, the intruder exploits malevolent instances at test time to have them incorrectly classified as benign by a trained classifier, without having an impact over the training data. This can breach system integrity, either with a targeted or with an indiscriminate attack.
- Impersonate attack: This attack mimics data instances from targets. The attacker plans to create particular adversarial instances to such an extent that current deep learning-based models mistakenly characterize original instances with different tags from the imitated ones.
- Inversion attack: This attack uses the APIs allowed by machine learning systems to assemble some fundamental data with respect to the target system models. This kind of attack is divided into two types; Whitebox attack and Blackbox attack. The white-box attack implies that an aggressor can loosely get to and download learning models and other supporting data, while the black-box one points to the way that the aggressor just knows the APIs opened by learning models and some observation after providing input.
According to the researchers, hardening deep learning models against different adversarial attacks and detecting, describing and measuring concept drift are vital in future work in Android malware detection. They also mentioned that the limitation of deep learning methods such as lack of transparency and being nonautonomous, is to build more efficient models.
To know more about this research in detail, read the research paper.