Here are the features we extracted from the APKs and used in our classification experiments, along with some details about the tools and techniques we used to extract them.
Static features, as their name implies, were extracted statically from the APKs in the datasets; that is, the apps were NOT executed in any virtual environment. These features capture information about the app, its components, permissions, and source code. They were extracted using Aion's extractStaticFeatures method in the "data_inference.featureExtraction" module, with the help of the static analysis tool androguard. Here's a complete list of the static features extracted from each app:
Unlike their static counterparts, dynamic features are meant to represent the runtime behavior of apps. To extract such features, we deployed each app (malicious and benign) to a Genymotion Android Virtual Device (AVD) and started it. To simulate human interaction with the app, we used Droidutan, a tool we wrote ourselves. It is based on AndroidViewClient and randomly interacts with the app's UI elements; for example, if it finds a button, it taps it.
We define an app's runtime behavior in terms of the API calls it issues during runtime. To capture the API calls an app makes while being exercised by Droidutan, we relied on droidmon, which dumps the sensitive API calls made by an app to the system log. After execution, we gather the dumped calls and represent them as a trace (i.e., a sequence) of API calls. Dynamic features are, in essence, counts of every category of API call captured by droidmon and listed in its hooks.json file. The total number of dynamic features is, therefore, 37.
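The counting step described above can be sketched as follows. This is a minimal illustration, not Aion's actual implementation: the function name count_api_categories, the api_to_category mapping, and the sample API names and category labels are all hypothetical. It assumes that a mapping from API call to category has already been derived from droidmon's hooks.json.

```python
from collections import Counter


def count_api_categories(trace, api_to_category, categories):
    """Turn a trace of API calls into a fixed-length vector of category counts.

    trace: sequence of API call names in the order they were logged.
    api_to_category: hypothetical mapping from API call name to its category.
    categories: ordered list of category names; fixes the vector layout.
    """
    # Calls that are not in the mapping are ignored rather than counted.
    counts = Counter(
        api_to_category[call] for call in trace if call in api_to_category
    )
    # Categories that never occur in the trace get a count of zero.
    return [counts[category] for category in categories]


# Hypothetical mapping and trace, for illustration only.
api_to_category = {
    "android.telephony.TelephonyManager.getDeviceId": "fingerprint",
    "java.net.URL.openConnection": "network",
    "javax.crypto.Cipher.doFinal": "crypto",
}
categories = sorted(set(api_to_category.values()))

trace = [
    "android.telephony.TelephonyManager.getDeviceId",
    "java.net.URL.openConnection",
    "java.net.URL.openConnection",
    "some.unhooked.Api.call",
]

features = count_api_categories(trace, api_to_category, categories)
print(dict(zip(categories, features)))
```

Fixing the order of categories up front keeps the vector layout identical across apps, so the same position always refers to the same category regardless of which calls a given app happened to make.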
@inproceedings{jiang2012dissecting,
title={Dissecting android malware: Characterization and evolution},
author={Zhou, Yajin and Jiang, Xuxian},
booktitle={2012 IEEE Symposium on Security and Privacy},
pages={95--109},
year={2012},
organization={IEEE}}
@inproceedings{hurier2017euphony,
title={Euphony: Harmonious unification of cacophonous anti-virus vendor labels for Android malware},
author={Hurier, M{\'e}d{\'e}ric and Suarez-Tangil, Guillermo and Dash, Santanu Kumar and Bissyand{\'e}, Tegawend{\'e} F and Le Traon, Yves and Klein, Jacques and Cavallaro, Lorenzo},
booktitle={Proceedings of the 14th International Conference on Mining Software Repositories},
pages={425--435},
year={2017},
organization={IEEE Press}}
@article{li2017understanding,
title={Understanding android app piggybacking: A systematic study of malicious code grafting},
author={Li, Li and Li, Daoyuan and Bissyand{\'e}, Tegawend{\'e} F and Klein, Jacques and Le Traon, Yves and Lo, David and Cavallaro, Lorenzo},
journal={IEEE Transactions on Information Forensics and Security},
volume={12},
number={6},
pages={1269--1284},
year={2017},
publisher={IEEE}}
@inproceedings{wei2017deep,
title={Deep ground truth analysis of current android malware},
author={Wei, Fengguo and Li, Yuping and Roy, Sankardas and Ou, Xinming and Zhou, Wu},
booktitle={International Conference on Detection of Intrusions and Malware, and Vulnerability Assessment},
pages={252--276},
year={2017},
organization={Springer}}