Android Malware Insights 2018

Here you can download all the data we gathered and used during our experiments, including information about the apps in the aforementioned datasets, the feature vectors we extracted from their APKs, and two of the scripts we used to perform the experiments.

Encyclopedia Malicia

During our experiments, we augmented some of the information provided by Euphony with information we gathered from malicious apps belonging to Malgenome, Piggybacking, and AMD (i.e., a total of more than 36,000 apps), to create an "encyclopedia" of malicious apps. We represent the gathered information as a CSV file, in which each app has the following fields:

sha256	The SHA256 hash of the APK
sha1	The SHA1 hash of the APK
md5	The MD5 hash of the APK
dex_date	The date/time on which the app was allegedly compiled
apk_size	The size (in KB) of the app's APK archive
pkg_name	The app package name (e.g., com.my.app)
vercode	The app's version code
vt_detection	The number of VirusTotal antivirus engines that detect the app
vt_scan_date	The last date/time on which the app was scanned on VirusTotal
dex_size	The size (in KB) of the app's classes.dex file
markets	The marketplace on which the app was found
name	The app's malware family name (e.g., Gingermaster)
types	The app's malware type (e.g., Trojan)
multiple_names	Whether the app is given multiple family names by VirusTotal scanners
multiple_types	Whether the app is given multiple types by VirusTotal scanners

Download
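The snippet below is a minimal sketch of how the CSV could be loaded and matched against an APK on disk. It assumes the file has a header row with exactly the field names listed above, comma delimiters, and booleans serialized as True/False or 1/0. The helper names (load_encyclopedia, apk_hashes, family_counts) are hypothetical and are not part of the download.

```python
import csv
import hashlib
from collections import Counter

# Column groups taken from the field table above; their types are assumptions.
INT_FIELDS = ("vercode", "vt_detection")
FLOAT_FIELDS = ("apk_size", "dex_size")  # sizes in KB
BOOL_FIELDS = ("multiple_names", "multiple_types")


def _parse(value, cast):
    """Convert a CSV cell with `cast`, returning None for empty or malformed cells."""
    try:
        return cast(value)
    except (TypeError, ValueError):
        return None


def _to_bool(value):
    # Assumption: booleans are serialized as True/False or 1/0.
    return str(value).strip().lower() in ("true", "1")


def load_encyclopedia(path):
    """Read the encyclopedia CSV into a dict keyed by lowercase SHA256 hash."""
    records = {}
    with open(path, newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            for field in INT_FIELDS:
                row[field] = _parse(row[field], int)
            for field in FLOAT_FIELDS:
                row[field] = _parse(row[field], float)
            for field in BOOL_FIELDS:
                row[field] = _to_bool(row[field])
            records[row["sha256"].strip().lower()] = row
    return records


def apk_hashes(apk_path, chunk_size=1 << 20):
    """Compute the sha256, sha1, and md5 digests of an APK in a single pass."""
    digests = {"sha256": hashlib.sha256(), "sha1": hashlib.sha1(), "md5": hashlib.md5()}
    with open(apk_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for digest in digests.values():
                digest.update(chunk)
    return {name: digest.hexdigest() for name, digest in digests.items()}


def family_counts(records):
    """Count apps per malware family (the `name` column)."""
    return Counter(row["name"] for row in records.values())
```

For example, `records.get(apk_hashes("sample.apk")["sha256"])`, with `records` returned by `load_encyclopedia` and `sample.apk` a hypothetical local file, yields that app's record, or None if it is not in the encyclopedia. Keying by hash rather than by `pkg_name` matters because a repackaged app can share the package name of the original.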

Feature vectors

We extracted two types of features from the APKs in the previously mentioned datasets: static and dynamic. The static features, as their name suggests, were extracted from the APKs without running them, using Androguard. The dynamic features were extracted from API call traces that capture the apps' runtime behavior: the apps were deployed on virtual devices, interacted with using Droidutan, a tool we developed in-house, and monitored using droidmon. You can find a breakdown of the features on our information page.
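As a rough illustration of the two feature families, the sketch below turns a set of requested permissions into a binary vector and an API call trace into a count vector over a fixed vocabulary. It is not the feature set behind the downloadable vectors (see the information page for that): the choice of permissions and component counts, the vocabularies, and the assumption that droidmon's output has already been parsed into `class.method` strings are all hypothetical. `extract_static_features` calls Androguard's `AnalyzeAPK` and only works if Androguard is installed; the two vectorizing helpers are pure Python.

```python
from collections import Counter


def extract_static_features(apk_path):
    """Collect a few manifest attributes from an APK without running it."""
    # Imported lazily so the helpers below can be used without Androguard installed.
    from androguard.misc import AnalyzeAPK

    apk, _dex, _analysis = AnalyzeAPK(apk_path)
    return {
        "permissions": set(apk.get_permissions()),
        "activities": len(apk.get_activities()),
        "services": len(apk.get_services()),
        "receivers": len(apk.get_receivers()),
    }


def binary_vector(items, vocabulary):
    """Static-style feature: 1 if a vocabulary entry (e.g. a permission) is present, else 0."""
    present = set(items)
    return [1 if term in present else 0 for term in vocabulary]


def api_call_counts(trace, vocabulary):
    """Dynamic-style feature: how often each monitored API ("class.method") appears in a trace."""
    counts = Counter(trace)
    return [counts[api] for api in vocabulary]
```

Fixing the vocabulary up front keeps every app's vector the same length and in the same order, which is what a classifier expects when the vectors are stacked into a matrix.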

The downloadable zip archive is divided into five directories, each containing two subdirectories (static and dynamic). The five directories hold the feature vectors (organized as Python lists) of the following datasets:

amd	Features extracted from a sample (1,250 apps) of the AMD dataset
gplay16	Features extracted from a total of 1,1882 benign apps we downloaded from Google Play
malgenome	Features extracted from the apps belonging to the Malgenome dataset
original	Features extracted from the benign apps in the Piggybacking dataset
piggybacked	Features extracted from the repackaged versions of the original apps

Download
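Since each vector is stored as a Python list, `ast.literal_eval` can parse it without executing arbitrary code. The sketch below assumes one vector per file under `<dataset>/<static|dynamic>/` and uses the file name (minus its extension) as the app identifier; the actual file naming inside the archive may differ. Labels follow the dataset descriptions above, with the repackaged apps in `piggybacked` treated as malicious and the apps in `gplay16` and `original` as benign. `LABELS`, `load_vectors`, and `build_dataset` are hypothetical names.

```python
import ast
import os

# 1 = malicious, 0 = benign, following the dataset descriptions above.
LABELS = {"amd": 1, "malgenome": 1, "piggybacked": 1, "gplay16": 0, "original": 0}


def load_vectors(root, dataset, kind="static"):
    """Parse every feature-vector file under <root>/<dataset>/<kind>/."""
    directory = os.path.join(root, dataset, kind)
    vectors = {}
    for file_name in sorted(os.listdir(directory)):
        path = os.path.join(directory, file_name)
        if not os.path.isfile(path):
            continue
        with open(path, encoding="utf-8") as handle:
            # Assumption: each file holds one vector written as a Python list literal.
            vector = ast.literal_eval(handle.read().strip())
        if not isinstance(vector, list):
            raise ValueError(f"{path} does not contain a Python list")
        vectors[os.path.splitext(file_name)[0]] = vector
    return vectors


def build_dataset(root, datasets, kind="static"):
    """Stack vectors from several datasets into X (features), y (labels), and app ids."""
    X, y, ids = [], [], []
    for dataset in datasets:
        for app_id, vector in load_vectors(root, dataset, kind).items():
            X.append(vector)
            y.append(LABELS[dataset])
            ids.append(f"{dataset}/{app_id}")
    return X, y, ids
```

For instance, `build_dataset("features", ["malgenome", "gplay16"], kind="dynamic")` would combine the dynamic vectors of Malgenome and the Google Play apps into one labeled set, assuming the archive was extracted to a directory named `features`.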

Scripts

Lastly, you can download a couple of Python scripts we used to run our classification experiments. Those scripts make use of some libraries included in Aion, a framework we are currently developing to analyze and detect Android malware; feel free to clone it and play with it.

compatibility.py	Used to conduct the forward-backward compatibility experiments
experiment.py	Used to conduct K-fold cross-validated classification on apps in a dataset and print statistics about the name/type of misclassified apps
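The sketch below shows the general shape of both experiments in plain Python, without Aion. `train_and_predict` is a hypothetical callback (fit on the training split, return predictions for the test split) into which any classifier can be plugged. The folds are shuffled but not stratified; scikit-learn's `StratifiedKFold` is an alternative when class balance matters. Reading "forward-backward compatibility" as training on one dataset and testing on another, then swapping the two roles, is an assumption about what `compatibility.py` does.

```python
import random
from collections import Counter


def k_fold_indices(n_samples, k=10, seed=0):
    """Shuffle indices once and split them into k roughly equal folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[fold::k] for fold in range(k)]


def cross_validate(X, y, families, train_and_predict, k=10, seed=0):
    """Run K-fold CV; return accuracy and a count of misclassified apps per family."""
    predictions = [None] * len(y)
    for test_idx in k_fold_indices(len(y), k, seed):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(y)) if i not in test_set]
        predicted = train_and_predict(
            [X[i] for i in train_idx], [y[i] for i in train_idx], [X[i] for i in test_idx]
        )
        for i, label in zip(test_idx, predicted):
            predictions[i] = label
    wrong = [i for i, (true, pred) in enumerate(zip(y, predictions)) if true != pred]
    accuracy = 1 - len(wrong) / len(y)
    return accuracy, Counter(families[i] for i in wrong)


def compatibility(train, test, train_and_predict):
    """Train on one (X, y) dataset and score on another; call again with roles swapped."""
    (X_train, y_train), (X_test, y_test) = train, test
    predicted = train_and_predict(X_train, y_train, X_test)
    return sum(p == t for p, t in zip(predicted, y_test)) / len(y_test)
```

Passing the `name` or `types` column from the encyclopedia as `families` gives the per-family or per-type breakdown of misclassified apps.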
