This project evaluated how effectively statistical analysis of assembly instruction (opcode) frequencies can distinguish Malware from Goodware.
Malware and Goodware binaries were obtained, and a Python script was written to extract opcode frequencies from specific parts of these files. Naive Bayes and k-means-based models were then trained on these executables.
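To make the extraction step concrete, here is a minimal sketch of turning a list of opcode mnemonics into relative frequencies. It is not the project's script: the function name `opcode_frequencies` and the sample mnemonics are hypothetical, and the sketch assumes the instructions have already been disassembled by some other tool.

```python
from collections import Counter


def opcode_frequencies(mnemonics):
    """Return each opcode's share of the total instruction count.

    `mnemonics` is assumed to be an iterable of opcode strings that a
    disassembler has already produced (an assumption of this sketch).
    """
    # Normalize case so "MOV" and "mov" count as the same opcode.
    counts = Counter(m.lower() for m in mnemonics)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {op: n / total for op, n in counts.items()}


# Hypothetical instruction stream, for illustration only.
sample = ["mov", "push", "mov", "call", "MOV", "ret", "push", "mov"]
freqs = opcode_frequencies(sample)
print(freqs)
```

Using relative frequencies rather than raw counts keeps large and small executables comparable, which is one plausible reason to normalize before training.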
These models were tested on a separate set of programs to determine how well they distinguish Malware from Goodware.
The best Naive Bayes model had a recall of 1.0 for Malware and 0.8 for Goodware.
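For readers unfamiliar with the metric, recall for a class is the fraction of that class's true members that the model labels correctly. The sketch below computes it per class. The labels are invented purely so that the arithmetic reproduces recalls of 1.0 and 0.8; they are not the project's test data, and `recall_per_class` is a hypothetical helper.

```python
def recall_per_class(y_true, y_pred):
    """Return {label: recall}: correct predictions divided by true members."""
    result = {}
    for label in sorted(set(y_true)):
        total = sum(1 for t in y_true if t == label)
        hits = sum(
            1 for t, p in zip(y_true, y_pred) if t == label and p == label
        )
        result[label] = hits / total
    return result


# Invented labels for illustration: every malware sample is caught,
# and one of five goodware samples is misclassified as malware.
y_true = ["malware"] * 5 + ["goodware"] * 5
y_pred = ["malware"] * 5 + ["goodware"] * 4 + ["malware"]

recalls = recall_per_class(y_true, y_pred)
print(recalls)
```

A Malware recall of 1.0 means no malicious program in the test set was missed, while a Goodware recall of 0.8 means one in five benign programs was flagged as malicious.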
Differences in opcode frequencies can differentiate Malware from Goodware. Certain instructions occur much more frequently in one group than in the other; these differences can be used to identify the two types of programs.
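One way to see which instructions separate the two groups is to compare their smoothed frequencies opcode by opcode. The sketch below computes a log ratio for each opcode; positive values mean the opcode is relatively more common in the first group. The opcode lists, the function name `log_frequency_ratios`, and the smoothing constant `alpha` are all assumptions of this illustration, not values taken from the project.

```python
import math
from collections import Counter


def log_frequency_ratios(malware_ops, goodware_ops, alpha=1.0):
    """Return {opcode: log(P(op | malware) / P(op | goodware))}.

    Add-alpha smoothing keeps the ratio finite for opcodes that appear
    in only one of the two groups.
    """
    mal = Counter(malware_ops)
    good = Counter(goodware_ops)
    vocab = set(mal) | set(good)
    mal_total = sum(mal.values()) + alpha * len(vocab)
    good_total = sum(good.values()) + alpha * len(vocab)
    return {
        op: math.log((mal[op] + alpha) / mal_total)
        - math.log((good[op] + alpha) / good_total)
        for op in vocab
    }


# Invented opcode streams, for illustration only.
malware_ops = ["xor", "xor", "xor", "mov", "jmp"]
goodware_ops = ["mov", "mov", "mov", "call", "xor"]

ratios = log_frequency_ratios(malware_ops, goodware_ops)
for op, value in sorted(ratios.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{op:5s} {value:+.3f}")
```

Per-opcode log ratios of this kind are also the quantities a multinomial Naive Bayes classifier weights by opcode counts and sums when scoring a program, which is why large frequency differences translate directly into classification power.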
This project examines models that differentiate Malware from Goodware using the frequencies of program assembly instructions.