Building a Malware Classifier: From Sample Collection to a Persistent Model Using Python

Malware is a serious threat to all kinds of cyberinfrastructure. Ever since the first known malware (then generally known as a virus), there have been techniques to detect it, and there is an ongoing arms race between new malware and the defenses against it. Traditionally, anti-virus software uses signature-based techniques to detect malware and protect the underlying system. Because signature-based techniques have critical limitations, most notably that they cannot recognize new or modified malware whose signature is not yet in the database, anti-virus vendors and security agencies are looking for alternatives and investing in machine-learning-based detection.

This workshop aims to take participants through the steps involved in building a malware classifier based on machine learning algorithms. Python is well suited to the task because it offers useful modules for every step. The workshop covers the following topics, each with hands-on exercises in Python:

  1. Introduction: An overview of the topic and an outline of the required steps.

  2. Data collection: How to collect malware and benign samples for the experiment.

  3. Pre-processing: How to carry out various pre-processing tasks (duplicate removal, file type identification, etc.) to prepare a suitable dataset for the experiment.

  4. Labeling: How to label each sample as malware or benign (required for supervised learning).

  5. Feature extraction: How to extract features from the samples and build a feature representation that can be used with various machine learning algorithms. (This workshop is restricted to static features.)

  6. Model training and testing: How to train various machine learning algorithms and evaluate their performance to select the best model.

  7. Model persistence: How to persist the selected model for later use.
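The limitation of signature-based detection can be illustrated with a deliberately simplified sketch. It assumes a "signature" is just the SHA-256 digest of a known malicious file; real anti-virus signatures are usually richer byte patterns or rules, so treat this as an illustration of the idea rather than how any product works. The byte strings and the `KNOWN_MALWARE_SHA256` set are made-up placeholders.

```python
import hashlib


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest, used here as a simplified file 'signature'."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample and signature database (placeholders, not real malware).
known_sample = b"MZ\x90\x00 placeholder for a known malicious file"
KNOWN_MALWARE_SHA256 = {sha256_of(known_sample)}


def is_known_malware(data: bytes) -> bool:
    """Flag a sample only if its exact digest is already in the signature set."""
    return sha256_of(data) in KNOWN_MALWARE_SHA256


variant = known_sample + b"\x00"  # the same file with one padding byte appended

print(is_known_malware(known_sample))  # True
print(is_known_malware(variant))       # False: the variant evades the exact-match check
```

Changing a single byte yields a completely different digest, which is one reason a model trained on features, rather than exact matches, is attractive for catching unseen variants.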
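A minimal sketch of the pre-processing step, assuming samples are plain files and that exact duplicates can be detected by their SHA-256 digest. File type identification here only checks a few well-known magic-byte prefixes (`MZ` for Windows PE files, `\x7fELF` for ELF, `PK\x03\x04` for ZIP containers such as Android APKs). The helper names `load_samples`, `identify_file_type`, and `preprocess` are hypothetical; an `MZ` prefix alone does not prove a file is a valid PE, so a parser such as pefile would be a stricter check.

```python
import hashlib
from pathlib import Path

# Magic-byte prefixes for a few common formats (the first bytes of the file).
MAGIC_PREFIXES = [
    (b"MZ", "PE"),           # Windows executables and DLLs
    (b"\x7fELF", "ELF"),     # Linux/Unix executables
    (b"PK\x03\x04", "ZIP"),  # ZIP containers, including Android APKs
]


def identify_file_type(data: bytes) -> str:
    """Return a coarse file type based on the leading magic bytes."""
    for prefix, name in MAGIC_PREFIXES:
        if data.startswith(prefix):
            return name
    return "UNKNOWN"


def load_samples(directory: str) -> dict[str, bytes]:
    """Read every regular file under `directory`, keyed by its relative path."""
    root = Path(directory)
    return {str(p.relative_to(root)): p.read_bytes() for p in root.rglob("*") if p.is_file()}


def preprocess(samples: dict[str, bytes], keep_type: str = "PE") -> dict[str, bytes]:
    """Drop exact duplicates (same SHA-256) and keep only samples of `keep_type`.

    Returns {sha256_hex: content}; the digest doubles as a stable sample ID.
    """
    result: dict[str, bytes] = {}
    for name in sorted(samples):  # sorted so the kept copy is deterministic
        data = samples[name]
        if identify_file_type(data) != keep_type:
            continue
        result.setdefault(hashlib.sha256(data).hexdigest(), data)
    return result


demo = {
    "a.exe": b"MZ\x90\x00payload-1",
    "copy_of_a.exe": b"MZ\x90\x00payload-1",  # exact duplicate of a.exe
    "b.exe": b"MZ\x90\x00payload-2",
    "app.apk": b"PK\x03\x04rest-of-zip",      # not a PE file
}
clean = preprocess(demo)
print(len(clean))  # 2
```

On real data, `preprocess(load_samples("samples/"))` would apply the same steps to a directory; the path is only an example.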
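For the labeling and feature extraction steps, one simple static feature that needs no format-specific parser is the byte histogram: the relative frequency of each of the 256 byte values in a file. The sketch below assumes labels are already known (for example, from which collection a sample came) and stores one row per sample with the `csv` module. The names `byte_histogram` and `write_feature_csv`, and the column names, are illustrative choices; the workshop's own format-aware features (via pefile or androguard) may differ.

```python
import csv
import io
from collections import Counter
from typing import TextIO


def byte_histogram(data: bytes) -> list[float]:
    """Static feature: relative frequency of each of the 256 byte values."""
    counts = Counter(data)    # iterating over bytes yields ints 0..255
    total = len(data) or 1    # avoid division by zero for empty files
    return [counts[b] / total for b in range(256)]


def write_feature_csv(rows: list[tuple[str, bytes, int]], out: TextIO) -> None:
    """Write one row per sample: id, 256 histogram features, label (1 = malware, 0 = benign)."""
    writer = csv.writer(out)
    writer.writerow(["sample_id"] + [f"byte_{i:02x}" for i in range(256)] + ["label"])
    for sample_id, data, label in rows:
        writer.writerow([sample_id] + byte_histogram(data) + [label])


samples = [
    ("sample_a", b"MZ\x00\x00", 1),  # hypothetical malware sample
    ("sample_b", b"MZ\x90\x90", 0),  # hypothetical benign sample
]
buffer = io.StringIO()
write_feature_csv(samples, buffer)
header = buffer.getvalue().splitlines()[0].split(",")
print(len(header))  # 258: id + 256 features + label
```

Writing to a `StringIO` keeps the example self-contained; in practice `out` would be a file opened with `open("features.csv", "w", newline="")`, as the csv documentation recommends.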
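Model training, testing, and persistence can be sketched with scikit-learn and the standard-library `pickle` module. Synthetic data from `make_classification` stands in for a real feature table, the three candidate classifiers are arbitrary examples, and choosing the model with the highest test accuracy is a simplification.

```python
import os
import pickle
import tempfile

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a real feature table (rows = samples, label 1 = malware).
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

candidates = {
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=1000),
}

# Train each candidate and record its accuracy on the held-out test split.
scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, model.predict(X_test))

best_name = max(scores, key=scores.get)
best_model = candidates[best_name]

# Persist the selected model, then load it back as a later detection run would.
# A temporary directory keeps the example clean; real use would pick a fixed path.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "malware_model.pkl")
    with open(path, "wb") as f:
        pickle.dump(best_model, f)
    with open(path, "rb") as f:
        restored = pickle.load(f)

print(best_name, scores)
```

Picking the winner by test-set accuracy makes the reported score optimistic; cross-validation on the training split (for example with `sklearn.model_selection.cross_val_score`) is a sounder way to choose among models. scikit-learn's documentation also mentions joblib as an alternative to pickle, and pickled models should only be loaded from trusted sources, since unpickling can execute arbitrary code.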


Prerequisites: Basic knowledge of Python syntax and of programming in general.

Required modules/libraries: 1. pefile 2. androguard 3. scikit-learn 4. csv (Python standard library)

Content URLs:

All the content (code, slides, and other supporting resources) will be available after the workshop, but I will keep updating the resources here in due course: GitHub

Speaker Info:

Dr. Ajit Kumar completed his Ph.D. at the Department of Computer Science, Pondicherry University, in 2018. His Ph.D. thesis, titled "A Framework for Malware Detection with Static Features using Machine Learning Algorithms", focused on malware detection using machine learning. He has been working with Python since 2012 for his research and other development work. He is also interested in web development, information security, and data science. Python is his language of choice for all programming-related tasks. He has been motivating and training students to adopt Python as their programming language, and he loves to write and share articles about Python and its applications.

He received his Bachelor of Computer Applications (BCA) from IGNOU in 2009 and his Master of Computer Science from Pondicherry University in 2011. In addition to his formal degrees, he received a Post Graduate Diploma in Statistical and Research Methods from Pondicherry University in 2015 and a Post Graduate Diploma in Information Security from IGNOU in 2016.

Id: 718
Section: Networking and Security
Type: Workshops
Target Audience: Intermediate