Machine Learning for Cybersecurity Cookbook
上QQ阅读APP看书,第一时间看更新

How it works...

As you can see, in Step 1, we imported the pefile module to enumerate the samples. Once that is done, we define the convenience function, as you can see in Step 2. The reason being that it often imports using varying cases (upper/lower). This causes the same import to appear as distinct imports.

After preprocessing the imports, we then define another function to collect all the imports of a file into a list. We will also define a function to collect the names of the sections of a file in order to standardize these names such as .text, .rsrc, and .reloc while containing distinct parts of the file (Step 3). The files are then enumerated in our folders and empty lists will be created to hold the features we will be extracting. The predefined functions will then collect the imports (Step 4), section names, and the number of sections of each file (Steps 5 and 6). Lastly, a try-catch clause will be defined in case a file's PE header cannot be parsed (Step 7). This can happen for many reasons. One reason being that the file is not actually a PE file. Another reason is that its PE header is intentionally or unintentionally malformed.