
How it works...
We start by importing the PyGitHub library in Step 1 so that we can conveniently call the GitHub APIs, which allow us to scrape and explore the universe of repositories. We also import the base64 module to decode the base64-encoded files that we will download from GitHub. Note that GitHub rate-limits the number of API calls a generic user can make; for this reason, if you attempt to download too many files in a short period of time, your script may not retrieve all of them. In Step 2, we supply our credentials to GitHub and specify that we are looking for repositories written in JavaScript, using the query='language:javascript' parameter. In Steps 3 to 6, we enumerate the repositories matching this criterion, search each one for files ending in .js, and save local copies. Since these files are base64 encoded, we decode them to plaintext in Step 7. Finally, in Step 8, we show you how to adjust the script to scrape other file types, such as Python and PowerShell.
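The flow described above can be sketched with PyGitHub as follows. This is a minimal illustration, not the recipe's exact script: the function names, the token placeholder, and the `max_repos` cap are assumptions made here for brevity.

```python
import base64


def decode_content(b64_text):
    # GitHub's contents API returns file bodies base64-encoded (Step 7):
    # decode back to plain text, replacing any undecodable bytes.
    return base64.b64decode(b64_text).decode("utf-8", errors="replace")


def scrape_files(token, language="javascript", extension=".js", max_repos=5):
    # Imported here so decode_content() stays usable without PyGitHub installed.
    from github import Github  # pip install PyGithub

    g = Github(token)  # Step 2: authenticate with a personal access token
    results = g.search_repositories(query=f"language:{language}")
    for repo in results[:max_repos]:
        queue = repo.get_contents("")  # root listing of the default branch
        while queue:  # Steps 3 to 6: walk the repository tree iteratively
            item = queue.pop(0)
            if item.type == "dir":
                queue.extend(repo.get_contents(item.path))
            elif item.path.endswith(extension):
                # Step 8: pass e.g. language="python", extension=".py"
                # to scrape other file types instead.
                yield repo.full_name, item.path, decode_content(item.content)
```

Because both the search and contents endpoints are rate limited, a real run should pause between requests, or check the remaining quota with PyGitHub's get_rate_limit() before each batch.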