Last updated: 2021-07-02 13:58:34
Cover Page
Title Page
Copyright and Credits
Go Web Scraping Quick Start Guide
About Packt
Why subscribe?
Packt.com
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
Introducing Web Scraping and Go
What is web scraping?
Why do you need a web scraper?
Search engines
Price comparison
Building datasets
What is Go?
Why is Go a good fit for web scraping?
Go is fast
Go is safe
Go is simple
How to set up a Go development environment
Go language and tools
Git
Editor
Summary
The Request/Response Cycle
What do HTTP requests look like?
HTTP request methods
HTTP headers
Query parameters
Request body
What do HTTP responses look like?
Status line
Response headers
Response body
What are HTTP status codes?
100–199 range
200–299 range
300–399 range
400–499 range
500–599 range
What do HTTP requests/responses look like in Go?
A simple request example
Web Scraping Etiquette
What is a robots.txt file?
What is a User-Agent string?
Example
How to throttle your scraper
How to use caching
Cache-Control
Expires
ETag
Caching content in Go
Parsing HTML
What is the HTML format?
Syntax
Structure
Searching using the strings package
Example – Counting links
Example – Doctype check
Searching using the regexp package
Example – Finding links
Example – Finding prices
Searching using XPath queries
Example – Daily deals
Example – Collecting products
Searching using Cascading Style Sheets selectors
Web Scraping Navigation
Following links
Submitting forms
Example – Submitting searches
Example – POST method
Avoiding loops
Breadth-first versus depth-first crawling
Depth-first
Breadth-first
Navigating with JavaScript
Example – Book reviews
Protecting Your Web Scraper
Virtual private servers
Proxies
Public and shared proxies
Dedicated proxies