Python Web Scraping Cookbook
上QQ阅读APP看书,第一时间看更新

How to parse websites and navigate the DOM using BeautifulSoup

When the browser displays a web page it builds a model of the content of the page in a representation known as the document object model (DOM). The DOM is a hierarchical representation of the page's entire content, as well as structural information, style information, scripts, and links to other content.

It is critical to understand this structure to be able to effectively scrape data from web pages. We will look at an example web page, its DOM, and examine how to navigate the DOM with Beautiful Soup.