上QQ阅读APP看书,第一时间看更新
Getting ready
We will be using the planets data page and converting that data into CSV and JSON files. Let's start by loading the planets data from the page into a list of python dictionary objects. The following code (found in (03/get_planet_data.py) provides a function that performs this task, which will be reused throughout the chapter:
import requests
from bs4 import BeautifulSoup
def get_planet_data():
html = requests.get("http://localhost:8080/planets.html").text
soup = BeautifulSoup(html, "lxml")
planet_trs = soup.html.body.div.table.findAll("tr", {"class": "planet"})
def to_dict(tr):
tds = tr.findAll("td")
planet_data = dict()
planet_data['Name'] = tds[1].text.strip()
planet_data['Mass'] = tds[2].text.strip()
planet_data['Radius'] = tds[3].text.strip()
planet_data['Description'] = tds[4].text.strip()
planet_data['MoreInfo'] = tds[5].findAll("a")[0]["href"].strip()
return planet_data
planets = [to_dict(tr) for tr in planet_trs]
return planets
if __name__ == "__main__":
print(get_planet_data())
Running the script gives the following output (briefly truncated):
03 $python get_planet_data.py
[{'Name': 'Mercury', 'Mass': '0.330', 'Radius': '4879', 'Description': 'Named Mercurius by the Romans because it appears to move so swiftly.', 'MoreInfo': 'https://en.wikipedia.org/wiki/Mercury_(planet)'}, {'Name': 'Venus', 'Mass': '4.87', 'Radius': '12104', 'Description': 'Roman name for the goddess of love. This planet was considered to be the brightest and most beautiful planet or star in the\r\n heavens. Other civilizations have named it for their god or goddess of love/war.', 'MoreInfo': 'https://en.wikipedia.org/wiki/Venus'}, {'Name': 'Earth', 'Mass': '5.97', 'Radius': '12756', 'Description': "The name Earth comes from the Indo-European base 'er,'which produced the Germanic noun 'ertho,' and ultimately German 'erde,'\r\n Dutch 'aarde,' Scandinavian 'jord,' and English 'earth.' Related forms include Greek 'eraze,' meaning\r\n 'on the ground,' and Welsh 'erw,' meaning 'a piece of land.'", 'MoreInfo': 'https://en.wikipedia.org/wiki/Earth'}, {'Name': 'Mars', 'Mass': '0.642', 'Radius': '6792', 'Description': 'Named by the Romans for their god of war because of its red, bloodlike color. Other civilizations also named this planet\r\n from this attribute; for example, the Egyptians named it "Her Desher," meaning "the red one."', 'MoreInfo':
...
It may be required to install csv, json and pandas. You can do that with the following three commands:
pip install csv
pip install json
pip install pandas