Python Web Scraping Cookbook
上QQ阅读APP看书,第一时间看更新

How to do it

  1. The following code (found in 03/store_in_mysql.py) will read the planets data and write it to MySQL:
import mysql.connector
import get_planet_data
from mysql.connector import errorcode
from get_planet_data import get_planet_data

try:
# open the database connection
cnx = mysql.connector.connect(user='root', password='mypassword',
host="127.0.0.1", database="scraping")

insert_sql = ("INSERT INTO Planets (Name, Mass, Radius, Description) " +
"VALUES (%(Name)s, %(Mass)s, %(Radius)s, %(Description)s)")

# get the planet data
planet_data = get_planet_data()

# loop through all planets executing INSERT for each with the cursor
cursor = cnx.cursor()
for planet in planet_data:
print("Storing data for %s" % (planet["Name"]))
cursor.execute(insert_sql, planet)

# commit the new records
cnx.commit()

# close the cursor and connection
cursor.close()
cnx.close()

except mysql.connector.Error as err:
if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
print("Something is wrong with your user name or password")
elif err.errno == errorcode.ER_BAD_DB_ERROR:
print("Database does not exist")
else:
print(err)
else:
cnx.close()
  1. This results in the following output:
Storing data for Mercury
Storing data for Venus
Storing data for Earth
Storing data for Mars
Storing data for Jupiter
Storing data for Saturn
Storing data for Uranus
Storing data for Neptune
Storing data for Pluto
  1. Using MySQL Workbench we can see the the records were written to the database (you could use the mysql command line also):
Records displayed using MySQL Workbench
  1. The following code can be used to retrieve the data (03/read_from_mysql.py):
import mysql.connector
from mysql.connector import errorcode

try:
cnx = mysql.connector.connect(user='root', password='mypassword',
host="127.0.0.1", database="scraping")
cursor = cnx.cursor(dictionary=False)

cursor.execute("SELECT * FROM scraping.Planets")
for row in cursor:
print(row)

# close the cursor and connection
cursor.close()
cnx.close()

except mysql.connector.Error as err:
if err.errno == errorcode.ER_ACCESS_DENIED_ERROR:
print("Something is wrong with your user name or password")
elif err.errno == errorcode.ER_BAD_DB_ERROR:
print("Database does not exist")
else:
print(err)
finally:
cnx.close()

  1. This results in the following output:
(1, 'Mercury', 0.33, 4879.0, 'Named Mercurius by the Romans because it appears to move so swiftly.', 'https://en.wikipedia.org/wiki/Mercury_(planet)')
(2, 'Venus', 4.87, 12104.0, 'Roman name for the goddess of love. This planet was considered to be the brightest and most beautiful planet or star in the heavens. Other civilizations have named it for their god or goddess of love/war.', 'https://en.wikipedia.org/wiki/Venus')
(3, 'Earth', 5.97, 12756.0, "The name Earth comes from the Indo-European base 'er,'which produced the Germanic noun 'ertho,' and ultimately German 'erde,' Dutch 'aarde,' Scandinavian 'jord,' and English 'earth.' Related forms include Greek 'eraze,' meaning 'on the ground,' and Welsh 'erw,' meaning 'a piece of land.'", 'https://en.wikipedia.org/wiki/Earth')
(4, 'Mars', 0.642, 6792.0, 'Named by the Romans for their god of war because of its red, bloodlike color. Other civilizations also named this planet from this attribute; for example, the Egyptians named it "Her Desher," meaning "the red one."', 'https://en.wikipedia.org/wiki/Mars')
(5, 'Jupiter', 1898.0, 142984.0, 'The largest and most massive of the planets was named Zeus by the Greeks and Jupiter by the Romans; he was the most important deity in both pantheons.', 'https://en.wikipedia.org/wiki/Jupiter')
(6, 'Saturn', 568.0, 120536.0, 'Roman name for the Greek Cronos, father of Zeus/Jupiter. Other civilizations have given different names to Saturn, which is the farthest planet from Earth that can be observed by the naked human eye. Most of its satellites were named for Titans who, according to Greek mythology, were brothers and sisters of Saturn.', 'https://en.wikipedia.org/wiki/Saturn')
(7, 'Uranus', 86.8, 51118.0, 'Several astronomers, including Flamsteed and Le Monnier, had observed Uranus earlier but had recorded it as a fixed star. Herschel tried unsuccessfully to name his discovery "Georgian Sidus" after George III; the planet was named by Johann Bode in 1781 after the ancient Greek deity of the sky Uranus, the father of Kronos (Saturn) and grandfather of Zeus (Jupiter).', 'https://en.wikipedia.org/wiki/Uranus')
(8, 'Neptune', 102.0, 49528.0, 'Neptune was "predicted" by John Couch Adams and Urbain Le Verrier who, independently, were able to account for the irregularities in the motion of Uranus by correctly predicting the orbital elements of a trans- Uranian body. Using the predicted parameters of Le Verrier (Adams never published his predictions), Johann Galle observed the planet in 1846. Galle wanted to name the planet for Le Verrier, but that was not acceptable to the international astronomical community. Instead, this planet is named for the Roman god of the sea.', 'https://en.wikipedia.org/wiki/Neptune')
(9, 'Pluto', 0.0146, 2370.0, 'Pluto was discovered at Lowell Observatory in Flagstaff, AZ during a systematic search for a trans-Neptune planet predicted by Percival Lowell and William H. Pickering. Named after the Roman god of the underworld who was able to render himself invisible.', 'https://en.wikipedia.org/wiki/Pluto')