Python Web Scraping Cookbook
上QQ阅读APP看书,第一时间看更新

Getting ready

First make sure you have access to a PostgreSQL data instance.  Again, you can install one locally, run one in a container, or get an instance in the cloud.

As with MySQL, we need to first create a database. The process is almost identical to that of MySQL but with slightly different commands and parameters.

  1. From the terminal execute the psql command at the terminal.  This takes you into the psql command processor:
# psql -U postgres
psql (9.6.4)
Type "help" for help.
postgres=#
  1. Now create the scraping database:
postgres=# create database scraping;
CREATE DATABASE
postgres=#
  1. Then switch to the new database:
postgres=# \connect scraping
You are now connected to database "scraping" as user "postgres".
scraping=#
  1. Now we can create the Planets table. We first need to create a sequence table:
scraping=# CREATE SEQUENCE public."Planets_id_seq"
scraping-# INCREMENT 1
scraping-# START 1
scraping-# MINVALUE 1
scraping-# MAXVALUE 9223372036854775807
scraping-# CACHE 1;
CREATE SEQUENCE
scraping=# ALTER SEQUENCE public."Planets_id_seq"
scraping-# OWNER TO postgres;
ALTER SEQUENCE
scraping=#
  1. And now we can create the table:
scraping=# CREATE TABLE public."Planets"
scraping-# (
scraping(# id integer NOT NULL DEFAULT nextval('"Planets_id_seq"'::regclass),
scraping(# name text COLLATE pg_catalog."default" NOT NULL,
scraping(# mass double precision NOT NULL,
scraping(# radius double precision NOT NULL,
scraping(# description text COLLATE pg_catalog."default" NOT NULL,
scraping(# moreinfo text COLLATE pg_catalog."default" NOT NULL,
scraping(# CONSTRAINT "Planets_pkey" PRIMARY KEY (name)
scraping(# )
scraping-# WITH (
scraping(# OIDS = FALSE
scraping(# )
</span>scraping-# TABLESPACE pg_default;
CREATE TABLE
scraping=#
scraping=# ALTER TABLE public."Planets"
scraping-# OWNER to postgres;
ALTER TABLE
scraping=# \q

To access PostgreSQL from Python we will use the psycopg2 library, so make sure it is installed in your Python environment using pip install psycopg2.

We are now ready to write Python to store the planets data in PostgreSQL.