Python Geospatial Analysis Cookbook
Batch setting the projection definition of a folder full of Shapefiles

Working with one Shapefile is fine, but working with tens or hundreds of files is something else. In such a scenario, we need automation to get the job done fast.

We have a folder that contains several Shapefiles that are all in the same coordinate system but do not have a .prj file. We want to create a .prj file for each Shapefile in the current directory.
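
Before running the batch job, it can help to see which Shapefiles actually lack a .prj file. The following is a minimal sketch using only the standard library. The function name shapefiles_missing_prj is a hypothetical helper that is not part of this recipe's script, and it assumes that a .prj file sits next to its .shp file and shares the same base name:

```python
import os


def shapefiles_missing_prj(folder):
    """Return the sorted base names of .shp files in folder that have no .prj.

    Hypothetical helper for illustration; not part of the recipe's script.
    """
    entries = os.listdir(folder)
    # lower-case every name so the .prj lookup ignores upper/lower case
    existing = {name.lower() for name in entries}
    missing = []
    for name in entries:
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".shp":
            continue
        if (stem + ".prj").lower() not in existing:
            missing.append(stem)
    return sorted(missing)
```

Calling shapefiles_missing_prj("../geodata/no_prj") would return the names that still need a projection definition, without changing anything on disk.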

This script turns the previous code example, which wrote a .prj file for a single Shapefile, into a batch process that runs over several Shapefiles.

How to do it...

We have a folder with many Shapefiles and we would like to create a new .prj file for each Shapefile in this folder, so let's get started:

  1. Create a new Python file named ch02_05_batch_shp_prj.py in your /ch02/code/working/ directory and add the following code:
    #!/usr/bin/env python
    # -*- coding: utf-8 -*-
    
    import os
    from osgeo import osr
    
    # urlopen lives in urllib.request on Python 3 and in urllib on Python 2
    try:
        from urllib.request import urlopen
    except ImportError:
        from urllib import urlopen
    
    
    def create_epsg_wkt_esri(epsg):
        """
        Get the ESRI formatted .prj definition
        usage create_epsg_wkt_esri(4326)
    
        The definition is built locally with osr, no web access needed
    
        """
        spatial_ref = osr.SpatialReference()
        spatial_ref.ImportFromEPSG(epsg)
    
        # transform projection format to ESRI .prj style
        spatial_ref.MorphToESRI()
    
        # export to WKT
        wkt_epsg = spatial_ref.ExportToWkt()
    
        return wkt_epsg
    
    
    # Optional method to get EPSG as wkt from a web service
    def get_epsg_code(epsg):
        """
        Get the ESRI formatted .prj definition
        usage get_epsg_code(4326)
    
        We use the http://spatialreference.org/ref/epsg/4326/esriwkt/
    
        """
        web_url = "http://spatialreference.org/ref/epsg/{0}/esriwkt/".format(epsg)
        f = urlopen(web_url)
        try:
            # decode so that the result is text on Python 3 as well
            return f.read().decode("utf-8")
        finally:
            f.close()
    
    
    # Here we write out a new .prj file with the same name
    # as the Shapefile it belongs to
    def write_prj_file(folder_name, shp_filename, epsg):
        """
        input the path of the folder containing the Shapefile
        input the name of a Shapefile without the .shp
        input the EPSG code number as an integer
    
        usage  write_prj_file(<FolderName>,<ShapefileName>,<EPSG CODE>)
    
        """
    
        # os.path.join copes with a missing or trailing slash
        full_path_name = os.path.join(folder_name, shp_filename + ".prj")
    
        with open(full_path_name, "w") as prj:
            prj_wkt = create_epsg_wkt_esri(epsg)
            prj.write(prj_wkt)
            print("done writing projection definition : " + prj_wkt)
    
    
    def run_batch_define_prj(folder_location, epsg):
        """
        input path to the folder location containing
        all of your Shapefiles and the EPSG code as an integer
    
        usage  run_batch_define_prj("../geodata/no_prj", 4326)
    
        """
    
        # variable to hold our list of shapefiles
        shapefile_list = []
    
        # loop through the directory and find shapefiles
        # for each found shapefile write it to a list
        # remove the .shp ending so we do not end up with 
        # file names such as .shp.prj
        for shp_file in os.listdir(folder_location):
            if shp_file.endswith('.shp'):
                filename_no_ext = os.path.splitext(shp_file)[0]
                shapefile_list.append(filename_no_ext)
    
        # loop through the list of shapefiles and write
        # the new .prj for each shapefile
        for shp in shapefile_list:
            write_prj_file(folder_location, shp, epsg)
    
    
    # Windows users please use the full path
    # Linux users can also use full path        
    run_batch_define_prj("c:/02_DEV/01_projects/04_packt/ch02/geodata/no_prj/", 4326)

How it works...

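The script builds the projection definition for an EPSG code and writes it to a .prj file. By default, the definition is generated locally with osr in create_epsg_wkt_esri(epsg); as an option, the standard urllib Python module can fetch the same definition via the Web. We need to create a list of the Shapefiles in the folder and then create a .prj file for each Shapefile in this list.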

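The create_epsg_wkt_esri(epsg) function returns the ESRI-style WKT definition for the EPSG code, and the optional get_epsg_code(epsg) function retrieves the same kind of text from spatialreference.org. The write_prj_file(folder_name, shp_filename, epsg) function takes three parameters (the folder path, the Shapefile name without its extension, and the EPSG code) and writes the .prj file to disk.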

Next, inside run_batch_define_prj, we create an empty list to store the Shapefile names and then list all the files that are currently in the folder where the Shapefiles are stored.

The first for loop keeps only the files that end with .shp and adds their names, without the .shp extension, to the list. Finally, the last for loop takes us through each Shapefile in the list and calls our function to write its .prj file.
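
Because every Shapefile in the folder shares one coordinate system, the WKT text only needs to be built once and can then be reused for each file. The following is a minimal sketch of that variation, not the recipe's own script. It assumes the WKT is passed in as a plain string (for example, the value returned by create_epsg_wkt_esri), and the name write_prj_for_all is hypothetical:

```python
import os


def write_prj_for_all(folder, prj_wkt):
    """Write prj_wkt to <name>.prj for every .shp file in folder.

    Hypothetical helper for illustration; returns the .prj paths written.
    """
    written = []
    for name in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(name)
        # compare in lower case so files ending in .SHP are picked up too
        if ext.lower() != ".shp":
            continue
        prj_path = os.path.join(folder, stem + ".prj")
        with open(prj_path, "w") as prj:
            prj.write(prj_wkt)
        written.append(prj_path)
    return written
```

With the recipe's functions available, a call could look like write_prj_for_all("../geodata/no_prj", create_epsg_wkt_esri(4326)), so the EPSG lookup runs a single time regardless of how many Shapefiles the folder holds.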