In this article by Angel Marquez author of the book PostGIS Essentials see how to insert GIS objects. Now is the time to fill our tables with data. It’s very important to understand some of the theoretical concepts about spatial data before we can properly work with it. We will cover this concept through the real estate company example, used previously.
Basically, we will insert two kinds of data: firstly, all the data that belongs to our own scope of interest. By this, I mean the spatial data that was generated by us (the positions of properties in the case of the example of the real estate company) for our specific problem, so as to save this data in a way that can be easily exploited. Secondly, we will import data of a more general use, which was provided by a third party.
Another important feature that we will cover in this article are the spatial data files that we could use to share, import, and export spatial data within a standardized and popular format called shp or Shape files. In this article, we will cover the following topics:
- Developing insertion queries that include GIS objects
- Obtaining useful spatial data from a public third-party
- Filling our spatial tables with the help of spatial data files using a command line tool
- Filling our spatial tables with the help of spatial data files using a GUI tool provided by PostGIS
(For more resources related to this topic, see here.)
Developing insertion queries with GIS objects
Developing an insertion query is a very common task for someone who works with databases. Basically, we follow the SQL language syntax of the insertion, by first listing all the fields involved and then listing all the data that will be saved in each one:
INSERT INTO tbl_properties( id, town, postal_code, street, "number) VALUES (1, 'London', 'N7 6PA', 'Holloway Road', 32);
If the field is of a numerical value, we simply write the number; if it’s a string-like data type, we have to enclose the text in two single quotes.
Now, if we wish to include a spatial value in the insertion query, we must first find a way to represent this value. This is where the Well-Known Text (WKT) notation enters. WKT is a notation that represents a geometry object that can be easily read by humans; following is an example of this:
Here, we defined a geographic point by using a list of two real values, the latitude (y-axis) and the longitude (x-axis). Additionally, if we need to specify the elevation of some point, we will have to specify a third value for the z-axis; this value will be defined in meters by default, as shown in the following code snippet:
POINT(-0.116190 51.556173 100)
Some of the other basic geometry types defined by the WKT notation are:
- MULTILINESTRING: This is used to define one or more lines
- POLYGON: This is used to define only one polygon
- MULTIPOLYGON: This is used to define several polygons in the same row
So, as an example, an SQL insertion query to add the first row to the table, tbl_properties, of our real estate database using the WKT notation, should be as follows:
INSERT INTO tbl_properties (id, town, postal_code, street, "number", the_geom)
VALUES (1, 'London', 'N7 6PA', 'Holloway Road', 32, ST_
The special function provided by PostGIS, ST_GeomFromText, parses the text given as a parameter and converts it into a GIS object that can be inserted in the_geom field.
Now, we could think this is everything and, therefore, start to develop all the insertion queries that we need. It could be true if we just want to work with the data generated by us and there isn’t a need to share this information with other entities. However, if we want to have a better understanding of GIS (believe me, it could help you a lot and prevent a lot of unnecessary headache when working with data from several sources), it would be better to specify another piece of information as part of our GIS object representation to establish its Spatial Reference System (SRS). In the next section, we will explain this concept.
What is a spatial reference system?
We could think about Earth as a perfect sphere that will float forever in space and never change its shape, but it is not. Earth is alive and in a state of constant change, and it’s certainly not a perfect circle; it is more like an ellipse (though not a perfect ellipse) with a lot of small variations, which have taken place over the years.
If we want to represent a specific position inside this irregular shape called Earth, we must first make some abstractions:
- First we have to choose a method to represent Earth’s surface into a regular form (such as a sphere, ellipsoid, and so on).
- After this, we must take this abstract three-dimensional form and represent it into a two-dimensional plane. This process is commonly called map projection, also known as projection.
There are a lot of ways to make a projection; some of them are more precise than others. This depends on the usefulness that we want to give to the data, and the kind of projection that we choose.
The SRS defines which projection will be used and the transformation that will be used to translate a position from a given projection to another. This leads us to another important point. Maybe it has occurred to you that a geographic position was unique, but it is not. By this, I mean that there could be two different positions with the same latitude and longitude values but be in different physical places on Earth. For a position to be unique, it is necessary to specify the SRS that was used to obtain this position.
To explain this concept, let’s consider Earth as a perfect sphere; how can you represent it as a two-dimensional square? Well, to do this, you will have to make a projection, as shown in the following figure:
A projection implies that you will have to make a spherical 3D image fit into a 2D figure, as shown in the preceding image; there are several ways to achieve this. We applied an azimuthal projection, which is a result of projecting a spherical surface onto a plane. However, as I told you earlier, there are several other ways to do this, as we can see in the following image:
These are examples of cylindrical and conical projections. Each one produces a different kind of 2D image of the terrain. Each has its own peculiarities and is used for several distinct purposes. If we put all the resultant images of these projections one above the other, we must get an image similar to the following figure:
As you can see, the terrain positions, which are not necessary, are the same between two projections, so you must clearly specify which projection you are using in your project in order to avoid possible mistakes and errors when you establish a position.
There are a lot of SRS defined around the world. They could be grouped by their reach, that is, they could be local (state or province), national (an entire country), regional (several countries from the same area), or global (worldwide). The International Association of Oil and Gas Producers has defined a collection of Coordinate Reference System (CRS) known as the European Petroleum Survey Group (EPSG) dataset and has assigned a unique ID to each of these SRSs; this ID is called SRID.
For uniquely defining a position, you must establish the SRS that it belongs to, using its particular ID; this is the SRID. There are literally hundreds of SRSs defined; to avoid any possible error, we must standardize which SRS we will use. A very common SRS, widely used around the globe is the WGS84 SRS with the SRID 4326. It is very important that you store the spatial data on your database, using EPSG: 4326 as much as possible, or almost use one equal projection on your database; this way you will avoid problems when you analyze your data.
The WKT notation doesn’t support the SRID specification as part of the text, since this was developed at the EWKT notation that allows us to include this information as part of our input string, as we will see in the following example:
When you create a spatial field, you must specify the SRID that will be used.
Including SRS information in our spatial tables
The matter that was discussed in the previous section is very important to develop our spatial tables. Taking into account the SRS that they will use from the beginning, we will follow a procedure to recreate our tables by adding this feature. This procedure must be applied to all the tables that we have created on both databases. Perform the following steps:
- Open a command session on pgSQL in your command line tool or by using the graphical GUI, PGAdmin III. We will open the Real_Estate database.
- Drop the spatial fields of your tables using the following instruction:
SELECT DropGeometryColumn('tbl_properties', 'the_geom') Add the spatial field using this command: SELECT AddGeometryColumn('tbl_properties', 'the_geom', 4326, 'POINT', 2);
Repeat these steps for the rest of the spatial tables.
- Now that we can specify the SRS that was used to obtain this position, we will develop an insertion query using the Extended WKT (EWKT) notation:
INSERT INTO tbl_properties ( id, town, postal_code, street, "number", the_geom)
VALUES (1, 'London', 'N7 6PA', 'Holloway Road', 32, ST_
The ST_GeomFromEWKT function works exactly as ST_GeomFromText, but it implements the extended functionality of the WKT notation. Now that you know how to represent a GIS object as text, it is up to you to choose the most convenient way to generate a SQL script that inserts existing data into the spatial data tables. As an example, you could develop a macro in Excel, a desktop application in C#, a PHP script on your server, and so on.
Getting data from external sources
In this section, we will learn how to obtain data from third-party sources. Most often, this data interchange is achieved through a spatial data file. There are many data formats for this file (such as KML, geoJSON, and so on). We will choose to work with the *.shp files, because they are widely used and supported in practically all the GIS tools available in the market.
There are dozens of sites where you could get useful spatial data from practically any city, state, or country in the world. Much of this data is public and freely available. In this case, we will use data from a fabulous website called http://www.openstreetmap.org/.
The following is a series of steps that you could follow if you want to obtain spatial data from this particular provider. I’m pretty sure you can easily adapt this procedure to obtain data from another provider on the Internet. Using the example of the real estate company, we will get data from the English county of Buckinghamshire. The idea is that you, as a member of the IT department, import data from the cities where the company has activities:
- Open your favorite Internet browser and go to http://www.openstreetmap.org/, as shown in the following screenshot:
- Click on the Export tab.
- Click on the Geofabrik Downloads link; you will be taken to http://download.geofabrik.de/, as shown in the following screenshot:
- There, you will find a list of sub regions of the world; select Europe:
- Next is a list of all countries in Europe; notice a new column called .shp.zip. This is the file format that we need to download. Select Great Britain:
- In the next list, select England, you can see your selection on the map located at the right-hand side of the web page, as shown in the following screenshot:
- Now, you will see a list of all the counties. Select the .shp.zip column from the county of Buckinghamshire:
- A download will start. When it finishes, you will get a file called buckinghamshire-latest.shp.zip. Unzip it.
At this point, we have just obtained the data (several shp files). The next procedure will show us how to convert this file into SQL insertion scripts.
Extracting spatial data from an shp file
In the unzipped folder are shp files; each of them stores a particular feature of the geography of this county. We will focus on the shp named buildings.shp.
Now, we will extract this data stored in the shp file. We will convert this data to a sql script so that we can insert its data into the tbl_buildings table. For this, we will use a Postgis tool called shp2pgSQL. Perform the following steps for extracting spatial data from an shp file:
- Open a command window with the cmd command.
- Go to the unzipped folder.
- Type the following command:
shp2pgsql -g the_geom buildings.shp tbl_buildings > buildings.sql
- Open the script with Notepad.
- Delete the following lines from the script:
CREATE TABLE "tbl_buildings"(gid serial, "osm_id" varchar(20), "name" varchar(50), "type"
varchar(20), "timestamp" varchar (30) ); ALTER TABLE "tbl_buildings" ADD PRIMARY KEY (gid); SELECT AddGeometryColumn('','tbl_buildings','geom','0','MULTIPOLYGON',2);
- Save the script. Open and run it with the pgAdmin query editor.
- Open the table; you must have at least 13363 new registers. Keep in mind that this number can change when new updates come.
Importing shp files with a graphical tool
There is another way to import an shp file into our table; we could use a graphical tool called postgisgui for this. To use this tool, perform the following steps:
- In the file explorer, open the folder: C:Program FilesPostgreSQL9.3binpostgisgui.
- Execute the shp2pgsql-gui application. Once this is done, we will see the following window:
- Configure the connection with the server. Click on the View Connections Details… button.
- Set the data to connect to the server, as shown in the following screenshot:
- Click the Add File button. Select the points.shp file.
- Once selected, type the following parameters in the Import List section:
- Mode: In this field, type Append
- SRID: In this field, type 4326
- Geo column: In this field, type the_geom
- Table: In this field, type tbl_landmarks
- Click on the Import button. The import process will fail and show you the following message:
This is because the structure is not the same as shown in the shp and in our table. There is no way to indicate to the tool which field we don’t want to import. So, the only way for us to solve this problem is let the tool create a new table and after this, change the structure. This can be done by following these steps:
- Go to pgAdmin and drop the tbl_landmarks table. Change the mode to Create in the Import list. Click on the Import button.
- Now, the import process is successful, but the table structure has changed. Go to the PGAdmin again, refresh the data, and edit the table structure to be the same as it was before:
- Change the name of the geom field to the_geom.
- Change the name of the osm_id field to id.
- Drop the Timestamp field.
- Drop the primary key constraint and add a new one attached to the id field. For that, right-click on Constraints in the left panel.
- Navigate to New Object | New Primary Key and type pk_landmarks_id. In the Columns tab, add the id field.
- Now, we have two spatial tables, one with data that contains positions represented as the PostGIS type, POINT (tbl_landmarks), and the other with polygons, represented by PostGIS with the type, MULTIPOLYGON(tbl_buildings). Now, I would like you to import the data contained in the roads.shp file, using one of the two previously viewed methods.
- The following table has data that represents the path of different highways, streets, roads, and so on, which belong to this area in the form of lines, represented by PostGIS with the MULTILINESTRING type. When it’s imported, change its name to tbl_roads and adjust the columns to the structure used for the other tables in this article. Here’s an example of how the imported data must look like, as you can see the spatial data is show in its binary form in the following table:
In this article, you learned some basic concepts of GIS (such as WKT, EWKT, and SRS), which are fundamental for working with the GIS data. Now, you are able to craft your own spatial insertion queries or import this data into your own data tables.
Resources for Article:
- Improving proximity filtering with KNN [article]
- Installing PostgreSQL [article]
- Securing the WAL Stream [article]