Project 4: Address Geocoding
By Luminita Slevoaca
Initial data
Project 4 focuses on geocoding the addresses of customers who performed home radon tests. Radon is a radioactive gas that is responsible for an estimated 20,000 lung cancer deaths each year (EPA, 2010). The concentration of radon is determined by the geology of an area and its soil types. The Environmental Protection Agency (EPA) used Home Tested Incorporated’s dataset, which contains the results produced by home testers, to map the areas that have a high radon potential.
To perform the address geocoding, we were provided with the following datasets: “geology” and “soils” shapefiles, each containing a radon potential attribute (RP_Geol and RP_Soil, respectively), a “Roads” shapefile, and a standalone attribute table containing the customers’ addresses. The shapefiles used for this project use the Albers Conical Equal Area projection and the North American Datum of 1927.
Analysis
In this lesson we were introduced to address geocoding, also called address matching. According to King and Zeiders (2009), “address matching is the process of generating geographic coordinates (geocoding) for a dataset that contains postal addresses”. The attribute table “Addresses” contained the zip code and other associated information but not the geographic coordinates for each address. From the “Tools” toolbar, we used the “Geocode Addresses…” option, selected the “Roads” shapefile as our address locator, and matched it with the “Addresses” table. The result was a new shapefile, “Geocoded Addresses”, containing all the records from the attribute table, each assigned geographic coordinates.
We were also introduced to two new spatial overlays, union and merge. In the first part of the lesson we used union, an operation that preserves all the attributes from the input shapefiles in the output shapefile. We performed a union overlay of the “geology” and “soils” layers and obtained a shapefile that contains the radon potential attribute for both geology and soil. Once we had both radon potential attributes in one table, we determined the areas that had the highest radon potential.
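The effect of the union on the attribute table can be sketched with a simplified one-dimensional analogy. This is not the ArcMap Union tool: it assumes each layer is a list of non-overlapping intervals along a transect, the interval values and ratings are made up for illustration, and "High in both layers" is only one possible criterion for the highest radon potential. Only the field names RP_Geol and RP_Soil come from the project data.

```python
def union_1d(geology, soils):
    """Overlay two interval layers; each input is a list of (start, end, rating)."""
    # Collect every boundary from both layers; the output is split at each one.
    breaks = sorted({p for start, end, _ in geology + soils for p in (start, end)})

    def rating_at(layer, lo, hi):
        # Return the rating of the interval that fully covers [lo, hi], if any.
        for start, end, rating in layer:
            if start <= lo and hi <= end:
                return rating
        return None

    pieces = []
    for lo, hi in zip(breaks, breaks[1:]):
        pieces.append({
            "start": lo,
            "end": hi,
            "RP_Geol": rating_at(geology, lo, hi),
            "RP_Soil": rating_at(soils, lo, hi),
        })
    return pieces


# Hypothetical layers along a 10-unit transect (values are illustrative only).
geology = [(0, 6, "High"), (6, 10, "Low")]
soils = [(0, 4, "Medium"), (4, 10, "High")]

unioned = union_1d(geology, soils)

# Assumed criterion: keep the pieces rated "High" in both layers.
high_both = [p for p in unioned if p["RP_Geol"] == "High" and p["RP_Soil"] == "High"]
```

Every output piece carries both attributes, which is what makes the final selection a simple query on one table.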
The first map illustrates the high radon potential zones and the 118 geocoded addresses that resulted from the analysis above. The map also includes one new customer, located at 203 Saw Mill Road, that we geocoded using the “Find” tool. In the “Find” dialog box, under the “Addresses” tab, we selected the “Roads” shapefile as our address locator and entered the address of our new customer. In the result list, just one location had a score of 100, and we added a graphic point to mark its location.
The thematic map was created using a three-class quantile classification, with the zone divided into low, medium, and high radon potential. Looking at the map, we noticed that most of the homes are situated in areas with high radon potential.
Creating the map in Figure 1 helped me learn more about legends. I wanted to add the graphic point representing the new customer to the legend and found two possible methods. The first method is to create a .dbf table with the x and y coordinates of the point, export it as a shapefile, and then add it to the legend. The second method is to select the created legend, choose “Convert to Graphics”, and then add the star point and a label. The first method is the preferred one, but I opted for the second for simplicity.
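A quantile classification puts roughly the same number of features in each class. The sketch below is a minimal illustration of that idea, not ArcMap's implementation; it assumes the radon potential is available as a numeric score, and the sample scores are hypothetical. How ties at a class break are handled may differ in a GIS package.

```python
def quantile_classes(values, labels=("low", "medium", "high")):
    """Assign each value to a class so the classes hold (nearly) equal counts."""
    n = len(values)
    k = len(labels)
    # Rank the positions by value, smallest first.
    order = sorted(range(n), key=lambda i: values[i])
    classes = [None] * n
    for rank, i in enumerate(order):
        # rank * k // n maps ranks 0..n-1 onto class indexes 0..k-1.
        classes[i] = labels[rank * k // n]
    return classes


# Hypothetical radon potential scores for six zones.
scores = [2, 9, 4, 7, 1, 8]
result = quantile_classes(scores)
```

With six zones and three classes, each class receives two zones.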
For the “Try This!” exercise we were provided with six new customers who performed the radon test at their homes. We followed the same steps as above and geocoded the addresses of the new customers. Five of the six customers had an address match, while the address at 485 Arbor Way was unmatched. I used the Find tool to look up Arbor Way and the Identify tool to find the correct zip code for this address. I then used the “Review/Rematch Addresses” tool, where I entered the address with the correct zip code and obtained a match with a score of 100.
To verify my address matching I used the Find and Identify tools.
From the Find dialog box, we see that our address is placed on the right side of the street. The Find dialog box also shows that the address ranges are LeftFrom 300 to LeftTo 498 and RightFrom 301 to RightTo 499. This means that our address, 485 Arbor Way, is located on the right side of the road centerline, at the user-specified offset distance, and closer to the “To” node. The Identify tool confirms that our geocoding of this address is correct.
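The reasoning behind this placement can be sketched as a linear interpolation along the street segment. This is a rough illustration, not the address locator's actual algorithm: it assumes a straight segment running from the “From” node to the “To” node, and the function name, segment coordinates, and offset value are hypothetical. Only the house number and the four range values come from the Find dialog box.

```python
import math


def locate_address(number, seg_start, seg_end, left_range, right_range, offset=0.0):
    """Interpolate a house number along a straight street segment."""
    # Pick the side whose range has the same parity (odd/even) and contains the number.
    for side, (lo, hi) in (("left", left_range), ("right", right_range)):
        if lo % 2 == number % 2 and min(lo, hi) <= number <= max(lo, hi):
            break
    else:
        raise ValueError("house number not in either address range")

    # Fraction of the way from the "From" node (0.0) to the "To" node (1.0).
    t = (number - lo) / (hi - lo) if hi != lo else 0.5
    (x1, y1), (x2, y2) = seg_start, seg_end
    x, y = x1 + t * (x2 - x1), y1 + t * (y2 - y1)

    # Shift the point perpendicular to the segment, toward the matched side.
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0:
        return side, t, (x, y)
    ux, uy = (x2 - x1) / length, (y2 - y1) / length
    sign = 1 if side == "right" else -1
    # To the right of the direction of travel (ux, uy) is the vector (uy, -ux).
    return side, t, (x + sign * offset * uy, y - sign * offset * ux)


# Hypothetical segment geometry and offset; the ranges are those reported by Find.
side, t, point = locate_address(
    485, (0.0, 0.0), (198.0, 0.0), left_range=(300, 498), right_range=(301, 499), offset=10.0
)
```

Because 485 is odd it falls in the right-hand range, and (485 - 301) / (499 - 301) is about 0.93, which is why the point lies close to the “To” node.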
In the “Try This!” exercise I used the second overlay method, merge. Like union, merge combines all the input features into a single output file. The difference between the two operations is that union performs a geometric intersection of the input features, while merge can combine geometric features or tables as long as the input data is of the same type.
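For tables, a merge amounts to appending the records of one input to the other. The sketch below is a simplified illustration rather than the ArcMap Merge tool: the records are placeholders, the field names are hypothetical, and requiring identical fields in every input is a simplifying assumption made here.

```python
def merge_tables(*tables):
    """Append tables row by row, requiring identical field names in each input."""
    fields = None
    merged = []
    for table in tables:
        for row in table:
            if fields is None:
                fields = set(row)
            elif set(row) != fields:
                raise ValueError("inputs must share the same fields to be merged")
            merged.append(dict(row))
    return merged


# Placeholder records standing in for the real geocoded tables.
existing = [{"id": i, "status": "M"} for i in range(118)]
new = [{"id": 118 + i, "status": "M"} for i in range(6)]

all_customers = merge_tables(existing, new)
print(len(all_customers))  # 124
```

No geometry is intersected or split here, which is the key contrast with union.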
Below is the merged table of the 118 geocoded customers with the added 6 new geocoded customers.
The map below shows the 6 newly geocoded customers together with the 118 previously geocoded customers.
Summary
This exercise provided good insight into how geocoding works. The ability to add geographic coordinates to data in an attribute table is invaluable. Geocoding is an important GIS tool with many applications. Besides being used to locate customers and inform marketing decisions, address geocoding is used to create routes for emergency response, to plan delivery routes, and by web services to offer directions.
Source
King, B., & Zeiders, M. (2009). Problem-Solving with GIS, Lesson 4, Part I, IV, Concept Gallery. The Pennsylvania State University, World Campus. Retrieved February 10, 2010.
Environmental Protection Agency (2010). Why is radon the public health risk that it is? Retrieved February 10, 2010.