Releases: CityOfPhiladelphia/address-batch-geocoder
v2.0.0
address-geocoder
A tool to standardize and geocode Philadelphia addresses
Address Geocoder takes an input file containing addresses
and adds latitude and longitude to those addresses, as well as any optional
fields that the user supplies.
How The Geocoder Works
Address-Geocoder processes a CSV file of addresses and geolocates those addresses using the following steps:
- It takes an input file of addresses and standardizes them using passyunk, Philadelphia's address standardization system.
- It compares the standardized data to a local parquet file, addresses.parquet, and adds the user-specified fields, as well as latitude and longitude, from that file.
- Not all records will match to the address file. For records that do not match, Address-Batch-Geocoder queries the Address Information System (AIS) API and adds the returned fields. Please note that this process can take some time, so processing large files with a messy address field is not recommended. For example, if 1,000 rows in your file need to be sent to AIS, this step will take approximately 3-4 minutes.
- Records that don't match in AIS are then queried against TomTom, which has different address parsing capabilities and is also able to return...
- Records that successfully match in TomTom are then rerun against AIS to try to recover enrichment fields, if those addresses are in Philadelphia.
- The enriched file is then saved to the same directory as the input file.
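The matching cascade above can be sketched as follows. This is a minimal, hypothetical illustration, not the geocoder's actual code: `lookup_local`, `query_ais` and `query_tomtom` are assumed stand-ins for the address-file join, the AIS API call and the TomTom call. Each is assumed to take an already-standardized address and return a dict of fields, or `None` when there is no match.

```python
from typing import Callable, Optional

# Hypothetical matcher type: standardized address in, dict of fields
# (or None for "no match") out.
Matcher = Callable[[str], Optional[dict]]


def geocode_one(
    address: str,
    lookup_local: Matcher,
    query_ais: Matcher,
    query_tomtom: Matcher,
) -> dict:
    """Try each source in order and record which one produced the match."""
    # 1. Cheapest first: the local address file.
    result = lookup_local(address)
    if result is not None:
        return {**result, "match_source": "address_file"}

    # 2. Fall back to the AIS API (slow: roughly 3-4 minutes per 1,000 calls).
    result = query_ais(address)
    if result is not None:
        return {**result, "match_source": "ais"}

    # 3. Fall back to TomTom, which parses addresses differently.
    result = query_tomtom(address)
    if result is None:
        return {"match_source": None}

    # 4. Rerun TomTom's normalized address against AIS to try to recover
    #    enrichment fields; keep TomTom's result if AIS still finds nothing.
    normalized = result.get("address", address)
    enriched = query_ais(normalized)
    if enriched is not None:
        return {**result, **enriched, "match_source": "tomtom+ais"}
    return {**result, "match_source": "tomtom"}
```

Ordering the sources from cheapest to most expensive is what keeps large, clean files fast: only the leftovers from each tier reach the slower network calls.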
The release executable of the address geocoder automatically checks an S3 bucket for an updated version of the address file. The address file is published to S3 via Airflow, using this DAG configuration: https://github.com/CityOfPhiladelphia/databridge-airflow-v2-configs/blob/main/citygeo/address_service_area_summary_public.yml.
Questions?
If you have questions about the geocoder that this FAQ cannot answer, feel free to contact CityGeo at maps@phila.gov.
1. Prerequisites
You will need the following things:
- An executable file called geocoder.exe. This is used to run the program. Do not save the executable in a folder that has spaces in the name.
- An AIS API key, provided to you by CityGeo.
Installation
First, you will need to download and install the geocoder.
The geocoder file can be downloaded from GitHub. The latest release can be found at: https://github.com/CityOfPhiladelphia/address-geocoder/releases/
Read through the notes carefully, and then download the zip file at the bottom of the readme. You may get a dangerous file blocked warning from Chrome. Override this block and download anyway.
Extract the zip folder into a folder where you can easily find it. When opening the zipped file, you may be prompted to either extract or run. Hit extract, not run, as the script will need to exist in an uncompressed directory in order to create the subfolders needed to work.
The folder must not have spaces in its name. The zip folder contains two files: geocoder.exe and release.txt. If you delete or rename these files, you will need to download them again or rename them back. Deleting release.txt will stop the program from being able to inform you if there is a new version of the .exe file that you need to download.
Double-clicking geocoder.exe will launch the program. You may see a popup that says "Windows protected your PC." This file is safe, so bypass this protection by clicking More info, and then selecting Run anyway.
On the first run, the script will download Python and Git if they are not present, then download the geocoder from GitHub and install the proper dependencies. The geocoder will be downloaded to a folder called address-geocoder-main. If there are problems with your install, try deleting this folder and running geocoder.exe again.
Note that the script will install Python 3.11 if it is not already installed on your machine.
The script will then attempt to download the address file. This may take a few minutes. It will save the address file and a version file in a subfolder called geocoder_address_data. Under most circumstances, you should not remove this folder or any of the files in it. Doing so will cause the script to redownload the address file.
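The version file is what lets the script decide whether the local address file is current. Below is a minimal sketch of that decision, assuming the version file holds a single version string and that the latest version published to S3 has already been fetched into `remote_version`. The name `version.txt` is hypothetical; the parquet name matches the default `address_file` path shown in the configuration section.

```python
from pathlib import Path


def needs_download(
    data_dir: Path,
    remote_version: str,
    address_name: str = "address_service_area_summary.parquet",
    version_name: str = "version.txt",  # hypothetical file name
) -> bool:
    """Return True when the local address file is missing or out of date."""
    address_file = data_dir / address_name
    version_file = data_dir / version_name
    # Removing either file forces a fresh download, which is why the
    # README advises leaving geocoder_address_data alone.
    if not address_file.exists() or not version_file.exists():
        return True
    local_version = version_file.read_text().strip()
    return local_version != remote_version.strip()
```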
After the installation runs successfully, you are ready to set up the configuration file.
2. How to Use Address Geocoder
Three Ways to Use the Address Geocoder
There are three ways to use the address geocoder:
- A locally-hosted web app with a graphical user interface.
- Configuring a .yml file.
- Advanced: Linux or macOS users can run a Python command.
2.1 The Graphical User Interface (GUI)
After checking for updates, geocoder.exe will prompt the user with two run options:
Choose an option:
[1] Run with the user-interface
[2] Run with the .yml config
[Any other key]: exit:
Press 1 to use the user interface. A window will open up in your default browser.
The user interface has the following fields.
- The AIS API key. Required. Enter the AIS API key provided to you by CityGeo.
- The CSV upload option. This is where you upload the file that you wish to enrich.
- The SRIDs field: Choose which SRIDs you wish to geocode in. Required.
- The enrichment fields: Choose which optional fields to add to your data. Optional.
- Config file upload. If you don't want to select the same options every time (which can be tedious), you can optionally upload a pre-saved configuration file. You can also save the configuration you've chosen for future use.
Once the required fields are entered, a geocode button will appear. Click it to geocode the file. Do not close the browser while the geocoder is working, or you will be unable to download the results.
To close the geocoder, you will need to close both the browser window and the terminal window running geocoder.exe.
2.2 YAML File Config
After checking for updates, geocoder.exe will prompt the user with two run options:
Choose an option:
[1] Run with the user-interface
[2] Run with the .yml config
[Any other key]: exit:
Press 2 to use the .yml config method. Before using this method, ensure that you have set up your configuration file. By default, Address Geocoder searches for a file named config.yml. Detailed steps for filling out the config file are in the next section.
Configuration
- The script should create a config.yml file if none exists. If it did not, you can copy config_example.yml to config.yml, either in the file explorer or by running the following in the terminal:
cp config_example.yml config.yml
Do not delete, rename, or move config_example.yml. If you delete this file, you will need to redownload it from GitHub.
In most cases, it is not recommended to delete, rename, or move config.yml. If you rename this file, the geocoder will be unable to find it and will create a new config.yml.
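The `cp` command works in PowerShell, macOS and Linux; in the Windows Command Prompt the equivalent is `copy config_example.yml config.yml`. As a cross-platform alternative, here is a minimal sketch of the "create config.yml only if it is missing" behavior described above. It is an illustration, not necessarily how geocoder.exe does it, and the `folder` argument is assumed to be the address-geocoder-main folder.

```python
import shutil
from pathlib import Path


def ensure_config(folder: Path) -> Path:
    """Create config.yml from config_example.yml if it does not exist yet."""
    config = folder / "config.yml"
    example = folder / "config_example.yml"
    if not config.exists():
        # copyfile leaves config_example.yml in place, which the README
        # says must not be deleted, renamed, or moved.
        shutil.copyfile(example, config)
    # An existing config.yml is never overwritten.
    return config
```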
- Open up config.yml, and add your AIS API Key here:
AIS_API_KEY:
- Add the filepaths for the input file (the file that you wish to enrich) and the address file. The address file should have been downloaded automatically by geocoder.exe, and its path should already be in the config file by default. The entries should look something like this. Relative filepaths are resolved relative to the address-geocoder-main folder downloaded from GitHub; for ease of use, absolute filepaths are recommended. Do not put the filenames in quotes:
input_file: ./data/example_input_4.csv
address_file: ./geocoder_address_data/address_service_area_summary.parquet
- Map the address fields to the names of the columns in the CSV that you wish to process. If you have one combined address field, map it to full_address_field. Otherwise, leave full_address_field blank and map column names to street_address, city, state, and zip. street_address must be included; the others are optional.
For example, for a CSV with the following columns:
addr_st, addr_city, addr_zip
input_file: example.csv
full_address_field:
address_fields:
street_address: addr_st
city: addr_city
state:
zip: addr_zip
If you have both full_address_field and the address fields filled in, the script will ask you which to use.
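The mapping rules above can be summarized as a small decision function. This is a hedged sketch, not the geocoder's code: it assumes `config` is config.yml already parsed into a dict, where blank YAML values such as `state:` load as `None`. Returning `"ask"` stands in for the prompt the script shows when both options are filled in.

```python
def address_mode(config: dict) -> str:
    """Decide how addresses are read: "full", "split", or "ask".

    Raises ValueError if neither option is usable.
    """
    full = config.get("full_address_field")
    fields = config.get("address_fields") or {}
    has_split = any(fields.values())
    if full and has_split:
        # Both are filled in, so the script asks the user which to use.
        return "ask"
    if full:
        return "full"
    if has_split:
        # The street column is required; city, state, and zip are optional.
        if not fields.get("street_address"):
            raise ValueError("address_fields.street_address is required")
        return "split"
    raise ValueError("set full_address_field or address_fields.street_address")
```

For the addr_st / addr_city / addr_zip example above, `full_address_field` is blank and `street_address` is set, so the function returns `"split"`.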
- List the fields, other than latitude and longitude, that you want to add. (Latitude and longitude are always added.) If you enter an invalid field, the program will error out and ask you to try again. A complete list of valid fields can be found further down in this README.
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
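The "error out and ask you to try again" check amounts to comparing the requested names against the list of valid fields. A minimal sketch, assuming a hard-coded excerpt of that list (the complete list is the Enrichment Fields table in the README):

```python
# Excerpt only; the full set of valid names is the Enrichment Fields table.
VALID_FIELDS = {
    "census_tract_2020",
    "census_block_group_2020",
    "census_block_2020",
    "council_district_2024",
    "zip_code",
}


def invalid_enrichment_fields(requested: list[str]) -> list[str]:
    """Return the requested fields that are not valid; an empty list means OK."""
    return [name for name in requested if name not in VALID_FIELDS]
```

Reporting every bad name at once, rather than stopping at the first, lets you fix all typos in a single pass.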
- List which SRIDs should be returned. An SRID (spatial reference identifier) identifies the coordinate system used for the output coordinates. There are two options: 4326 and 2272. 4326 is the WGS 84 standard and is output as geocode_lat and geocode_lon; 2272 is the Pennsylvania State Plane South projection (NAD83, in US survey feet) and is output as geocode_x and geocode_y.
# Which SRIDs to return for geocoding
srid_4326: true
srid_2272: true
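A minimal sketch of how the two SRID flags translate into output columns, using the column names given above. The function name is hypothetical, and the sketch only lists columns; it does not perform any coordinate transformation.

```python
def coordinate_columns(srid_4326: bool, srid_2272: bool) -> list[str]:
    """Return the coordinate columns added for each enabled SRID."""
    columns: list[str] = []
    if srid_4326:
        columns += ["geocode_lat", "geocode_lon"]  # WGS 84, decimal degrees
    if srid_2272:
        columns += ["geocode_x", "geocode_y"]  # PA State Plane South, US feet
    return columns
```

With both flags set to true, as in the example config, all four columns are added.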
The full config file should look something like this:
# Connection Credentials
AIS_API_KEY: YOUR_API_KEY
# File Config
input_file: ./data/example_input_4.csv
address_file: ./data/addresses.parquet
full_address_field: address
# OR, IF ADDRESS IS SPLIT INTO MULTIPLE COLUMNS:
address_fields:
street_address:
city:
state:
zip:
# Enrichment Fields -- Aside from coordinates, what fields to add
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
# Which SRIDs to return for geocoding
srid_4326: true
srid_2272: true
- You're now ready to run the geocoder.
Double-click geocoder.exe, the same file that you used to install the geocoder.
(If you get an error about a missing package, this means something didn't install properly. Try remov...
v1.1.8
address-geocoder
A tool to standardize and geocode Philadelphia addresses
Address Geocoder takes an input file containing addresses
and adds latitude and longitude to those addresses, as well as any optional
fields that the user supplies.
Note:
For more information about the geocoder, consult the GitHub repository: https://github.com/CityOfPhiladelphia/address-geocoder. The README in this repo contains more details about the matching process, and information about how to run the geocoder from the command line, if desired.
Questions?
If you have questions about the geocoder that this FAQ cannot answer, feel free to contact CityGeo at maps@phila.gov.
1. Prerequisites
You will need the following things:
- An executable file called geocoder.exe. This is used to run the program. Do not save the executable in a folder that has spaces in the name.
- An AIS API key, provided to you by CityGeo.
Installation
First, you will need to download and install the geocoder.
The geocoder file can be downloaded from GitHub. The latest release can be found at: https://github.com/CityOfPhiladelphia/address-geocoder/releases/
Read through the notes carefully, and then download the zip file at the bottom of the readme.
Extract the zip folder into a folder where you can easily find it. When opening the zipped file, you may be prompted to either extract or run. Hit extract, not run, as the script will need to exist in an uncompressed directory in order to create the subfolders needed to work.
The folder must not have spaces in its name. The zip folder contains two files: geocoder.exe and release.txt. If you delete or rename these files, you will need to download them again or rename them back. Deleting release.txt will stop the program from being able to inform you if there is a new version of the .exe file that you need to download.
Double-clicking geocoder.exe will launch the program. On the first run, the script will download Python and Git if they are not present, then download the geocoder from GitHub and install the proper dependencies. The geocoder will be downloaded to a folder called address-geocoder-main. If there are problems with your install, try deleting this folder and running geocoder.exe again.
Note that the script will install Python 3.10 if it is not already installed on your machine.
The script will then attempt to download the address file. This may take a few minutes. It will save the address file and a version file in a subfolder called geocoder_address_data. Under most circumstances, you should not remove this folder or any of the files in it. Doing so will cause the script to redownload the address file.
After the installation runs successfully, you are ready to set up the configuration file.
2. How to Use Address Geocoder
To run Address Geocoder, first set up the configuration file. By default, Address Geocoder searches for a file named config.yml. Detailed steps for filling out the config file are in the next section.
Configuration
- The script should create a config.yml file if none exists. If it did not, you can copy config_example.yml to config.yml, either in the file explorer or by running the following in the terminal:
cp config_example.yml config.yml
Do not delete, rename, or move config_example.yml. If you delete this file, you will need to redownload it from GitHub.
In most cases, it is not recommended to delete, rename, or move config.yml. If you rename this file, the geocoder will be unable to find it and will create a new config.yml.
- Open up config.yml, and add your AIS API Key here:
AIS_API_KEY:
- Add the filepaths for the input file (the file that you wish to enrich) and the geography file (the address file you have been given). The entries should look something like this. Relative filepaths are resolved relative to the address-geocoder-main folder downloaded from GitHub; for ease of use, absolute filepaths are recommended. Do not put the filenames in quotes:
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
- Map the address fields to the name of the fields in the csv that you wish to process.
If you have one combined address field, map it to full_address_field.
Otherwise, leave full_address_field blank and map column names to street, city, state, and zip. Street must be included, while the others are optional.
For example, for a CSV with the following columns:
addr_st, addr_city, addr_zip
input_file: example.csv
full_address_field:
address_fields:
street: addr_st
city: addr_city
state:
zip: addr_zip
If you have both full_address_field and the address fields filled in, the script will ask you which to use.
- List the fields, other than latitude and longitude, that you want to add. (Latitude and longitude are always added.) If you enter an invalid field, the program will error out and ask you to try again. A complete list of valid fields can be found further down in this README.
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
- List which SRIDs should be returned. An SRID (spatial reference identifier) identifies the coordinate system used for the output coordinates. There are two options: 4326 and 2272. 4326 is the WGS 84 standard and is output as geocode_lat and geocode_lon; 2272 is the Pennsylvania State Plane South projection (NAD83, in US survey feet) and is output as geocode_x and geocode_y.
# Which SRIDs to return for geocoding
srid_4326: true
srid_2272: true
The full config file should look something like this:
# Connection Credentials
AIS_API_KEY: YOUR_API_KEY
# File Config
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
full_address_field: address
# OR, IF ADDRESS IS SPLIT INTO MULTIPLE COLUMNS:
address_fields:
street:
city:
state:
zip:
# Enrichment Fields -- Aside from coordinates, what fields to add
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
# Which SRIDs to return for geocoding
srid_4326: true
srid_2272: true
- You're now ready to run the geocoder.
Double-click geocoder.exe, the same file that you used to install the geocoder.
(If you get an error about a missing package, this means something didn't install properly. Try removing the address-geocoder-main folder and try again.)
The dialogue will ask you to specify a config file. Press Enter without typing anything to keep the default config file ('./config.yml').
The output file will be saved in the same location as your input file, with _enriched attached to the filename.
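A minimal sketch of the output-naming rule, assuming that "_enriched" is inserted before the file extension (for example, example_input_4.csv becomes example_input_4_enriched.csv). The function name is hypothetical.

```python
from pathlib import Path


def enriched_path(input_file: str) -> Path:
    """Place the output next to the input, with _enriched added to the name."""
    src = Path(input_file)
    # with_name keeps the same directory; stem/suffix split off the extension.
    return src.with_name(f"{src.stem}_enriched{src.suffix}")
```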
Note that you may see various warnings about a USPS and election file not being found, and about SSL certification. This is to be expected.
One of the steps of the enrichment process is to check against Philadelphia's Address Information System (AIS). Please note that this process can take some time: it takes around 3-4 minutes to make 1,000 calls to AIS. Not all records are checked against AIS, only those that have no match in the addresses.parquet file.
So it is important to provide an input file with as clean an address field as possible, to minimize the number of times the script queries AIS.
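To budget time before a run, you can scale the figure above by the number of rows that miss the address file. A minimal sketch, using only the README's rate of roughly 3-4 minutes per 1,000 AIS calls (actual speed will vary with network conditions):

```python
def estimate_ais_minutes(rows_sent_to_ais: int) -> tuple[float, float]:
    """Rough (low, high) AIS runtime in minutes at 3-4 minutes per 1,000 calls."""
    thousands = rows_sent_to_ais / 1000
    return thousands * 3.0, thousands * 4.0
```

For instance, 25,000 unmatched rows works out to roughly 75 to 100 minutes, which is why cleaning the address field first pays off.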
How The Geocoder Works
Address-Geocoder processes a CSV file of addresses and geolocates those addresses using the following steps:
- It takes an input file of addresses and standardizes them using passyunk, Philadelphia's address standardization system.
- It compares the standardized data to a local parquet file, addresses.parquet, and adds the user-specified fields, as well as latitude and longitude, from that file.
- Not all records will match to the address file. For records that do not match, Address-Geocoder queries the Address Information System (AIS) API and adds the returned fields. Please note that this process can take some time, so processing large files with a messy address field is not recommended. For example, if 1,000 rows in your file need to be sent to AIS, this step will take approximately 3-4 minutes.
- The enriched file is then saved to the same directory as the input file.
Enrichment Fields
Field |
|---|
address_high |
address_low_frac |
address_low_suffix |
address_low |
bin |
census_block_2010 |
census_block_2020 |
census_block_group_2010 |
census_block_group_2020 |
census_tract_2010 |
census_tract_2020 |
center_city_district |
clean_philly_block_captain |
commercial_corridor |
council_district_2016 |
council_district_2024 |
cua_zone |
dor_parcel_id |
eclipse_location_id |
elementary_school |
engine_local |
high_school |
highway_district |
highway_section |
highway_subsection |
historic_district |
historic_site |
historic_street |
ladder_local |
lane_closure |
leaf_collection_area |
li_address_key |
li_district |
major_phila_watershed |
middle_school |
neighborhood_advisory_committee |
philly_rising_area |
planning_district |
police_district |
police_division |
police_service_area |
political_division |
political_ward |
ppr_friends |
pwd_center_city_district |
pwd_maint_district |
pwd_parcel_id |
pwd_pressure_district |
pwd_treatment_plant |
pwd_water_plate |
recycling_diversion_rate |
rubbish_recycle_day |
sanitation_area |
sanitation_convenience_center |
sanitation_district |
seg_id |
state_house_rep_2012 |
state_house_rep_2022 |
state_senate_2012 |
state_senate_2022 |
street_code |
street_light_route |
street_name |
street_postdir |
street_predir |
street_suffix |
traffic_district |
traffic_pm_district |
unit_num |
unit_type |
us_congressional_2012 |
| `us_congression... |
v1.1.7
address-batch-geocoder
A tool to standardize and geocode Philadelphia addresses
Address Geocoder takes an input file containing addresses
and adds latitude and longitude to those addresses, as well as any optional
fields that the user supplies.
Note:
For more information about the geocoder, consult the GitHub repository: https://github.com/CityOfPhiladelphia/address-batch-geocoder. The README in this repo contains more details about the matching process, and information about how to run the geocoder from the command line, if desired.
Questions?
If you have questions about the geocoder that this FAQ cannot answer, feel free to contact CityGeo at maps@phila.gov.
1. Prerequisites
You will need the following things:
- An executable file called geocoder.exe. This is used to run the program. Do not save the executable in a folder that has spaces in the name.
- An AIS API key, provided to you by CityGeo. If you don't have one, please submit a ticket to ithelp@phila.gov.
Installation
First, you will need to download and install the geocoder.
The geocoder file can be downloaded from GitHub. The latest release can be found at: https://github.com/CityOfPhiladelphia/address-batch-geocoder/releases/
Read through the notes carefully, and then download the zip file at the bottom of the readme.
Extract the zip folder into a folder where you can easily find it. The folder must not have spaces in its name. The zip folder contains two files: geocoder.exe and release.txt. If you delete or rename these files, you will need to download them again or rename them back. Deleting release.txt will stop the program from being able to inform you if there is a new version of the .exe file that you need to download.
Double-clicking geocoder.exe will launch the program. On the first run, the script will download Python and Git if they are not present, then download the geocoder from GitHub and install the proper dependencies. The geocoder will be downloaded to a folder called address-geocoder-main. If there are problems with your install, try deleting this folder and running geocoder.exe again.
Note that the script will install Python 3.10 if it is not already installed on your machine.
The script will then attempt to download the address file. This may take a few minutes. It will save the address file and a version file in a subfolder called geocoder_address_data. Under most circumstances, you should not remove this folder or any of the files in it. Doing so will cause the script to redownload the address file.
After the installation runs successfully, you are ready to set up the configuration file.
2. How to Use Address Geocoder
To run Address Geocoder, first set up the configuration file. By default, Address Geocoder searches for a file named config.yml. Detailed steps for filling out the config file are in the next section.
Configuration
- The script should create a config.yml file if none exists. If it did not, you can copy config_example.yml to config.yml, either in the file explorer or by running the following in the terminal:
cp config_example.yml config.yml
Do not delete, rename, or move config_example.yml. If you delete this file, you will need to redownload it from GitHub.
In most cases, it is not recommended to delete, rename, or move config.yml. If you rename this file, the geocoder will be unable to find it and will create a new config.yml.
- Open up config.yml, and add your AIS API Key here:
AIS_API_KEY:
- Add the filepaths for the input file (the file that you wish to enrich) and the geography file (the address file you have been given). The entries should look something like this. Relative filepaths are resolved relative to the address-geocoder-main folder downloaded from GitHub; for ease of use, absolute filepaths are recommended. Do not put the filenames in quotes:
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
- Map the address fields to the name of the fields in the csv that you wish to process.
If you have one combined address field, map it to full_address_field.
Otherwise, leave full_address_field blank and map column names to street, city, state, and zip. Street must be included, while the others are optional.
For example, for a CSV with the following columns:
addr_st, addr_city, addr_zip
input_file: example.csv
full_address_field:
address_fields:
street: addr_st
city: addr_city
state:
zip: addr_zip
If you have both full_address_field and the address fields filled in, the script will ask you which to use.
- List the fields, other than latitude and longitude, that you want to add. (Latitude and longitude are always added.) If you enter an invalid field, the program will error out and ask you to try again.
Enrichment fields are populated only for Philadelphia addresses that match the address file or are geocoded by the AIS API.
A complete list of valid fields can be found further down in this README.
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
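The rule that enrichment fields are filled only for address-file or AIS matches can be sketched like this. It is a hypothetical illustration: `match_source` and `record` are names invented here for the source of the match and the matched fields. The sketch shows that every requested column still exists for unmatched rows but is left blank.

```python
from typing import Optional


def enrichment_values(match_source: Optional[str], record: dict, fields: list[str]) -> dict:
    """Return the requested enrichment values for one output row."""
    if match_source in ("address_file", "ais"):
        return {name: record.get(name) for name in fields}
    # No address-file or AIS match: keep the columns, leave them blank.
    return {name: None for name in fields}
```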
The full config file should look something like this:
# Connection Credentials
AIS_API_KEY: YOUR_API_KEY
# File Config
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
full_address_field: address
# OR, IF ADDRESS IS SPLIT INTO MULTIPLE COLUMNS:
address_fields:
street:
city:
state:
zip:
# Enrichment Fields -- Aside from coordinates, what fields to add
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
- You're now ready to run the geocoder.
Double-click geocoder.exe, the same file that you used to install the geocoder.
(If you get an error about a missing package, this means something didn't install properly. Try removing the address-geocoder-main folder and try again.)
The dialogue will ask you to specify a config file. Press Enter without typing anything to keep the default config file ('./config.yml').
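The "press Enter for the default" prompt follows a common pattern: an empty or whitespace-only answer falls back to the default path. A minimal sketch, with a hypothetical function name:

```python
def resolve_config_path(user_input: str, default: str = "./config.yml") -> str:
    """Return the config path typed by the user, or the default on a blank entry."""
    answer = user_input.strip()
    return answer if answer else default
```

In an interactive script it would be called as `resolve_config_path(input("Config file: "))`.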
The output file will be saved in the same location as your input file, with _enriched attached to the filename.
Note that you may see various warnings about a USPS and election file not being found, and about SSL certification. This is to be expected.
One of the steps of the enrichment process is to check against Philadelphia's Address Information System (AIS). Please note that this process can take some time: it takes around 3-4 minutes to make 1,000 calls to AIS. Not all records are checked against AIS, only those that have no match in the addresses.parquet file.
So it is important to provide an input file with as clean an address field as possible, to minimize the number of times the script queries AIS.
How The Geocoder Works
Address-Batch-Geocoder processes a CSV file of addresses and geolocates those addresses using the following steps:
- It takes an input file of addresses and standardizes them using passyunk, Philadelphia's address standardization system.
- It compares the standardized data to a local parquet file, addresses.parquet, and adds the user-specified fields, as well as latitude and longitude, from that file.
- Not all records will match to the address file. For records that do not match, Address-Batch-Geocoder queries the Address Information System (AIS) API and adds the returned fields. Please note that this process can take some time, so processing large files with a messy address field is not recommended. For example, if 1,000 rows in your file need to be sent to AIS, this step will take approximately 3-4 minutes.
- The enriched file is then saved to the same directory as the input file.
Enrichment Fields
Field |
|---|
address_high |
address_low_frac |
address_low_suffix |
address_low |
bin |
census_block_2010 |
census_block_2020 |
census_block_group_2010 |
census_block_group_2020 |
census_tract_2010 |
census_tract_2020 |
center_city_district |
clean_philly_block_captain |
commercial_corridor |
council_district_2016 |
council_district_2024 |
cua_zone |
dor_parcel_id |
eclipse_location_id |
elementary_school |
engine_local |
high_school |
highway_district |
highway_section |
highway_subsection |
historic_district |
historic_site |
historic_street |
ladder_local |
lane_closure |
leaf_collection_area |
li_address_key |
li_district |
major_phila_watershed |
middle_school |
neighborhood_advisory_committee |
philly_rising_area |
planning_district |
police_district |
police_division |
police_service_area |
political_division |
political_ward |
ppr_friends |
pwd_center_city_district |
pwd_maint_district |
pwd_parcel_id |
pwd_pressure_district |
pwd_treatment_plant |
pwd_water_plate |
recycling_diversion_rate |
rubbish_recycle_day |
sanitation_area |
sanitation_convenience_center |
sanitation_district |
seg_id |
state_house_rep_2012 |
state_house_rep_2022 |
state_senate_2012 |
state_senate_2022 |
street_code |
street_light_route |
street_name |
street_postdir |
street_predir |
street_suffix |
traffic_district |
traffic_pm_district |
unit_num |
unit_type |
us_congressional_2012 |
us_congressional_2018 |
us_congressional_2022 |
zip_4 |
zip_code |
zoning_document_ids |
zoning_rco |
zoning |
What's Changed
- Db append by @CaitlinCP in #1
- Write unit tests for ais lookup by @CaitlinCP in #2
- Local append by @CaitlinCP in htt...
v1.1.6
address-geocoder
A tool to standardize and geocode Philadelphia addresses
Address Geocoder takes an input file containing addresses
and adds latitude and longitude to those addresses, as well as any optional
fields that the user supplies.
Note:
For more information about the geocoder, consult the GitHub repository: https://github.com/CityOfPhiladelphia/address-geocoder. The README in this repo contains more details about the matching process, and information about how to run the geocoder from the command line, if desired.
Questions?
If you have questions about the geocoder that this FAQ cannot answer, feel free to contact CityGeo at maps@phila.gov.
1. Prerequisites
You will need the following things:
- An executable file called geocoder.exe. This is used to run the program. Do not save the executable in a folder that has spaces in the name.
- An AIS API key, provided to you by CityGeo.
Installation
First, you will need to download and install the geocoder.
The geocoder file can be downloaded from GitHub. The latest release can be found at: https://github.com/CityOfPhiladelphia/address-geocoder/releases/
Read through the notes carefully, and then download the zip file at the bottom of the readme.
Extract the zip folder into a folder where you can easily find it. The folder must not have spaces in its name. The zip folder contains two files: geocoder.exe and release.txt. If you delete or rename these files, you will need to download them again or rename them back. Deleting release.txt will stop the program from being able to inform you if there is a new version of the .exe file that you need to download.
Double-clicking geocoder.exe will launch the program. On the first run, the script will download Python and Git if they are not present, then download the geocoder from GitHub and install the proper dependencies. The geocoder will be downloaded to a folder called address-geocoder-main. If there are problems with your install, try deleting this folder and running geocoder.exe again.
Note that the script will install Python 3.10 if it is not already installed on your machine.
The script will then attempt to download the address file. This may take a few minutes. It will save the address file and a version file in a subfolder called geocoder_address_data. Under most circumstances, you should not remove this folder or any of the files in it. Doing so will cause the script to redownload the address file.
After the installation runs successfully, you are ready to set up the configuration file.
2. How to Use Address Geocoder
To run Address Geocoder, first set up the configuration file. By default, Address Geocoder searches for a file named config.yml. Detailed steps for filling out the config file are in the next section.
Configuration
- The script should create a config.yml file if none exists. If it did not, you can copy config_example.yml to config.yml, either in the file explorer or by running the following in the terminal:
cp config_example.yml config.yml
Do not delete, rename, or move config_example.yml. If you delete this file, you will need to redownload it from GitHub.
In most cases, it is not recommended to delete, rename, or move config.yml. If you rename this file, the geocoder will be unable to find it and will create a new config.yml.
- Open up config.yml, and add your AIS API Key here:
AIS_API_KEY:
- Add the filepaths for the input file (the file that you wish to enrich) and the geography file (the address file you have been given). The entries should look something like this. Relative filepaths are resolved relative to the address-geocoder-main folder downloaded from GitHub; for ease of use, absolute filepaths are recommended. Do not put the filenames in quotes:
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
- Map the address fields to the names of the columns in the CSV that you wish to process.
If you have one combined address field, map it to full_address_field.
Otherwise, leave full_address_field blank and map column names to street, city, state, and zip. Street must be included, while the others are optional.
For example, for a CSV with the following columns:
addr_st, addr_city, addr_zip
the config would look like this:
input_file: example.csv
full_address_field:
address_fields:
street: addr_st
city: addr_city
state:
zip: addr_zip
If you have both full_address_field and the address fields filled in, the script will ask you which to use.
- List which fields other than latitude and longitude you want to add.
(Latitude and longitude will always be added.) If you enter an invalid field, the program will error out and ask you to try again.
A complete list of valid fields can be found further down in this README.
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
The full config file should look something like this:
# Connection Credentials
AIS_API_KEY: YOUR_API_KEY
# File Config
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
full_address_field: address
# OR, IF ADDRESS IS SPLIT INTO MULTIPLE COLUMNS:
address_fields:
street:
city:
state:
zip:
# Enrichment Fields -- Aside from coordinates, what fields to add
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
- You're now ready to run the geocoder.
Double-click geocoder.exe, the same file that you used to install the geocoder.
(If you get an error about a missing package, something didn't install properly. Try deleting the address-geocoder-main folder and running geocoder.exe again.)
The program will prompt you to specify a config file. Press Enter without typing anything to keep the default config file (./config.yml).
The output file will be saved in the same location as your input file, with _enriched attached to the filename.
Note that you may see various warnings about USPS and election files not being found, and about SSL certificates. These warnings are expected.
One of the steps of the enrichment process is to check against Philadelphia's Address Information System (AIS). Please note that this process can take some time: around 3-4 minutes per 1,000 calls to AIS. Not all records are checked against AIS, only those that have no match in the addresses.parquet file.
So it is important to provide an input file with as clean an address field as possible, to minimize the number of times the script queries AIS.
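As a rough planning aid, the quoted rate of 3-4 minutes per 1,000 AIS calls scales linearly with the number of rows that miss the local address file. The sketch below assumes that rate holds and that you can guess what fraction of rows will match addresses.parquet; `estimate_ais_minutes` and `match_rate` are hypothetical names, not part of the geocoder.

```python
def estimate_ais_minutes(total_rows: int, match_rate: float) -> tuple[float, float]:
    """Rough (low, high) runtime in minutes for the AIS fallback step.

    Hypothetical helper. Assumes the README's rate of about 3-4 minutes
    per 1,000 AIS calls, and that only rows missing from addresses.parquet
    are sent to AIS.
    """
    if not 0.0 <= match_rate <= 1.0:
        raise ValueError("match_rate must be between 0 and 1")
    rows_sent_to_ais = round(total_rows * (1 - match_rate))
    # 3-4 minutes per 1,000 calls, i.e. 0.003-0.004 minutes per call.
    return rows_sent_to_ais * 3 / 1000, rows_sent_to_ais * 4 / 1000


if __name__ == "__main__":
    low, high = estimate_ais_minutes(total_rows=50_000, match_rate=0.9)
    print(f"About {low:.0f}-{high:.0f} minutes of AIS calls")
```

With 50,000 rows and 90% matching locally, about 5,000 rows go to AIS, which works out to roughly 15-20 minutes. Cleaning the address column raises the local match rate, which is the main lever for shortening this step.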
How The Geocoder Works
Address-Geocoder processes a CSV file of addresses and geolocates those addresses using the following steps:
- Takes an input file of addresses and standardizes them using passyunk, Philadelphia's address standardization system.
- Compares the standardized data to a local parquet file, addresses.parquet, and adds the user-specified fields, as well as latitude and longitude, from that file.
- Not all records will match the address file. For records that do not match, Address-Geocoder queries the Address Information System (AIS) API and adds the returned fields. Please note that this process can take some time, so processing large files with a messy address field is not recommended. For example, if 1,000 rows in your file need to be sent to AIS, this will take approximately 3-4 minutes.
- The enriched file is then saved to the same directory as the input file.
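The matching cascade described in these steps (standardize, try the local file, fall back to AIS only on a miss) can be sketched as follows. This is an illustration, not the geocoder's code: `standardize` is a crude stand-in for passyunk, `local_table` is a plain dict standing in for addresses.parquet, and `query_ais` is a hypothetical callable rather than a real AIS client.

```python
from typing import Callable, Optional


def standardize(address: str) -> str:
    """Crude stand-in for passyunk: uppercase and collapse whitespace only."""
    return " ".join(address.upper().split())


def geocode_rows(
    addresses: list[str],
    local_table: dict[str, dict],
    query_ais: Callable[[str], Optional[dict]],
) -> list[dict]:
    """Match each address locally first; call AIS only on a local miss."""
    results = []
    for raw in addresses:
        std = standardize(raw)
        fields = local_table.get(std)  # step 2: local address-file lookup
        source = "local"
        if fields is None:
            fields = query_ais(std)  # step 3: slow network call, misses only
            source = "ais" if fields is not None else None
        row = {"input": raw, "standardized": std, "source": source}
        row.update(fields or {})
        results.append(row)
    return results
```

Because the local lookup runs first, every row that matches addresses.parquet skips the network entirely, which is why the AIS step's cost depends on how clean the input addresses are.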
Enrichment Fields
Field |
|---|
address_high |
address_low_frac |
address_low_suffix |
address_low |
bin |
census_block_2010 |
census_block_2020 |
census_block_group_2010 |
census_block_group_2020 |
census_tract_2010 |
census_tract_2020 |
center_city_district |
clean_philly_block_captain |
commercial_corridor |
council_district_2016 |
council_district_2024 |
cua_zone |
dor_parcel_id |
eclipse_location_id |
elementary_school |
engine_local |
high_school |
highway_district |
highway_section |
highway_subsection |
historic_district |
historic_site |
historic_street |
ladder_local |
lane_closure |
leaf_collection_area |
li_address_key |
li_district |
major_phila_watershed |
middle_school |
neighborhood_advisory_committee |
philly_rising_area |
planning_district |
police_district |
police_division |
police_service_area |
political_division |
political_ward |
ppr_friends |
pwd_center_city_district |
pwd_maint_district |
pwd_parcel_id |
pwd_pressure_district |
pwd_treatment_plant |
pwd_water_plate |
recycling_diversion_rate |
rubbish_recycle_day |
sanitation_area |
sanitation_convenience_center |
sanitation_district |
seg_id |
state_house_rep_2012 |
state_house_rep_2022 |
state_senate_2012 |
state_senate_2022 |
street_code |
street_light_route |
street_name |
street_postdir |
street_predir |
street_suffix |
traffic_district |
traffic_pm_district |
unit_num |
unit_type |
us_congressional_2012 |
us_congressional_2018 |
us_congressional_2022 |
zip_4 |
zip_code |
zoning_document_ids |
zoning_rco |
zoning |
Full Changelog: v1.1.4...v1.1.6
v1.1.5
address-geocoder
A tool to standardize and geocode Philadelphia addresses
Address Geocoder takes an input file containing addresses
and adds latitude and longitude to those addresses, as well as any optional
fields that the user supplies.
Note:
For more information about the geocoder, consult the GitHub repository: https://github.com/CityOfPhiladelphia/address-geocoder. The README in this repo contains more details about the matching process, and information about how to run the geocoder from the command line, if desired.
Questions?
If you have questions about the geocoder that this FAQ cannot answer, feel free to contact CityGeo at: maps@phila.gov
1. Prerequisites
You will need the following things:
- An executable file called geocoder.exe. This is used to run the program. Do not save the executable in a folder that has spaces in the name.
- An AIS API key, provided to you by CityGeo.
Installation
First, you will need to download and install the geocoder.
The geocoder file can be downloaded from GitHub. The latest release can be found at: https://github.com/CityOfPhiladelphia/address-geocoder/releases/
Read through the notes carefully, and then download the zip file attached at the bottom of these release notes.
Extract the zip file into a folder where you can easily find it. The folder must not have spaces in its name. The zip file contains two files: geocoder.exe and release.txt. If you delete or rename these files, you will need to download them again or restore their original names. Deleting release.txt will prevent the program from telling you when there is a new version of the .exe file to download.
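The update check that release.txt enables boils down to comparing version tags. The sketch below assumes release.txt records a tag such as `v1.1.5` (an assumption; the file's format is not documented here), and `parse_version` and `update_available` are hypothetical helpers. Tags are compared numerically rather than as strings, so `v1.1.10` correctly counts as newer than `v1.1.9`.

```python
def parse_version(tag: str) -> tuple[int, ...]:
    """Turn a tag like 'v1.1.5' into (1, 1, 5) for numeric comparison."""
    return tuple(int(part) for part in tag.strip().lstrip("vV").split("."))


def update_available(local_tag: str, latest_tag: str) -> bool:
    """True if latest_tag is newer than the locally recorded tag."""
    return parse_version(latest_tag) > parse_version(local_tag)


if __name__ == "__main__":
    # Hypothetical: in practice local_tag would be read from release.txt.
    print(update_available("v1.1.5", "v2.0.0"))
```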
Double-clicking geocoder.exe will launch the program. As a first-time installation, the script will download Python and Git if not present, then download the geocoder from GitHub and install the proper dependencies. The geocoder will be downloaded to a folder called address-geocoder-main. If there are problems with your install, you may try deleting this folder and running geocoder.exe again.
Note that the script will attempt to install Python 3.10 if it is not already installed on your machine.
The script will then attempt to download the address file. This may take a few minutes. It will save the address file and a version file in a subfolder called geocoder_address_data. Under most circumstances, you should not remove this folder or any of the files in it. Doing so will cause the script to redownload the address file.
After the installation runs successfully, you are ready to set up the configuration file.
2. How to Use Address Geocoder
In order to run Address Geocoder, first set up the configuration file. By default, Address Geocoder searches for a file named config.yml. Detailed steps for filling out the config file are in the next section.
Configuration
- The script should create a config.yml file if none exists. If it did not, you can copy config_example.yml to config.yml, either in the file explorer or by running the following in the terminal:
cp config_example.yml config.yml
Do not delete, rename, or move config_example.yml. If you delete this file, you will need to redownload it from GitHub.
In most cases, it is not recommended to delete, rename, or move config.yml. If you rename this file, the geocoder will be unable to find it and will create a new config.yml.
- Open up config.yml, and add your AIS API Key here:
AIS_API_KEY:
- Add the filepaths for the input file (the file you wish to enrich) and the geography file (the address file you have been given). Relative filepaths are relative to the address-geocoder-main folder downloaded from GitHub; for ease of use, full (absolute) filepaths are recommended. Do not put the filenames in quotes. The result should look something like this:
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
- Map the address fields to the names of the columns in the CSV that you wish to process.
If you have one combined address field, map it to full_address_field.
Otherwise, leave full_address_field blank and map column names to street, city, state, and zip. Street must be included, while the others are optional.
For example, for a CSV with the following columns:
addr_st, addr_city, addr_zip
the config would look like this:
input_file: example.csv
full_address_field:
address_fields:
street: addr_st
city: addr_city
state:
zip: addr_zip
If you have both full_address_field and the address fields filled in, the script will ask you which to use.
- List which fields other than latitude and longitude you want to add.
(Latitude and longitude will always be added.) If you enter an invalid field, the program will error out and ask you to try again.
A complete list of valid fields can be found further down in this README.
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
The full config file should look something like this:
# Connection Credentials
AIS_API_KEY: YOUR_API_KEY
# File Config
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
full_address_field: address
# OR, IF ADDRESS IS SPLIT INTO MULTIPLE COLUMNS:
address_fields:
street:
city:
state:
zip:
# Enrichment Fields -- Aside from coordinates, what fields to add
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
- You're now ready to run the geocoder.
Double-click geocoder.exe, the same file that you used to install the geocoder.
(If you get an error about a missing package, something didn't install properly. Try deleting the address-geocoder-main folder and running geocoder.exe again.)
The program will prompt you to specify a config file. Press Enter without typing anything to keep the default config file (./config.yml).
The output file will be saved in the same location as your input file, with _enriched attached to the filename.
Note that you may see various warnings about USPS and election files not being found, and about SSL certificates. These warnings are expected.
One of the steps of the enrichment process is to check against Philadelphia's Address Information System (AIS). Please note that this process can take some time: around 3-4 minutes per 1,000 calls to AIS. Not all records are checked against AIS, only those that have no match in the addresses.parquet file.
So it is important to provide an input file with as clean an address field as possible, to minimize the number of times the script queries AIS.
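The address-column rules from the configuration steps (one combined field, or split fields that include at least `street`, with a prompt when both are filled in) can be sketched as a small function over an already-loaded config dict. This is a hypothetical illustration, not the geocoder's own code; `choose` stands in for the interactive prompt. A blank YAML value such as `state:` loads as `None`, which is why empty entries are filtered out.

```python
from typing import Callable, Optional


def resolve_address_columns(
    config: dict, choose: Optional[Callable[[], str]] = None
) -> dict:
    """Pick the CSV column(s) holding addresses, following the config rules.

    Hypothetical helper. `choose` mimics the prompt shown when both styles
    are filled in; it should return "full" or "split".
    """
    full = config.get("full_address_field")
    # Drop split fields left blank (blank YAML values load as None).
    split = {
        key: col
        for key, col in (config.get("address_fields") or {}).items()
        if col
    }

    if full and split:
        if choose is None:
            raise ValueError("Both full_address_field and address_fields are set")
        use_full = choose() == "full"
    else:
        use_full = bool(full)

    if use_full:
        return {"full": full}
    if "street" not in split:
        raise ValueError("address_fields must map at least 'street'")
    return split
```

For the addr_st / addr_city / addr_zip example, this returns the three mapped columns and ignores the blank `state` entry.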
How The Geocoder Works
Address-Geocoder processes a CSV file of addresses and geolocates those addresses using the following steps:
- Takes an input file of addresses and standardizes them using passyunk, Philadelphia's address standardization system.
- Compares the standardized data to a local parquet file, addresses.parquet, and adds the user-specified fields, as well as latitude and longitude, from that file.
- Not all records will match the address file. For records that do not match, Address-Geocoder queries the Address Information System (AIS) API and adds the returned fields. Please note that this process can take some time, so processing large files with a messy address field is not recommended. For example, if 1,000 rows in your file need to be sent to AIS, this will take approximately 3-4 minutes.
- The enriched file is then saved to the same directory as the input file.
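These notes say only that the output is saved next to the input with `_enriched` attached to the filename. Assuming the suffix goes before the extension (so `example_input_4.csv` would become `example_input_4_enriched.csv`), you can predict where to look for the output with `pathlib`; `enriched_output_path` is a hypothetical helper.

```python
from pathlib import Path


def enriched_output_path(input_file: str) -> Path:
    """Same folder as the input, with '_enriched' added before the extension.

    Hypothetical helper; the placement of the suffix is an assumption.
    """
    path = Path(input_file)
    return path.with_name(f"{path.stem}_enriched{path.suffix}")


if __name__ == "__main__":
    print(enriched_output_path("./data/example_input_4.csv"))
```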
Enrichment Fields
Field |
|---|
address_high |
address_low_frac |
address_low_suffix |
address_low |
bin |
census_block_2010 |
census_block_2020 |
census_block_group_2010 |
census_block_group_2020 |
census_tract_2010 |
census_tract_2020 |
center_city_district |
clean_philly_block_captain |
commercial_corridor |
council_district_2016 |
council_district_2024 |
cua_zone |
dor_parcel_id |
eclipse_location_id |
elementary_school |
engine_local |
high_school |
highway_district |
highway_section |
highway_subsection |
historic_district |
historic_site |
historic_street |
ladder_local |
lane_closure |
leaf_collection_area |
li_address_key |
li_district |
major_phila_watershed |
middle_school |
neighborhood_advisory_committee |
philly_rising_area |
planning_district |
police_district |
police_division |
police_service_area |
political_division |
political_ward |
ppr_friends |
pwd_center_city_district |
pwd_maint_district |
pwd_parcel_id |
pwd_pressure_district |
pwd_treatment_plant |
pwd_water_plate |
recycling_diversion_rate |
rubbish_recycle_day |
sanitation_area |
sanitation_convenience_center |
sanitation_district |
seg_id |
state_house_rep_2012 |
state_house_rep_2022 |
state_senate_2012 |
state_senate_2022 |
street_code |
street_light_route |
street_name |
street_postdir |
street_predir |
street_suffix |
traffic_district |
traffic_pm_district |
unit_num |
unit_type |
us_congressional_2012 |
us_congressional_2018 |
us_congressional_2022 |
zip_4 |
zip_code |
zoning_document_ids |
zoning_rco |
zoning |
Full Changelog: v1.1.4...v1.1.5
v1.1.4
address-geocoder
A tool to standardize and geocode Philadelphia addresses
Address Geocoder takes an input file containing addresses
and adds latitude and longitude to those addresses, as well as any optional
fields that the user supplies.
Note:
For more information about the geocoder, consult the GitHub repository: https://github.com/CityOfPhiladelphia/address-geocoder. The README in this repo contains more details about the matching process, and information about how to run the geocoder from the command line, if desired.
Questions?
If you have questions about the geocoder that this FAQ cannot answer, feel free to contact CityGeo at: maps@phila.gov
1. Prerequisites
You will need the following things:
- An executable file called geocoder.exe. This is used to run the program. Do not save the executable in a folder that has spaces in the name.
- An AIS API key, provided to you by CityGeo.
Installation
First, you will need to download and install the geocoder.
The geocoder file can be downloaded from GitHub. The latest release can be found at: https://github.com/CityOfPhiladelphia/address-geocoder/releases/
Read through the notes carefully, and then download geocoder.exe from the bottom of these release notes.
Once geocoder.exe is downloaded, move it into a folder where you can easily find it. The folder must not have spaces in its name.
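Because the executable must not live in a folder with spaces in its name, it can help to check a path before running it. This sketch uses `PureWindowsPath` so it parses Windows paths on any operating system, and it conservatively flags every folder in the path rather than only the immediate parent (an assumption about how strict the requirement is). `folders_with_spaces` is a hypothetical helper.

```python
from pathlib import PureWindowsPath


def folders_with_spaces(exe_path: str) -> list[str]:
    """Return every folder name in exe_path that contains a space.

    Hypothetical helper; checks the whole path, not just the last folder.
    """
    return [part for part in PureWindowsPath(exe_path).parent.parts if " " in part]


if __name__ == "__main__":
    print(folders_with_spaces(r"C:\Users\Jane Doe\Tools\geocoder.exe"))
```

An empty list means no folder on the path contains a space.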
Double-clicking geocoder.exe will launch the program. As a first-time installation, the script will download Python and Git if not present, then download the geocoder from GitHub and install the proper dependencies. The geocoder will be downloaded to a folder called address-geocoder-main. If there are problems with your install, you may try deleting this folder and running geocoder.exe again.
Note that the script will attempt to install Python 3.10 if it is not already installed on your machine.
The script will then attempt to download the address file. This may take a few minutes. It will save the address file and a version file in a subfolder called geocoder_address_data. Under most circumstances, you should not remove this folder or any of the files in it. Doing so will cause the script to redownload the address file.
After the installation runs successfully, you are ready to set up the configuration file.
2. How to Use Address Geocoder
In order to run Address Geocoder, first set up the configuration file. By default, Address Geocoder searches for a file named config.yml. Detailed steps for filling out the config file are in the next section.
Configuration
- The script should create a config.yml file if none exists. If it did not, you can copy config_example.yml to config.yml, either in the file explorer or by running the following in the terminal:
cp config_example.yml config.yml
Do not delete, rename, or move config_example.yml. If you delete this file, you will need to redownload it from GitHub.
In most cases, it is not recommended to delete, rename, or move config.yml. If you rename this file, the geocoder will be unable to find it and will create a new config.yml.
- Open up config.yml, and add your AIS API Key here:
AIS_API_KEY:
- Add the filepaths for the input file (the file you wish to enrich) and the geography file (the address file you have been given). Relative filepaths are relative to the address-geocoder-main folder downloaded from GitHub; for ease of use, full (absolute) filepaths are recommended. Do not put the filenames in quotes. The result should look something like this:
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
- Map the address fields to the names of the columns in the CSV that you wish to process.
If you have one combined address field, map it to full_address_field.
Otherwise, leave full_address_field blank and map column names to street, city, state, and zip. Street must be included, while the others are optional.
For example, for a CSV with the following columns:
addr_st, addr_city, addr_zip
the config would look like this:
input_file: example.csv
full_address_field:
address_fields:
street: addr_st
city: addr_city
state:
zip: addr_zip
If you have both full_address_field and the address fields filled in, the script will ask you which to use.
- List which fields other than latitude and longitude you want to add.
(Latitude and longitude will always be added.) If you enter an invalid field, the program will error out and ask you to try again.
A complete list of valid fields can be found further down in this README.
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
The full config file should look something like this:
# Connection Credentials
AIS_API_KEY: YOUR_API_KEY
# File Config
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
full_address_field: address
# OR, IF ADDRESS IS SPLIT INTO MULTIPLE COLUMNS:
address_fields:
street:
city:
state:
zip:
# Enrichment Fields -- Aside from coordinates, what fields to add
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
- You're now ready to run the geocoder.
Double-click geocoder.exe, the same file that you used to install the geocoder.
(If you get an error about a missing package, something didn't install properly. Try deleting the address-geocoder-main folder and running geocoder.exe again.)
The program will prompt you to specify a config file. Press Enter without typing anything to keep the default config file (./config.yml).
The output file will be saved in the same location as your input file, with _enriched attached to the filename.
Note that you may see various warnings about USPS and election files not being found, and about SSL certificates. These warnings are expected.
One of the steps of the enrichment process is to check against Philadelphia's Address Information System (AIS). Please note that this process can take some time: around 3-4 minutes per 1,000 calls to AIS. Not all records are checked against AIS, only those that have no match in the addresses.parquet file.
So it is important to provide an input file with as clean an address field as possible, to minimize the number of times the script queries AIS.
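Since an invalid entry in enrichment_fields stops the run, checking your list against the Enrichment Fields table beforehand can save a round trip. The set below is only an excerpt of that table (add the remaining names you need), and `check_enrichment_fields` is a hypothetical helper, not part of the geocoder.

```python
# Excerpt of the Enrichment Fields table; add the remaining names as needed.
VALID_ENRICHMENT_FIELDS = {
    "census_block_2020",
    "census_block_group_2020",
    "census_tract_2020",
    "council_district_2024",
    "police_district",
    "zip_code",
}


def check_enrichment_fields(requested: list[str]) -> list[str]:
    """Return the requested fields that are not valid, preserving order."""
    return [field for field in requested if field not in VALID_ENRICHMENT_FIELDS]


if __name__ == "__main__":
    bad = check_enrichment_fields(["census_tract_2020", "census_tract_2030"])
    if bad:
        print("Invalid enrichment fields:", ", ".join(bad))
```

Latitude and longitude do not need to be listed, since they are always added.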
How The Geocoder Works
Address-Geocoder processes a CSV file of addresses and geolocates those addresses using the following steps:
- Takes an input file of addresses and standardizes them using passyunk, Philadelphia's address standardization system.
- Compares the standardized data to a local parquet file, addresses.parquet, and adds the user-specified fields, as well as latitude and longitude, from that file.
- Not all records will match the address file. For records that do not match, Address-Geocoder queries the Address Information System (AIS) API and adds the returned fields. Please note that this process can take some time, so processing large files with a messy address field is not recommended. For example, if 1,000 rows in your file need to be sent to AIS, this will take approximately 3-4 minutes.
- The enriched file is then saved to the same directory as the input file.
Enrichment Fields
Field |
|---|
address_high |
address_low_frac |
address_low_suffix |
address_low |
bin |
census_block_2010 |
census_block_2020 |
census_block_group_2010 |
census_block_group_2020 |
census_tract_2010 |
census_tract_2020 |
center_city_district |
clean_philly_block_captain |
commercial_corridor |
council_district_2016 |
council_district_2024 |
cua_zone |
dor_parcel_id |
eclipse_location_id |
elementary_school |
engine_local |
high_school |
highway_district |
highway_section |
highway_subsection |
historic_district |
historic_site |
historic_street |
ladder_local |
lane_closure |
leaf_collection_area |
li_address_key |
li_district |
major_phila_watershed |
middle_school |
neighborhood_advisory_committee |
philly_rising_area |
planning_district |
police_district |
police_division |
police_service_area |
political_division |
political_ward |
ppr_friends |
pwd_center_city_district |
pwd_maint_district |
pwd_parcel_id |
pwd_pressure_district |
pwd_treatment_plant |
pwd_water_plate |
recycling_diversion_rate |
rubbish_recycle_day |
sanitation_area |
sanitation_convenience_center |
sanitation_district |
seg_id |
state_house_rep_2012 |
state_house_rep_2022 |
state_senate_2012 |
state_senate_2022 |
street_code |
street_light_route |
street_name |
street_postdir |
street_predir |
street_suffix |
traffic_district |
traffic_pm_district |
unit_num |
unit_type |
us_congressional_2012 |
us_congressional_2018 |
us_congressional_2022 |
zip_4 |
zip_code |
zoning_document_ids |
zoning_rco |
zoning |
Full Changelog: v1.1.3...v1.1.4
v1.1.3
address-geocoder
A tool to standardize and geocode Philadelphia addresses
Address Geocoder takes an input file containing addresses
and adds latitude and longitude to those addresses, as well as any optional
fields that the user supplies.
Note:
For more information about the geocoder, consult the GitHub repository: https://github.com/CityOfPhiladelphia/address-geocoder. The README in this repo contains more details about the matching process, and information about how to run the geocoder from the command line, if desired.
Questions?
If you have questions about the geocoder that this FAQ cannot answer, feel free to contact CityGeo at: maps@phila.gov
1. Prerequisites
You will need the following things:
- An executable file called geocoder.exe. This is used to run the program. Do not save the executable in a folder that has spaces in the name.
- An AIS API key, provided to you by CityGeo.
Installation
First, you will need to download and install the geocoder.
The geocoder file can be downloaded from GitHub. The latest release can be found at: https://github.com/CityOfPhiladelphia/address-geocoder/releases/
Read through the notes carefully, and then download geocoder.exe from the bottom of these release notes.
Once geocoder.exe is downloaded, move it into a folder where you can easily find it. The folder must not have spaces in its name.
Double-clicking geocoder.exe will launch the program. As a first-time installation, the script will download Python and Git if not present, then download the geocoder from GitHub and install the proper dependencies. The geocoder will be downloaded to a folder called address-geocoder-main. If there are problems with your install, you may try deleting this folder and running geocoder.exe again.
Note that the script will attempt to install Python 3.10 if it is not already installed on your machine.
The script will then attempt to download the address file. This may take a few minutes. It will save the address file and a version file in a subfolder called geocoder_address_data. Under most circumstances, you should not remove this folder or any of the files in it. Doing so will cause the script to redownload the address file.
After the installation runs successfully, you are ready to set up the configuration file.
2. How to Use Address Geocoder
In order to run Address Geocoder, first set up the configuration file. By default, Address Geocoder searches for a file named config.yml. Detailed steps for filling out the config file are in the next section.
Configuration
- The script should create a config.yml file if none exists. If it did not, you can copy config_example.yml to config.yml, either in the file explorer or by running the following in the terminal:
cp config_example.yml config.yml
Do not delete, rename, or move config_example.yml. If you delete this file, you will need to redownload it from GitHub.
In most cases, it is not recommended to delete, rename, or move config.yml. If you rename this file, the geocoder will be unable to find it and will create a new config.yml.
- Open up config.yml, and add your AIS API Key here:
AIS_API_KEY:
- Add the filepaths for the input file (the file you wish to enrich) and the geography file (the address file you have been given). Relative filepaths are relative to the address-geocoder-main folder downloaded from GitHub; for ease of use, full (absolute) filepaths are recommended. Do not put the filenames in quotes. The result should look something like this:
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
- Map the address fields to the names of the columns in the CSV that you wish to process.
If you have one combined address field, map it to full_address_field.
Otherwise, leave full_address_field blank and map column names to street, city, state, and zip. Street must be included, while the others are optional.
For example, for a CSV with the following columns:
addr_st, addr_city, addr_zip
the config would look like this:
input_file: example.csv
full_address_field:
address_fields:
street: addr_st
city: addr_city
state:
zip: addr_zip
If you have both full_address_field and the address fields filled in, the script will ask you which to use.
- List which fields other than latitude and longitude you want to add.
(Latitude and longitude will always be added.) If you enter an invalid field, the program will error out and ask you to try again.
A complete list of valid fields can be found further down in this README.
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
The full config file should look something like this:
# Connection Credentials
AIS_API_KEY: YOUR_API_KEY
# File Config
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
full_address_field: address
# OR, IF ADDRESS IS SPLIT INTO MULTIPLE COLUMNS:
address_fields:
street:
city:
state:
zip:
# Enrichment Fields -- Aside from coordinates, what fields to add
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
- You're now ready to run the geocoder.
Double-click geocoder.exe, the same file that you used to install the geocoder.
(If you get an error about a missing package, something didn't install properly. Try deleting the address-geocoder-main folder and running geocoder.exe again.)
The program will prompt you to specify a config file. Press Enter without typing anything to keep the default config file (./config.yml).
The output file will be saved in the same location as your input file, with _enriched attached to the filename.
Note that you may see various warnings about USPS and election files not being found, and about SSL certificates. These warnings are expected.
One of the steps of the enrichment process is to check against Philadelphia's Address Information System (AIS). Please note that this process can take some time: around 3-4 minutes per 1,000 calls to AIS. Not all records are checked against AIS, only those that have no match in the addresses.parquet file.
So it is important to provide an input file with as clean an address field as possible, to minimize the number of times the script queries AIS.
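Relative input_file and geography_file paths are resolved against the address-geocoder-main folder, not against wherever you happen to launch the program. A minimal sketch of that rule with `pathlib`, assuming you know where that folder lives; `resolve_config_path` is a hypothetical helper, not the geocoder's own function.

```python
from pathlib import Path


def resolve_config_path(value: str, base_dir: Path) -> Path:
    """Absolute paths are used as-is; relative ones are joined to base_dir.

    Hypothetical helper; base_dir stands for the address-geocoder-main folder.
    """
    path = Path(value)
    return path if path.is_absolute() else base_dir / path


if __name__ == "__main__":
    base = Path("address-geocoder-main")
    print(resolve_config_path("./data/addresses.parquet", base))
```

This is also why full paths are the safer choice: they resolve the same way no matter which folder is used as the base.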
How The Geocoder Works
Address-Geocoder processes a CSV file of addresses and geolocates those addresses using the following steps:
- Takes an input file of addresses and standardizes them using passyunk, Philadelphia's address standardization system.
- Compares the standardized data to a local parquet file, addresses.parquet, and adds the user-specified fields, as well as latitude and longitude, from that file.
- Not all records will match the address file. For records that do not match, Address-Geocoder queries the Address Information System (AIS) API and adds the returned fields. Please note that this process can take some time, so processing large files with a messy address field is not recommended. For example, if 1,000 rows in your file need to be sent to AIS, this will take approximately 3-4 minutes.
- The enriched file is then saved to the same directory as the input file.
Enrichment Fields
Field |
|---|
address_high |
address_low_frac |
address_low_suffix |
address_low |
bin |
census_block_2010 |
census_block_2020 |
census_block_group_2010 |
census_block_group_2020 |
census_tract_2010 |
census_tract_2020 |
center_city_district |
clean_philly_block_captain |
commercial_corridor |
council_district_2016 |
council_district_2024 |
cua_zone |
dor_parcel_id |
eclipse_location_id |
elementary_school |
engine_local |
high_school |
highway_district |
highway_section |
highway_subsection |
historic_district |
historic_site |
historic_street |
ladder_local |
lane_closure |
leaf_collection_area |
li_address_key |
li_district |
major_phila_watershed |
middle_school |
neighborhood_advisory_committee |
philly_rising_area |
planning_district |
police_district |
police_division |
police_service_area |
political_division |
political_ward |
ppr_friends |
pwd_center_city_district |
pwd_maint_district |
pwd_parcel_id |
pwd_pressure_district |
pwd_treatment_plant |
pwd_water_plate |
recycling_diversion_rate |
rubbish_recycle_day |
sanitation_area |
sanitation_convenience_center |
sanitation_district |
seg_id |
state_house_rep_2012 |
state_house_rep_2022 |
state_senate_2012 |
state_senate_2022 |
street_code |
street_light_route |
street_name |
street_postdir |
street_predir |
street_suffix |
traffic_district |
traffic_pm_district |
unit_num |
unit_type |
us_congressional_2012 |
us_congressional_2018 |
us_congressional_2022 |
zip_4 |
zip_code |
zoning_document_ids |
zoning_rco |
zoning |
Full Changelog: v1.1.1...v1.1.3
v1.1.2
address-geocoder
A tool to standardize and geocode Philadelphia addresses
Address Geocoder takes an input file containing addresses
and adds latitude and longitude to those addresses, as well as any optional
fields that the user supplies.
Note:
For more information about the geocoder, consult the GitHub repository: https://github.com/CityOfPhiladelphia/address-geocoder. The README in this repo contains more details about the matching process, and information about how to run the geocoder from the command line, if desired.
Questions?
If you have questions about the geocoder that this FAQ cannot answer, feel free to contact CityGeo at: maps@phila.gov
1. Prerequisites
You will need the following things:
- An executable file called geocoder.exe. This is used to run the program. Do not save the executable in a folder that has spaces in the name.
- An AIS API key, provided to you by CityGeo.
Installation
First, you will need to download and install the geocoder.
The geocoder file can be downloaded from GitHub. The latest release can be found at: https://github.com/CityOfPhiladelphia/address-geocoder/releases/
Read through the release notes carefully, and then download geocoder.exe from the assets at the bottom of the release.
Once geocoder.exe is downloaded, move it into a folder where you can easily find it. The folder must not have spaces in its name.
Double-clicking geocoder.exe will launch the program. On first run, the script will download Python and Git if they are not present, then download the geocoder from GitHub and install the required dependencies. The geocoder will be downloaded to a folder called address-geocoder-main. If there are problems with your installation, try deleting this folder and running geocoder.exe again.
Note that if Python 3.10 is not already installed on your machine, the script will attempt to install it.
The script will then attempt to download the address file. This may take a few minutes. It will save the address file and a version file in a subfolder called geocoder_address_data. Under most circumstances, you should not remove this folder or any of the files in it. Doing so will cause the script to redownload the address file.
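The caching behaviour described above can be pictured with the minimal sketch below. It is not the geocoder's code: it only assumes that a download is needed whenever the geocoder_address_data folder, the address file, or the version file is missing. The notes do not name the version file, so `version.txt` is a hypothetical placeholder, and `addresses.parquet` is assumed to be the name of the cached address file.

```python
import tempfile
from pathlib import Path

# Folder name from the release notes; both file names are assumptions.
DATA_DIR_NAME = "geocoder_address_data"
ADDRESS_FILE_NAME = "addresses.parquet"  # assumed name of the cached address file
VERSION_FILE_NAME = "version.txt"  # hypothetical: the notes do not name the version file


def needs_address_download(base_dir: Path) -> bool:
    """Return True if the cached address data is incomplete and must be downloaded again."""
    data_dir = base_dir / DATA_DIR_NAME
    has_address = (data_dir / ADDRESS_FILE_NAME).is_file()
    has_version = (data_dir / VERSION_FILE_NAME).is_file()
    # Removing the folder, or either file inside it, forces a fresh download.
    return not (has_address and has_version)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        base = Path(tmp)
        print(needs_address_download(base))  # True: nothing has been cached yet
        data_dir = base / DATA_DIR_NAME
        data_dir.mkdir()
        (data_dir / ADDRESS_FILE_NAME).write_bytes(b"")
        (data_dir / VERSION_FILE_NAME).write_text("placeholder")
        print(needs_address_download(base))  # False: both files are present
```

Checking only for the presence of the files keeps the sketch simple; it says nothing about how the real geocoder decides whether a newer address file is available.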
After the installation runs successfully, you are ready to set up the configuration file.
2. How to Use Address Geocoder
To run Address Geocoder, first set up the configuration file. By default,
Address Geocoder searches for a file named config.yml. Detailed steps for filling out the config file are in the next section.
Configuration
- The script should create a config.yml file if none exists. If it did not, you can copy
config_example.yml to config.yml, either in the file explorer or by running the following in the terminal:
cp config_example.yml config.yml
Do not delete, rename, or move config_example.yml. If you delete this file, you will need to redownload it from GitHub.
In most cases, it is not recommended to delete, rename, or move config.yml. If you rename this file, the geocoder will be unable to find it and will create a new config.yml.
- Open up config.yml, and add your AIS API Key here:
AIS_API_KEY:
- Add the filepaths for the input file (the file that you wish to enrich) and the geography file (the address file you have been given). If you use relative filepaths, they are relative to the address-geocoder-main folder downloaded from GitHub; for ease of use, full (absolute) filepaths are recommended. Do not put the filenames in quotes. For example:
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
- Map the address fields to the names of the columns in the csv that you wish to process.
If you have one combined address field, map it to full_address_field.
Otherwise, leave full_address_field blank and map column names to street, city, state, and zip. Street must be included, while the others are optional.
For example, for a csv with the following fields:
addr_st, addr_city, addr_zip
input_file: example.csv
full_address_field:
address_fields:
street: addr_st
city: addr_city
state:
zip: addr_zip
If you have both full_address_field and the address fields filled in, the script will ask you which to use.
- List the fields, other than latitude and longitude, that you want to add.
(Latitude and longitude are always added.) If you enter an invalid field, the program will error out and ask you to try again.
A complete list of valid fields can be found in the Enrichment Fields section further down.
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
The full config file should look something like this:
# Connection Credentials
AIS_API_KEY: YOUR_API_KEY
# File Config
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
full_address_field: address
# OR, IF ADDRESS IS SPLIT INTO MULTIPLE COLUMNS:
address_fields:
street:
city:
state:
zip:
# Enrichment Fields -- Aside from coordinates, what fields to add
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
- You're now ready to run the geocoder.
Double-click geocoder.exe, the same file that you used to install the geocoder.
(If you get an error about a missing package, something didn't install properly. Try deleting the address-geocoder-main folder and running geocoder.exe again.)
The program will prompt you to specify a config file. Press Enter without typing anything to
keep the default config file (./config.yml).
The output file will be saved in the same location as your input file, with _enriched attached to the filename.
Note that you may see various warnings about USPS and election files not being found, and about SSL certificates. These warnings are expected.
One of the steps of the enrichment process is to check against Philadelphia's Address Information System (AIS). This step can take some time: around 3-4 minutes per 1,000 calls to AIS. Not all records are checked against AIS, only those that have no match in the addresses.parquet file.
So it is important to provide an input file with as clean an address field as possible, to minimize the number of times the script queries AIS.
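The configuration rules above can be checked before a run with a small helper like the one below. This is a hypothetical sketch, not part of the geocoder: `ensure_config` mimics creating config.yml from config_example.yml when it is missing, and `check_config` takes the config as an already-parsed dict (for example, the result of PyYAML's `yaml.safe_load`, where blank entries such as `state:` become `None`). `VALID_FIELDS` is only an illustrative subset of the Enrichment Fields list.

```python
import shutil
import tempfile
from pathlib import Path

# Illustrative subset of the valid enrichment fields; the full list is in Enrichment Fields.
VALID_FIELDS = {
    "census_tract_2020",
    "census_block_group_2020",
    "census_block_2020",
    "council_district_2024",
    "zip_code",
}


def ensure_config(folder: Path) -> Path:
    """Copy config_example.yml to config.yml only if config.yml does not exist yet."""
    config = folder / "config.yml"
    if not config.exists():
        shutil.copy(folder / "config_example.yml", config)
    return config


def check_config(config: dict) -> list[str]:
    """Return messages about a parsed config; an empty list means nothing to fix."""
    messages = []
    if not config.get("AIS_API_KEY"):
        messages.append("AIS_API_KEY is missing")
    if not config.get("input_file"):
        messages.append("input_file is missing")
    full_field = config.get("full_address_field")
    street = (config.get("address_fields") or {}).get("street")
    if not full_field and not street:
        # One mapping is required; street is the mandatory column in the split form.
        messages.append("set full_address_field or address_fields.street")
    elif full_field and street:
        # The real geocoder asks which one to use; this sketch only flags it.
        messages.append("both full_address_field and address_fields.street are set")
    for field in config.get("enrichment_fields") or []:
        if field not in VALID_FIELDS:
            messages.append(f"invalid enrichment field: {field}")
    return messages


if __name__ == "__main__":
    # Mirrors the split-column example: addr_st, addr_city, addr_zip.
    example = {
        "AIS_API_KEY": "YOUR_API_KEY",
        "input_file": "example.csv",
        "full_address_field": None,
        "address_fields": {"street": "addr_st", "city": "addr_city", "state": None, "zip": "addr_zip"},
        "enrichment_fields": ["census_tract_2020", "census_block_2020"],
    }
    print(check_config(example))  # []

    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "config_example.yml").write_text("AIS_API_KEY:\n")
        print(ensure_config(folder).name)  # config.yml
```

Returning messages instead of raising lets every problem in the file be reported at once, rather than one per run.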
How The Geocoder Works
Address-Geocoder processes a csv file with addresses, and geolocates those
addresses using the following steps:
- Takes an input file of addresses, and standardizes those addresses using passyunk, Philadelphia's address standardization system.
- Compares the standardized data to a local parquet file, addresses.parquet, and adds the user-specified fields as well as latitude and longitude from that file.
- Not all records will match the address file. For those records that do not match, Address-Geocoder queries the Address Information System (AIS) API and adds the returned fields. Please note that this process can take some time, so processing large files with a messy address field is not recommended. As an example, if you have a file that needs 1,000 rows to be sent to AIS, this will take approximately 3-4 minutes.
- The enriched file is then saved to the same directory as the input file.
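The steps above amount to a tiered lookup: match locally first and query AIS only for the misses, which is why a clean address field keeps runs short. In this hypothetical sketch, `standardize` stands in for passyunk, a plain dict stands in for addresses.parquet, and `query_ais` is a stub for the AIS API; none of these are the geocoder's real functions. The runtime estimate uses the quoted rate of 3-4 minutes per 1,000 AIS calls, and placing `_enriched` just before the file extension is an assumption.

```python
from pathlib import Path
from typing import Callable, Optional

Record = dict  # coordinates plus any enrichment fields


def geocode(
    addresses: list[str],
    standardize: Callable[[str], str],
    local_index: dict[str, Record],
    query_ais: Callable[[str], Optional[Record]],
) -> tuple[list[Optional[Record]], int]:
    """Match each address locally first; fall back to AIS only for misses.

    Returns the per-address results (None where nothing matched) and the number of AIS calls.
    """
    results: list[Optional[Record]] = []
    ais_calls = 0
    for raw in addresses:
        key = standardize(raw)
        hit = local_index.get(key)
        if hit is None:
            ais_calls += 1
            hit = query_ais(key)
        results.append(hit)
    return results, ais_calls


def estimate_ais_minutes(ais_calls: int) -> tuple[float, float]:
    """Rough runtime range from the quoted rate of 3-4 minutes per 1,000 AIS calls."""
    return ais_calls * 3 / 1000, ais_calls * 4 / 1000


def enriched_output_path(input_file: str) -> Path:
    """Same directory as the input, with _enriched added before the extension (assumed)."""
    path = Path(input_file)
    return path.with_name(f"{path.stem}_enriched{path.suffix}")


if __name__ == "__main__":
    # Toy stand-ins: upper-casing replaces passyunk, a dict replaces addresses.parquet,
    # and a stub that never matches replaces AIS. Coordinates are placeholders.
    index = {"1234 MARKET ST": {"lat": 39.95, "lon": -75.16}}
    results, calls = geocode(
        ["1234 market st", "1 nowhere ln"],
        standardize=lambda s: s.upper(),
        local_index=index,
        query_ais=lambda key: None,
    )
    print(results, calls)
    print(estimate_ais_minutes(1000))  # (3.0, 4.0)
    print(enriched_output_path("./data/example_input_4.csv"))
```

A real run would also write each record back out alongside the original columns; the sketch leaves that step out.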
Enrichment Fields
Field |
|---|
address_high |
address_low_frac |
address_low_suffix |
address_low |
bin |
census_block_2010 |
census_block_2020 |
census_block_group_2010 |
census_block_group_2020 |
census_tract_2010 |
census_tract_2020 |
center_city_district |
clean_philly_block_captain |
commercial_corridor |
council_district_2016 |
council_district_2024 |
cua_zone |
dor_parcel_id |
eclipse_location_id |
elementary_school |
engine_local |
high_school |
highway_district |
highway_section |
highway_subsection |
historic_district |
historic_site |
historic_street |
ladder_local |
lane_closure |
leaf_collection_area |
li_address_key |
li_district |
major_phila_watershed |
middle_school |
neighborhood_advisory_committee |
philly_rising_area |
planning_district |
police_district |
police_division |
police_service_area |
political_division |
political_ward |
ppr_friends |
pwd_center_city_district |
pwd_maint_district |
pwd_parcel_id |
pwd_pressure_district |
pwd_treatment_plant |
pwd_water_plate |
recycling_diversion_rate |
rubbish_recycle_day |
sanitation_area |
sanitation_convenience_center |
sanitation_district |
seg_id |
state_house_rep_2012 |
state_house_rep_2022 |
state_senate_2012 |
state_senate_2022 |
street_code |
street_light_route |
street_name |
street_postdir |
street_predir |
street_suffix |
traffic_district |
traffic_pm_district |
unit_num |
unit_type |
us_congressional_2012 |
us_congressional_2018 |
us_congressional_2022 |
zip_4 |
zip_code |
zoning_document_ids |
zoning_rco |
zoning |
Full Changelog: v1.1.1...v1.1.2
v1.1.1
address-geocoder
A tool to standardize and geocode Philadelphia addresses
Address Geocoder takes an input file containing addresses
and adds latitude and longitude to those addresses, as well as any optional
fields that the user supplies.
Note:
For more information about the geocoder, consult the GitHub repository: https://github.com/CityOfPhiladelphia/address-geocoder. The README in this repo contains more details about the matching process, and information about how to run the geocoder from the command line, if desired.
Questions?
If you have questions about the geocoder that this FAQ cannot answer, feel free to contact CityGeo at maps@phila.gov.
1. Prerequisites
You will need the following things:
- An executable file called geocoder.exe. This is used to run the program. Do not save the executable in a folder that has spaces in the name.
- An AIS API key, provided to you by CityGeo.
Installation
First, you will need to download and install the geocoder.
The geocoder file can be downloaded from GitHub. The latest release can be found at: https://github.com/CityOfPhiladelphia/address-geocoder/releases/
Read through the release notes carefully, and then download geocoder.exe from the assets at the bottom of the release.
Once geocoder.exe is downloaded, move it into a folder where you can easily find it. The folder must not have spaces in its name.
Double-clicking geocoder.exe will launch the program. On first run, the script will download Python and Git if they are not present, then download the geocoder from GitHub and install the required dependencies. The geocoder will be downloaded to a folder called address-geocoder-main. If there are problems with your installation, try deleting this folder and running geocoder.exe again.
Note that if Python 3.10 is not already installed on your machine, the script will attempt to install it.
The script will then attempt to download the address file. This may take a few minutes. It will save the address file and a version file in a subfolder called geocoder_address_data. Under most circumstances, you should not remove this folder or any of the files in it. Doing so will cause the script to redownload the address file.
After the installation runs successfully, you are ready to set up the configuration file.
2. How to Use Address Geocoder
To run Address Geocoder, first set up the configuration file. By default,
Address Geocoder searches for a file named config.yml. Detailed steps for filling out the config file are in the next section.
Configuration
- The script should create a config.yml file if none exists. If it did not, you can copy
config_example.yml to config.yml, either in the file explorer or by running the following in the terminal:
cp config_example.yml config.yml
Do not delete, rename, or move config_example.yml. If you delete this file, you will need to redownload it from GitHub.
In most cases, it is not recommended to delete, rename, or move config.yml. If you rename this file, the geocoder will be unable to find it and will create a new config.yml.
- Open up config.yml, and add your AIS API Key here:
AIS_API_KEY:
- Add the filepaths for the input file (the file that you wish to enrich) and the geography file (the address file you have been given). If you use relative filepaths, they are relative to the address-geocoder-main folder downloaded from GitHub; for ease of use, full (absolute) filepaths are recommended. Do not put the filenames in quotes. For example:
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
- Map the address fields to the names of the columns in the csv that you wish to process.
If you have one combined address field, map it to full_address_field.
Otherwise, leave full_address_field blank and map column names to street, city, state, and zip. Street must be included, while the others are optional.
For example, for a csv with the following fields:
addr_st, addr_city, addr_zip
input_file: example.csv
full_address_field:
address_fields:
street: addr_st
city: addr_city
state:
zip: addr_zip
If you have both full_address_field and the address fields filled in, the script will ask you which to use.
- List the fields, other than latitude and longitude, that you want to add.
(Latitude and longitude are always added.) If you enter an invalid field, the program will error out and ask you to try again.
A complete list of valid fields can be found in the Enrichment Fields section further down.
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
The full config file should look something like this:
# Connection Credentials
AIS_API_KEY: YOUR_API_KEY
# File Config
input_file: ./data/example_input_4.csv
geography_file: ./data/addresses.parquet
full_address_field: address
# OR, IF ADDRESS IS SPLIT INTO MULTIPLE COLUMNS:
address_fields:
street:
city:
state:
zip:
# Enrichment Fields -- Aside from coordinates, what fields to add
enrichment_fields:
- census_tract_2020
- census_block_group_2020
- census_block_2020
- You're now ready to run the geocoder.
Double-click geocoder.exe, the same file that you used to install the geocoder.
(If you get an error about a missing package, something didn't install properly. Try deleting the address-geocoder-main folder and running geocoder.exe again.)
The program will prompt you to specify a config file. Press Enter without typing anything to
keep the default config file (./config.yml).
The output file will be saved in the same location as your input file, with _enriched attached to the filename.
Note that you may see various warnings about USPS and election files not being found, and about SSL certificates. These warnings are expected.
One of the steps of the enrichment process is to check against Philadelphia's Address Information System (AIS). This step can take some time: around 3-4 minutes per 1,000 calls to AIS. Not all records are checked against AIS, only those that have no match in the addresses.parquet file.
So it is important to provide an input file with as clean an address field as possible, to minimize the number of times the script queries AIS.
How The Geocoder Works
Address-Geocoder processes a csv file with addresses, and geolocates those
addresses using the following steps:
- Takes an input file of addresses, and standardizes those addresses using passyunk, Philadelphia's address standardization system.
- Compares the standardized data to a local parquet file, addresses.parquet, and adds the user-specified fields as well as latitude and longitude from that file.
- Not all records will match the address file. For those records that do not match, Address-Geocoder queries the Address Information System (AIS) API and adds the returned fields. Please note that this process can take some time, so processing large files with a messy address field is not recommended. As an example, if you have a file that needs 1,000 rows to be sent to AIS, this will take approximately 3-4 minutes.
- The enriched file is then saved to the same directory as the input file.
Enrichment Fields
Field |
|---|
address_high |
address_low_frac |
address_low_suffix |
address_low |
bin |
census_block_2010 |
census_block_2020 |
census_block_group_2010 |
census_block_group_2020 |
census_tract_2010 |
census_tract_2020 |
center_city_district |
clean_philly_block_captain |
commercial_corridor |
council_district_2016 |
council_district_2024 |
cua_zone |
dor_parcel_id |
eclipse_location_id |
elementary_school |
engine_local |
high_school |
highway_district |
highway_section |
highway_subsection |
historic_district |
historic_site |
historic_street |
ladder_local |
lane_closure |
leaf_collection_area |
li_address_key |
li_district |
major_phila_watershed |
middle_school |
neighborhood_advisory_committee |
philly_rising_area |
planning_district |
police_district |
police_division |
police_service_area |
political_division |
political_ward |
ppr_friends |
pwd_center_city_district |
pwd_maint_district |
pwd_parcel_id |
pwd_pressure_district |
pwd_treatment_plant |
pwd_water_plate |
recycling_diversion_rate |
rubbish_recycle_day |
sanitation_area |
sanitation_convenience_center |
sanitation_district |
seg_id |
state_house_rep_2012 |
state_house_rep_2022 |
state_senate_2012 |
state_senate_2022 |
street_code |
street_light_route |
street_name |
street_postdir |
street_predir |
street_suffix |
traffic_district |
traffic_pm_district |
unit_num |
unit_type |
us_congressional_2012 |
us_congressional_2018 |
us_congressional_2022 |
zip_4 |
zip_code |
zoning_document_ids |
zoning_rco |
zoning |
Full Changelog: v1.1.01...v1.1.1
v1.1.0
What's Changed
- Bump urllib3 from 2.6.0 to 2.6.3 by @dependabot[bot] in #13
- Download address file by @CaitlinCP in #14
Full Changelog: v1.0.0...v1.1.0