Extracts locations from text and geocodes them, using Apache Lucene, OpenNLP and the Geonames.org dataset.
A command-line gazetteer built around the Geonames.org dataset. It uses the Apache Lucene library to create a searchable index of that data.
The Geonames.org dataset contains over 10,000,000 geographical names corresponding to over 7,500,000 unique features. Beyond names of places in various languages, data stored include latitude, longitude, elevation, population, administrative subdivision and postal codes. All coordinates use the World Geodetic System 1984 (WGS84).
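To get a feel for the fields described above before building any index, the raw dump can be inspected directly. The sketch below assumes the standard tab-separated layout of the Geonames `allCountries.txt` dump (name in column 2, latitude in 5, longitude in 6, population in 15); `geonames_lookup` is a hypothetical helper name and is not part of this project.

```shell
# Hypothetical helper (not part of lucene-geo-gazetteer): print
# name, latitude, longitude and population for every row whose
# primary name matches exactly.
# Assumes the tab-separated Geonames dump layout:
#   $2 = name, $5 = latitude, $6 = longitude, $15 = population
geonames_lookup() {
  awk -F '\t' -v name="$1" \
    '$2 == name { print $2 "\t" $5 "\t" $6 "\t" $15 }' \
    "${2:-allCountries.txt}"
}
```

Once `allCountries.txt` has been downloaded and unpacked, `geonames_lookup Pasadena allCountries.txt` lists every feature named exactly "Pasadena". This is a linear scan of the whole file, which is the cost the Lucene index is meant to avoid.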
# Download and unpack the Geonames dump
curl -O http://download.geonames.org/export/dump/allCountries.zip
unzip allCountries.zip
# Build the Lucene index
java -cp target/lucene-geo-gazetteer-<version>-jar-with-dependencies.jar edu.usc.ir.geo.gazetteer.GeoNameResolver -i geoIndex -b allCountries.txt
# Search the index
java -cp target/lucene-geo-gazetteer-<version>-jar-with-dependencies.jar edu.usc.ir.geo.gazetteer.GeoNameResolver -i geoIndex -s Pasadena Texas
# Launch Server
$ lucene-geo-gazetteer -server
# Query
$ curl "localhost:8765/api/search?s=Pasadena&s=Texas&c=2"
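Place names containing spaces or other reserved characters need URL encoding before they go into the query string. A minimal sketch of a wrapper that lets `curl` do the encoding is shown below. `gazetteer_search` and `GAZETTEER_URL` are hypothetical names introduced for illustration, and the sketch assumes only the `s` and `c` parameters that appear in the query above (with `c` taken to be a result-count setting, which is an assumption).

```shell
# Hypothetical wrapper (not shipped with the project).
# Usage: gazetteer_search <c-value> <name> [<name> ...]
# Assumes the server from the example above is listening on port 8765.
GAZETTEER_URL="${GAZETTEER_URL:-localhost:8765/api/search}"

gazetteer_search() {
  count="$1"
  shift
  n=$#
  # Rewrite the argument list so each name becomes
  # "--data-urlencode s=<name>", preserving the original order.
  while [ "$n" -gt 0 ]; do
    set -- "$@" --data-urlencode "s=$1"
    shift
    n=$((n - 1))
  done
  # -G sends the encoded pairs as a GET query string; -s hides progress output.
  curl -sG "$GAZETTEER_URL" "$@" --data-urlencode "c=$count"
}
```

With the server running, `gazetteer_search 2 Pasadena "New York"` issues the same kind of request as the `curl` line above, with the space in "New York" percent-encoded.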
Send questions and comments to Chris A. Mattmann.
This project began as the CSCI 572 project of Yun Li on the NSF Polar CyberInfrastructure project at USC under the supervision of Chris Mattmann. You can find Yun's original code base here.
This work was sponsored by the National Science Foundation under funded projects PLR-1348450 and PLR-144562.