Ruby wrapper for the Russian FIAS database (ФИАС, Федеральная информационная адресная система, the Federal Information Address System).
Designed for usage with Ruby on Rails and a PostgreSQL backend.
Think twice before you decide to use a standalone copy of the FIAS database in your project. КЛАДР в облаке ("KLADR in the Cloud") could also be a solution.
Add this line to your application's Gemfile:
gem 'fias'
And then execute:
$ bundle
Or install it yourself:
$ gem install fias
Warning! Do not run the import on a 32-bit operating system: you are likely to get a Memory Limit exception.
$ mkdir -p tmp/fias && cd tmp/fias
$ bundle exec rake fias:download | xargs wget
$ unrar e fias_dbf.rar
$ bundle exec rake fias:create_tables fias:import DATABASE_URL=postgres://localhost/fias
If you get the error "Errno::EMFILE: Too many open files @ rb_sysopen", raise the open-file limit to 512 or more before starting the rake tasks:
ulimit -S -n 512
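The same limit can also be inspected or raised from inside a Ruby process. The following is a minimal sketch that relies only on Process.getrlimit and Process.setrlimit from Ruby core; the helper name ensure_open_file_limit is hypothetical and is not part of this gem.

```ruby
# Hypothetical helper (not part of the fias gem): raise the soft limit on
# open file descriptors for the current process if it is below `minimum`.
def ensure_open_file_limit(minimum = 512)
  soft, hard = Process.getrlimit(Process::RLIMIT_NOFILE)
  return soft if soft >= minimum

  # Without extra privileges the soft limit can only go up to the hard limit.
  target = [minimum, hard].min
  Process.setrlimit(Process::RLIMIT_NOFILE, target, hard)
  target
end

puts ensure_open_file_limit(512)
```

The returned value is the soft limit in effect afterwards, which may still be below 512 if the hard limit is lower.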
The rake task accepts options through ENV variables:
TABLES - a comma-separated list of tables to import or create. See Fias::Import::Dbf::TABLES for the list of key names. Use houses as an alias for HOUSE* tables and nordocs for NORDOC* tables. In most cases you'll need only the address_objects table.
PREFIX - database table prefix ('fias_' by default).
FIAS_PATH - DBF files location ('tmp/fias' by default).
DATABASE_URL - database credentials (required explicitly even with a Ruby on Rails project).
This gem uses COPY FROM STDIN BINARY to import data. At the moment it works with PostgreSQL only.
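To make the defaults concrete, the ENV options described above could be collected roughly as follows. This is only a sketch: import_options is a hypothetical helper, not the code the rake tasks actually run, and the returned hash keys are invented for illustration.

```ruby
# Hypothetical helper: collects the documented options from an ENV-like hash.
# The defaults for PREFIX and FIAS_PATH are the documented ones; the method
# name and the returned keys are assumptions made for this example.
def import_options(env = ENV)
  {
    tables: env['TABLES'].to_s.split(',').map(&:strip).reject(&:empty?),
    prefix: env.fetch('PREFIX', 'fias_'),
    path: env.fetch('FIAS_PATH', 'tmp/fias'),
    # DATABASE_URL has no default and must be given explicitly.
    database_url: env.fetch('DATABASE_URL') do
      raise ArgumentError, 'DATABASE_URL is required'
    end
  }
end

options = import_options(
  'TABLES' => 'address_objects, houses',
  'DATABASE_URL' => 'postgres://localhost/fias'
)
p options[:tables] # => ["address_objects", "houses"]
p options[:prefix] # => "fias_"
```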
Use the ancestry or closure_tree gems to navigate through the record tree.
Every FIAS address object has two fields: formalname, which holds the toponym (the name of a geographical object), and shortname, which holds its type (street, city, etc.). FIAS contains the list of all available shortname values and their corresponding long forms in the address_object_types table (SOCRBASE.DBF).
In real life people use a lot of type name variations. For example, 'проспект' can be written as 'пр' or 'пр-кт'.
You can convert any variation to a canonical form:
Fias::Name::Canonical.canonical('поселок')
# => [
# 'поселок', # FIAS canonical full name
# 'п', # FIAS canonical short name (as in address_objects table)
# 'п.', # Short name with dot if needed
# 'пос', # Alias
# 'посёлок' # Alias
# ]
See fias.rb for a list of settings.
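The idea behind canonical forms can be sketched with a plain lookup table in which every known variant points to the same list of forms. The table below is hypothetical and contains only the variants mentioned in this README; the gem maintains its own data and settings, and canonical_forms is not its API.

```ruby
# Hypothetical alias table for illustration only. Each row lists the full
# name first, followed by the short forms and aliases.
TYPE_FORMS = [
  ['поселок', 'п', 'п.', 'пос', 'посёлок'],
  ['проспект', 'пр-кт', 'пр']
].freeze

# Index every variant so that any of them resolves to the whole row.
TYPE_INDEX = TYPE_FORMS.each_with_object({}) do |forms, index|
  forms.each { |form| index[form.downcase] = forms }
end

def canonical_forms(type)
  TYPE_INDEX[type.strip.downcase]
end

p canonical_forms('Пос') # => ["поселок", "п", "п.", "пос", "посёлок"]
p canonical_forms('пр')  # => ["проспект", "пр-кт", "пр"]
```

Unknown variants return nil in this sketch.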
Use Fias::Name::Append to build toponym names in conformity with the rules of grammar:
Fias::Name::Append.append('Санкт-Петербург', 'г')
# => ['г. Санкт-Петербург', 'город Санкт-Петербург']
Fias::Name::Append.append('Невский', 'пр')
# => ['Невский пр-кт', 'Невский проспект']
Fias::Name::Append.append('Чечня', 'республика')
# => ['Респ. Чечня', 'Республика Чечня']
Fias::Name::Append.append('Чеченская', 'республика')
# => ['Чеченская Респ.', 'Чеченская Республика']
You can pass any form of type name: full, short, an alias, with or without the dot.
Sometimes you need to extract a toponym and its type from a plain string:
Fias::Name::Extract.extract('Город Санкт-Петербург')
# => ['Санкт-Петербург', 'город', 'г', 'г.']
Fias::Name::Extract.extract('ул. Казачий Вал')
# => ['Казачий Вал', 'улица', 'ул', 'ул.']
Sometimes street names come mixed up with house numbers, and you need to extract the house number from a string to clean it up for indexing:
Fias::Name::HouseNumber.extract('Ново-Садовая ул,303а')
# => ['Ново-Садовая ул', '303а']
Fias::Name::HouseNumber.extract('пр.Энергетиков 72/2')
# => ['пр.Энергетиков', '72/2']
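For intuition, a trailing house number can be separated with a single regular expression. This is a simplified sketch, not the rule set used by Fias::Name::HouseNumber: the pattern and the split_house_number helper are assumptions, and street names that themselves end in a number would be misparsed.

```ruby
# Trailing house number: digits, an optional letter, an optional "/digits"
# part, separated from the street name by whitespace and/or a comma.
HOUSE_NUMBER = %r{\A(?<street>.+?)[\s,]+(?<number>\d+[[:alpha:]]?(?:/\d+)?)\z}

# Hypothetical helper: returns [street, number], or [line, nil] when the
# line does not end with something that looks like a house number.
def split_house_number(line)
  line = line.strip
  match = HOUSE_NUMBER.match(line)
  match ? [match[:street], match[:number]] : [line, nil]
end

p split_house_number('Ново-Садовая ул,303а') # => ["Ново-Садовая ул", "303а"]
p split_house_number('пр.Энергетиков 72/2')  # => ["пр.Энергетиков", "72/2"]
p split_house_number('50 лет Октября')       # => ["50 лет Октября", nil]
```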
Suppose you have a set of structured addresses:
[
{ region: 'Еврейская АОбл', city: 'г. Биробиджан', street: 'Шолом-Алейхема' },
{ city: 'Санкт-Петербург', street: 'Лермонтовский проспект' }
]
You need to find a FIAS item for each address in the set.
Your project may use a full-text search engine (Sphinx, Elasticsearch) or just a SQL database. The search principles are the same, but the implementation differs. This library contains helper modules and base classes to facilitate searching.
Each toponym consists of words, and some of them are considered "special". These words can have synonyms or alternative forms, can be omitted by the user, or can be written differently in the FIAS database itself.
Examples:
You should treat them as equal when performing a search.
Note that we are talking about toponym names with types extracted (see type extraction above).
Words are split according to a set of simple rules designed to make it easier to detect synonyms and identify optional parts.
Fias::Name::Split.split("50 лет Октября")
# => ["50 лет", "октября"]
Fias::Name::Split.split("Ю.Р.Г.Эрвье")
# => ["ю.р.г.", "эрвье"]
Suppose FIAS contains a street named им. академика И.П.Павлова. Most people will refer to it as just Павлова street, some will write имени Павлова, and some академика Павлова. Basically, nobody except the FIAS database refers to it by the exact original name.
Fias::Name::Synonyms.expand('им. академика И.П.Павлова')
# => [["им", "имени", "им.", ""],
# ["ак.", "академика", ""],
# ["и.п.", ""],
# ["павлова"]]
This returns all possible forms for each word. Empty strings mark optional words.
Fias::Name::Synonyms.tokens('им. академика И.П.Павлова')
# => ["им", "имени", "им.", "ак.", "академика", "и.п.", "павлова"]
This returns a flat array with all words.
You can also calculate all possible name combinations:
Fias::Name::Synonyms.forms('им. И.П.Павлова')
# => [
# 'и.п. им павлова',
# 'им павлова',
# 'и.п. имени павлова',
# 'имени павлова',
# 'и.п. им. павлова',
# 'им. павлова',
# 'и.п. павлова',
# 'павлова'
# ]
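One way to picture how such combinations arise is a Cartesian product over the per-word option lists. The sketch below is not the gem's implementation: tokens_from and forms_from are hypothetical helpers, the input is assumed to have the same shape as the Synonyms.expand output, and sorting the words inside each form is an assumption inferred from the sample output above.

```ruby
# Assumed expansion of 'им. И.П.Павлова'; "" marks an optional word.
expanded = [['им', 'имени', 'им.', ''], ['и.п.', ''], ['павлова']]

# Flat list of every distinct word form, without the empty placeholders.
def tokens_from(expanded)
  expanded.flatten.reject(&:empty?).uniq
end

# Every combination of one option per word, with optional words dropped.
def forms_from(expanded)
  first, *rest = expanded
  first.product(*rest)
       .map { |words| words.reject(&:empty?).sort.join(' ') }
       .uniq
end

p tokens_from(expanded)
# => ["им", "имени", "им.", "и.п.", "павлова"]
puts forms_from(expanded)
```

With this input, forms_from yields the same eight strings, in the same order, as the sample output above.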
In the search index you need:
object name tokens (Fias::Name::Synonyms.tokens)
object name forms (Fias::Name::Synonyms.forms)
See indexing example.
Performing a search will execute these three steps:
We'll use the sequel gem in this example.
class Query
  include Fias::Query

  def find(tokens)
    return [] if tokens.blank? # Empty array has no type, Sequel fails.

    op = Sequel.pg_array_op(:tokens)
    DB[:address_objects]
      .select(:id, :name, :abbr, :parent_id, :ancestry, :forms, :tokens)
      .where(op.overlaps(tokens))
      .to_a
  end
end
#find accepts a split object name (the result of Fias::Name::Split.split). It finds all address objects whose tokens overlap with the given set of tokens and returns an array of hashes with the keys selected above.
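The overlap condition means "at least one token in common". The sketch below mimics it in memory, without a database: ADDRESS_OBJECTS and find_by_tokens are hypothetical stand-ins for the address_objects table and the #find method, and the records carry only the keys needed here.

```ruby
# Hypothetical in-memory stand-in for the address_objects table.
ADDRESS_OBJECTS = [
  { id: 1, name: 'Шолом-Алейхема', tokens: ['шолом-алейхема'] },
  { id: 2, name: 'им. академика И.П.Павлова',
    tokens: ['им', 'имени', 'им.', 'ак.', 'академика', 'и.п.', 'павлова'] }
].freeze

# Select records sharing at least one token with the query, which is what
# an array overlap test does on the database side.
def find_by_tokens(records, tokens)
  return [] if tokens.nil? || tokens.empty?

  records.select { |record| (record[:tokens] & tokens).any? }
end

p find_by_tokens(ADDRESS_OBJECTS, ['академика', 'павлова']).map { |r| r[:id] }
# => [2]
p find_by_tokens(ADDRESS_OBJECTS, ['невский'])
# => []
```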
:abbr - FIAS shortname value.
:ancestry - array of ancestor IDs.
:forms - object name forms (Fias::Name::Synonyms.forms).
:tokens - object name tokens (Fias::Name::Synonyms.tokens).
query = Query.new(
  region: 'Еврейская АОбл', city: 'г. Биробиджан', street: 'Шолом-Алейхема'
)
query.params.sanitized
# => {
# :region => ["Еврейская", "автономная область", "Аобл", "Аобл"],
# :city => ["Биробиджан", "город", "г", "г."],
# :street => ["Шолом-Алейхема"]
# }
Allowed params are: %i(region district city subcity street)
query.perform
#
# [[13213, {:id=>72344, :name=>"Шолом-Алейхема", :abbr=>"ул", :parent_id=>184027, :ancestry=>[184027, 12550],
#   :forms=>["шолом-алейхема"], :tokens=>["шолом-алейхема"], :key=>:street}]]
The result is an array.
Special thanks to @gazay.
Create your feature branch (git checkout -b my-new-feature)
Commit your changes (git commit -am 'Add some feature')
Push to the branch (git push origin my-new-feature)
The MIT License