Csvs To Sqlite Versions Save

Convert CSV files into a SQLite database

0.6

6 years ago

SQLite full-text search support

Added --fts option for setting up SQLite full-text search.

The --fts option will create a corresponding SQLite FTS virtual table, using the best available version of the FTS module.

https://sqlite.org/fts5.html https://www.sqlite.org/fts3.html

Usage:
```
csvs-to-sqlite my-csv.csv output.db -f column1 -f column2
```
Example generated with this option: https://sf-trees-search.now.sh/

Example search: https://sf-trees-search.now.sh/sf-trees-search-a899b92?sql=select+*+from+Street_Tree_List+where+rowid+in+%28select+rowid+from+Street_Tree_List_fts+where+Street_Tree_List_fts+match+%27grove+london+dpw%27%29%0D%0A

Will be used in https://github.com/simonw/datasette/issues/131
--fts and --extract-column now cooperate.

If you extract a column and then specify that same column in the --fts list, csvs-to-sqlite now uses the original value of that column in the index.

Example using CSV from https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq
```
csvs-to-sqlite Street_Tree_List.csv trees-fts.db \
    -c qLegalStatus -c qSpecies -c qSiteInfo \
    -c PlantType -c qCaretaker -c qCareAssistant \
    -f qLegalStatus -f qSpecies -f qAddress \
    -f qSiteInfo -f PlantType -f qCaretaker \
    -f qCareAssistant -f PermitNotes
```
Closes #9
Handle column names with spaces in them.
Added csvs-to-sqlite --version option.

Using http://click.pocoo.org/5/api/#click.version_option

0.5

6 years ago

Now handles columns with integers and nulls in correctly

Pandas does a good job of figuring out which SQLite column types should be used for a DataFrame - with one exception: due to a limitation of NumPy it treats columns containing a mixture of integers and NaN (blank values) as being of type float64, which means they end up as REAL columns in SQLite.

http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na

To fix this, we now check to see if a float64 column actually consists solely of NaN and integer-valued floats (checked using v.is_integer() in Python). If that is the case, we over-ride the column type to be INTEGER instead.

See #5 - also a8ab524 and 0997b7b

0.3

6 years ago

Mechanism for converting columns into separate tables

Let's say you have a CSV file that looks like this:

county,precinct,office,district,party,candidate,votes
Clark,1,President,,REP,John R. Kasich,5
Clark,2,President,,REP,John R. Kasich,0
Clark,3,President,,REP,John R. Kasich,7

(Real example from https://github.com/openelections/openelections-data-sd/blob/ master/2016/20160607__sd__primary__clark__precinct.csv )

You can now convert selected columns into separate lookup tables using the new --extract-column option (shortname: -c) - for example:

csvs-to-sqlite openelections-data-*/*.csv \
    -c county:County:name \
    -c precinct:Precinct:name \
    -c office -c district -c party -c candidate \
    openelections.db

The format is as follows:

column_name:optional_table_name:optional_table_value_column_name

If you just specify the column name e.g. -c office, the following table will be created:

CREATE TABLE "party" (
    "id" INTEGER PRIMARY KEY,
    "value" TEXT
);

If you specify all three options, e.g. -c precinct:Precinct:name the table will look like this:

CREATE TABLE "Precinct" (
    "id" INTEGER PRIMARY KEY,
    "name" TEXT
);

The original tables will be created like this:

CREATE TABLE "ca__primary__san_francisco__precinct" (
    "county" INTEGER,
    "precinct" INTEGER,
    "office" INTEGER,
    "district" INTEGER,
    "party" INTEGER,
    "candidate" INTEGER,
    "votes" INTEGER,
    FOREIGN KEY (county) REFERENCES County(id),
    FOREIGN KEY (party) REFERENCES party(id),
    FOREIGN KEY (precinct) REFERENCES Precinct(id),
    FOREIGN KEY (office) REFERENCES office(id),
    FOREIGN KEY (candidate) REFERENCES candidate(id)
);

They will be populated with IDs that reference the new derived tables.

Closes #2