GDBee Save

:honeybee: PyQt5 desktop application for executing SQL queries against Esri file geodatabases featuring auto-completion with suggestions and syntax highlight

Project README

Total alerts Language grade: Python

Overview

GDBee is a PyQt5 desktop application which you can use to write and execute SQL queries against tables and feature classes stored inside an Esri file geodatabase. It provides a tabbed interface which lets you connect to multiple geodatabases within a single session. It has a rich code editor featuring auto-completion (with suggestions), syntax highlight, and result set export.

If you are a QGIS Desktop user, you are already able to execute SQL against file geodatabases using QGIS DBManager plugin, but GDBee has some extra features that the DBManager is missing (for instance, you do not need to add your datasets as layers first and you can choose to copy individual cells instead of the whole row) from the result table.

GDBee window

History

This project has started as an experimental playground to see what functionality PyQt provides. As I spent more time working with PyQt, I have started wondering what it would take to build a simple desktop application. Because I often find myself in need of querying a file geodatabase's datasets, I have decided to build a GUI-based SQL editor that would let me execute SQL queries against a table or a feature class and draw the result set in a table form for further visual inspection. I have thought that other GIS users and developers may find this application useful and I therefore have decided to start a GitHub repository to let others take advantage of my work.

Features

  • Working with multiple geodatabases using multiple tabs (single geodatabase connection per tab)
  • Having a schema panel showing tables and their columns for each connected geodatabase
  • Exporting result sets into various formats (WKT strings to paste into QGIS using QuickWKT plugin, arcpy code to paste into ArcMap Python window, pandas data frame via .csv file (which can be taken into geopandas), and Markdown table via .md file or plain text)
  • Executing SQL query with respect to the user selection (only selected text is executed)
  • Loading/saving SQL queries from and to text files on disk
  • Convenient keyboard shortcuts for query execution (F5 and Ctrl-Enter), tab interaction (Ctrl-N and Ctrl-W for opening and closing tabs), and browsing to a geodatabase (Ctrl-B)
  • Copying data from the result set table (either individual cell values or row(s) with the headers preserved) - ready to paste properly into an Excel sheet
  • Choosing whether you want to have geometry column in the result set as WKT
  • Choosing what SQL dialect to use for querying (OGR SQL or SQLite)
  • Auto-completion and highlights for geodatabase tables and columns as well as SQL keywords and functions
  • Pagination of the result table to load rows on request as user scrolls down
  • Reporting query execution time and number of records returned

Limitations

  • All queries should target datasets stored within a single geodatabase; you won't be able to run this query:
SELECT * FROM DB1.Parcels WHERE Parcels.ID in (SELECT ID FROM DB2.Parcels)
  • The application can be used only for selecting existing data and creating new data within the application session; you won't be able to execute any UPDATE, INSERT, or DELETE queries in the current version. If you want to save the generated shapes (for instance, as a result of buffering or getting vertices of polygons as points), you can use the export functionality to load the WKT representation of the shapes into an ArcMap or QGIS layer.

  • Depending on what SQL dialect you are using and what version of GDAL you work with, there will be some features of the SQL language you won't be able to take advantage of. For instance, using LIMIT with OGR SQL dialect is possible only starting with GDAL 2.2. Native OGR SQL does not provide any spatial SQL functions to calculate the distances between features, get their boundaries, and so on. You would need to use SQLite dialect for this. You can choose what SQL dialect to use before executing the query. To learn more about these two SQL dialects, refer to these two pages: OGR SQL and SQLite.

  • It is hard to achieve high performance of SQL querying without loading the data into a proper DBMS first. This application would not be an appropriate choice for tasks when you need to run cross joins calculating distances between million points. The GDAL's SQL execution time depends a lot on the SQL spatial functions being used. Also, getting the length of the result set returned from the database can be slow depending on whether you are querying a single table or creating a virtual view joining multiple tables together. The performance will also be different depending on what SQL dialect (OGRSQL and SQLite) is being used. You would most likely want to use the OGR SQL unless you need to use SQL spatial functions.

  • TODO compare performance of OGRSQL and SQLite

Requirements

  • Python 3.5
  • GDAL 2.1+
  • PyQt5
  • pandas
  • tabulate (optional, used for exporting result set into a markdown table)

Tested against:

  • GDAL 2.1.3 and GDAL 2.2.3
  • PyQt 5.6.0

The GDAL compilation details (can be run in the SQL query window):

select geos_version() --3.6.2-CAPI-1.10.2 999999
select proj4_version() --Rel. 4.9.3, 15 August 2016
select spatialite_version() --4.3.0-RC1

The GDAL compilation details (can be run in Python interpreter window):

osgeo.gdal.VersionInfo() # '2010400' for GDAL 2.1.3 and '2020300' for GDAL 2.2.3

Installation

You don't need to install anything apart from PyQt and GDAL (and pandas which is used for exporting) in order to be able to run the application. All of these can be installed using Anaconda Navigator which is arguably the easiest way to proceed.

I've tried various combinations to get PyQt5 and GDAL working happily together with Python 2.7 but haven't managed this. However, if you do have a Python 2.7 environment with the requirements satisfied, by all means go ahead and use it.

To run the application, just run the main.py file using the right Python interpreter.

Target audience

You may find this PyQt desktop application useful if:

  • You would like to be able to interrogate your file geodatabase datasets using SQL (instead of Python-based interface such as Esri arcpy or open-source ogr)
  • You are an ArcGIS user that does not want to have QGIS Desktop installed just to be able to execute SQL against a file geodatabase
  • You use SQL on a daily basis working with spatial databases (such as PostgreSQL or Microsoft SQL Server) and want to be able to execute ad hoc SQL queries against file geodatabase datasets without loading them into a proper DBMS database
  • You already have a lot SQL code targeting tables stored in a DBMS spatial database and you would like to be able to reuse this code when targeting a file geodatabase

Using SQL for querying file geodatabases

If you are an ArcGIS user, you have already used SQL for querying your file geodatabase datasets when using the Select By Attributes and Select By Location dialog windows or Select Layer By Attributes, Select Layer By Location, Make Feature Layer or Select geoprocessing tools. You have supplied a where clause specifying what features or rows you would like to select. This can be done simply to see what data match your query or to perform some analysis on the subset of your data. The querying functionality available through these tools is somewhat limited, though, because you will not be able to take advantage of rich SQL operations such as grouping your data, joining multiple tables, or using spatial SQL functions.

Open source GIS users have better tools for querying file geodatabases using SQL; QGIS DBManager plugin does provide an excellent interface for executing SQL queries against map layers added from a file geodatabase. However, this requires installing QGIS Desktop software and adding all the datasets you would like to participate in the SQL query into a map. Also, there are no tools to quickly export the result set in a suitable format.

Esri does provide File Geodatabase C++ API for Windows, MacOS and Linux, however it also has a number of limitations; for instance, SQL joins are not supported and you would need to have Microsoft Visual Studio for development (gcc/clang on Linux). You would also need to bind the API to your own language of choice since only .NET bindings are included.

Because Python is widely used in the GIS community, I thought it would make sense to take advantage of Python bindings of GDAL (via GEOS) to be able to connect to a file geodatabase and execute SQL queries. Working with a file geodatabase via GEOS makes it possible to take advantage of SQL spatial functions that are otherwise inaccessible to an ArcGIS user.

Examples of executing SQL queries against file geodatabase using GDAL

In this section, you find just a few sample SQL queries that have been executed in the GDBee against an Esri file geodatabase containing datasets accessible via PostGIS workshop materials by Boundless which can be downloaded from this link.

The queries and the result sets are here to show you the power of SQL when working with file geodatabases. You can learn more about using SQL for querying file geodatabases in the Esri documentation available at Use SQL for reporting and analysis on file geodatabases. The full list of the spatial and non-spatial functions available for a GDAL user is listed in the SpatialLite documentation. Nearly all SQL functions are available for a GDAL user (look for the GEOS column).

The SQL queries are executed with the help of GDAL Python bindings using the ExecuteSQL() method (mind that SQLITE dialect is being used). The result set returned is post-processed to include the abridged WKT representation of the geometries. The result sets have been converted subsequently into a markdown table for displaying in this guide.

More examples can be found in the samples document.

Non-spatial queries

Counting non-white population in ten most populated census blocks

SELECT
    BLKID
    , BORONAME
    , CAST(POPN_TOTAL AS INTEGER) AS TotalPop
    , CAST(POPN_WHITE AS INTEGER) AS WhitePop
    , CAST(POPN_TOTAL - POPN_WHITE AS INTEGER) AS NonWhitePopulation
FROM
    census_blocks
ORDER BY
    POPN_TOTAL DESC
LIMIT
    5
BLKID BORONAME TotalPop WhitePop NonWhitePopulation
0 360050001001000 The Bronx 8634 1242 7392
1 360610238011009 Manhattan 4970 2704 2266
2 360810797012000 Queens 4563 436 4127
3 360610155006000 Manhattan 4067 3507 560
4 360471034001004 Brooklyn 3918 605 3313

Find total population of each borough using HAVING

SELECT
    BORONAME
    , SUM(POPN_TOTAL) AS TotalPop
FROM
    census_blocks
GROUP BY
    BORONAME
HAVING
    SUM(POPN_TOTAL) > 1500000
ORDER BY
    TotalPop DESC
BORONAME TotalPop
0 Brooklyn 2504700.0000
1 Queens 2230621.0000
2 Manhattan 1585873.0000

Spatial queries

Find all crime incidents located within 300 feet from a certain subway station (ST_Distance)

SELECT *
FROM
    homicides
WHERE
    ST_Distance(Shape, (SELECT shape FROM subway_stations WHERE name = 'Essex St')) < 300
INCIDENT_D BORONAME NUM_VICTIM PRIMARY_MO ID WEAPON LIGHT_DARK YEAR Shape
0 2008/05/13 00:00:00 Manhattan 1 144.0000 knife D 2008.0000 POINT (585477.5157 4508066.903)
1 2003/01/12 00:00:00 Manhattan 1 ROBBERY/BURGLARY 721.0000 gun 2003.0000 POINT (585252.013 4507899.3078)
2 2003/06/10 00:00:00 Manhattan 1 DRUGS 728.0000 gun 2003.0000 POINT (585747.6144 4507957.7557)
3 2004/10/25 00:00:00 Manhattan 1 DOMESTIC 1404.0000 other 2004.0000 POINT (585688.4776 4508016.8018)
4 2005/01/27 00:00:00 Manhattan 1 ROBBERY/BURGLARY 1553.0000 gun 2005.0000 POINT (585710.8339 4508019.2805)
5 2005/08/17 00:00:00 Manhattan 1 REVENGE 1825.0000 knife 2005.0000 POINT (585404.4815 4508200.4864)
6 2007/07/06 00:00:00 Manhattan 1 DISPUTE 2834.0000 knife D 2007.0000 POINT (585305.7312 4508053.368)
7 2011/04/10 00:00:00 Manhattan 1 4171.0000 gun 2011.0000 POINT (585724.1624 4508111.4603)

Find neighborhoods with the largest number of crimes committed (count number of homicides in each neighborhood) (ST_Contains)

SELECT
    Polys.Name AS NeighborhoodName
    , Count(*) AS CrimeCount
FROM
  Homicides AS Points
JOIN
  Neighborhoods AS Polys
ON
    ST_Contains(Polys.Shape, Points.Shape) = 1
AND
    Points.YEAR in (2008, 2009, 2010)
GROUP BY
    Polys.Name
ORDER BY
    CrimeCount DESC
LIMIT 10
NeighborhoodName CrimeCount
0 Bedford-Stuyvesant 113
1 South Bronx 66
2 Brownsville 50
3 East Brooklyn 43
4 Harlem 41
5 Jamaica 37
6 Tremont 30
7 Wakefield-Williamsbridge 29
8 Mott Haven 27
9 Fort Green 25

New functionality under consideration

Issues

Did you find a bug? Do you think there is some other functionality that should be added? Please let me know by submitting an issue.

Licensing

MIT

Open Source Agenda is not affiliated with "GDBee" Project. README Source: AlexArcPy/GDBee
Stars
37
Open Issues
1
Last Commit
5 years ago
Repository
License
MIT

Open Source Agenda Badge

Open Source Agenda Rating