A universal geospatial data solution for data lakehouse systems, the first of its kind in the industry
GeoLake aims to bring geospatial support to lakehouses.
Note: We develop GeoLake on top of Apache Iceberg and preserve Apache Iceberg's commit history in the process, which is why our project shows such an extensive contributor list. Keeping the commit history makes it easy to track changes in the Apache Iceberg project, so we can rebase our code onto the latest version of Iceberg and stay compatible with its new releases.
GeoLake can be used to build a lakehouse with geospatial support. It is built on top of Apache Spark and Apache Iceberg.
-- Create table with a geometry type, as well as a spatial partition
CREATE TABLE iceberg.geom_table(
    id int,
    geom geometry
) USING ICEBERG PARTITIONED BY (xz2(geom, 7));
-- Insert geometry values using WKT
INSERT INTO iceberg.geom_table VALUES
(1, 'POINT(1 2)'),
(2, 'LINESTRING(1 2, 3 4)'),
(3, 'POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))');
-- Query with spatial predicates
SELECT * FROM iceberg.geom_table
WHERE ST_Contains(geom, ST_Point(0.5, 0.5));
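To make the spatial predicate above more concrete, here is a minimal sketch in plain Python of the kind of point-in-polygon check that a containment test performs on the sample polygon. This is an illustration only, not GeoLake's implementation: the function name `point_in_ring` and the ray-casting approach are our own assumptions, and points lying exactly on the boundary are not handled specially. Under the usual OGC semantics of ST_Contains, the point (0.5, 0.5) falls inside the polygon in row 3 but is not contained by the point or the linestring, so only row 3 would be expected to match.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def point_in_ring(point: Point, ring: List[Point]) -> bool:
    """Ray-casting test: True if `point` lies strictly inside the closed ring.

    Hypothetical helper for illustration; boundary points are not treated specially.
    """
    x, y = point
    inside = False
    # Walk consecutive vertex pairs; the ring repeats its first vertex at the end.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the right of the point
            if x < x_cross:
                inside = not inside
    return inside


# Exterior ring of 'POLYGON((0 0, 0 1, 1 1, 1 0, 0 0))' from the INSERT above
unit_square = [(0, 0), (0, 1), (1, 1), (1, 0), (0, 0)]

print(point_in_ring((0.5, 0.5), unit_square))  # True
print(point_in_ring((1.5, 0.5), unit_square))  # False
```

In a real deployment the engine evaluates the predicate itself; the sketch is only meant to show why the sample query is expected to select the polygon row.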
See the docker-spark-geolake repository for early access; it includes several example notebooks.