Cloud Optimized GeoJSON spec
Cloud Optimized GeoJSON (COGJ) is a derivative of GeoJSON that is optimized for network storage and efficient data access over HTTP using range reads. It was inspired by the Cloud Optimized GeoTIFF (COG) spec and is envisioned as its vector counterpart.
The goals are to be:
Opaque files on remote services must be downloaded in full before you can make use of them. Just as COG changed this for raster, COGJ aims to do the same for vector.
The problem exists in all common vector formats:
The concept is simple: a traditional GeoJSON file is broken into n feature collections, each of which is independently a valid GeoJSON document. The collections can be built using any sorting or ordering scheme that makes sense for the given data (temporal, spatial, etc.).
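As a rough illustration of the splitting step, the sketch below sorts features by a caller-supplied key and cuts them into fixed-size feature collections. The helper name `splitFeatureCollection`, the `chunkSize` parameter, and the choice of sort key are assumptions made for this example; the spec leaves the ordering up to you.

```javascript
// Hypothetical helper: split one GeoJSON FeatureCollection into several,
// each of which is a valid GeoJSON document on its own.
function splitFeatureCollection(geojson, chunkSize, sortKey) {
  // Copy before sorting so the caller's array is not mutated.
  const features = [...geojson.features].sort((a, b) => {
    const ka = sortKey(a);
    const kb = sortKey(b);
    return ka < kb ? -1 : ka > kb ? 1 : 0;
  });
  const collections = [];
  for (let i = 0; i < features.length; i += chunkSize) {
    collections.push({
      type: 'FeatureCollection',
      features: features.slice(i, i + chunkSize),
    });
  }
  return collections;
}

// Usage: order point features west to east, three per collection.
const input = {
  type: 'FeatureCollection',
  features: [5, 1, 4, 2, 3, 0, 6].map(x => ({
    type: 'Feature',
    properties: {},
    geometry: { type: 'Point', coordinates: [x, 0] },
  })),
};
const parts = splitFeatureCollection(input, 3, f => f.geometry.coordinates[0]);
```

A spatial ordering for real data would more likely sort on something like a tile index or a space-filling curve; the longitude key here is only meant to keep the example short.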
The collections are arranged back-to-back in a single file, with the first 10k of the file reserved for metadata (the header request shown later reads bytes 0-9999). The metadata header describes the file as a whole and holds an array of per-collection metadata.
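The layout described above could be assembled along these lines in Node.js. This is only a sketch: the header field names (`collections`, `start`, `end`), the space padding, and reading "10k" as exactly 10,000 bytes are all assumptions for illustration and are not taken from the spec.

```javascript
// Assumption: the reserved header is 10,000 bytes (bytes 0-9999).
const HEADER_SIZE = 10000;

// Hypothetical builder: returns a Buffer holding a padded JSON header
// followed by the serialized collections, back-to-back.
function buildCogj(collections) {
  const bodies = collections.map(c => Buffer.from(JSON.stringify(c), 'utf8'));

  // Record an inclusive byte range per collection, like an HTTP Range header.
  let offset = HEADER_SIZE;
  const entries = bodies.map(body => {
    const entry = { start: offset, end: offset + body.length - 1 };
    offset += body.length;
    return entry;
  });

  // Hypothetical header schema: just the array of byte ranges.
  const headerJson = JSON.stringify({ collections: entries });
  if (Buffer.byteLength(headerJson, 'utf8') > HEADER_SIZE) {
    throw new Error('metadata does not fit in the reserved header');
  }

  // Pad with spaces; JSON.parse ignores trailing whitespace.
  const header = Buffer.alloc(HEADER_SIZE, ' ');
  header.write(headerJson, 0, 'utf8');
  return Buffer.concat([header, ...bodies]);
}

// Usage: build a tiny file, then read the second collection back by range.
const empty = { type: 'FeatureCollection', features: [] };
const file = buildCogj([empty, empty]);
const meta = JSON.parse(file.subarray(0, HEADER_SIZE).toString('utf8'));
const { start, end } = meta.collections[1];
const second = JSON.parse(file.subarray(start, end + 1).toString('utf8'));
```

Because every collection is a complete JSON document and the header is padded to a fixed size, each byte range can be parsed without touching any other part of the file.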
In practice, a networked client would:
The same flow would apply to a file on disk: just replace HTTP GET range requests with seek and read commands.
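For the on-disk case, a minimal Node.js sketch might look like the following. Node has no separate seek call, so the `position` argument of `fs.readSync` plays that role here. The helper name `readRange` and the throwaway demo file are assumptions for illustration.

```javascript
const fs = require('node:fs');
const os = require('node:os');
const path = require('node:path');

// Hypothetical helper: read bytes [start, end] (inclusive, like an HTTP
// Range header) from a local file.
function readRange(filePath, start, end) {
  const length = end - start + 1;
  const buffer = Buffer.alloc(length);
  const fd = fs.openSync(filePath, 'r');
  try {
    // The last argument is the file position, i.e. the "seek" target.
    const bytesRead = fs.readSync(fd, buffer, 0, length, start);
    return buffer.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}

// Usage with a throwaway file: 10 bytes of filler, then one collection.
const tmp = path.join(os.tmpdir(), `cogj-range-demo-${process.pid}.txt`);
const body = '{"type":"FeatureCollection","features":[]}';
fs.writeFileSync(tmp, 'header....' + body);
const chunk = JSON.parse(readRange(tmp, 10, 10 + body.length - 1).toString('utf8'));
fs.unlinkSync(tmp);
```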
This demo uses two files on S3, Cadastral.geojson and Cadastral.geojson.coj, which contain the same cadastral data for Harford County, Maryland, in different formats. Both are about 160 MB.
The demo is a simple OpenLayers app that lets you load either file with the click of a button. It is run in Chrome with the network throttled to the Fast 3G setting, both to emphasize the point and because that's the reality for many users.
Reading the header:
fetch('https://s3.amazonaws.com/cogeojson/Cadastral.json.coj',{headers: {"Range":"bytes=0-9999"}})
.then(response=>{return response.json();})
Reading collections of features:
// start and end: byte range of the wanted collection, taken from the header's collection metadata
fetch('https://s3.amazonaws.com/cogeojson/Cadastral.json.coj',{headers: {"Range":"bytes="+start+"-"+end}})
.then(response => {return response.json()}).then(function(json){
//pass geojson object to your mapping library
...
});
The metadata and the additional feature collection boilerplate do add some overhead, but the impact is negligible. Although the COGJ version of this test data has a few extra curly braces, the file is actually smaller because extra whitespace was removed. In other words, the whitespace in the original file costs more than subdividing it does. For our test data, the COGJ file is about 10 MB smaller.