I was testing the Overture Maps data and realised it is only available in Parquet, not GeoParquet, format. As I understand it, this is a use case for gpq, as mentioned in #57.
The tool runs fine and seems to produce output, but I cannot read that output using GDAL. Apologies if this is user error or should be a GDAL issue instead; please close if that is the case.
Full steps to reproduce are below (note that I was using gpq on a Windows machine and testing the output on both Windows and Linux).
Download data:
aws s3 cp --region us-west-2 --no-sign-request --recursive s3://overturemaps-us-west-2/release/2023-10-19-alpha.0/theme=buildings C:\Temp\buildings.parquet
Run conversion:
$env:PATH += ";D:\Tools\gpq-windows-amd64"
gpq version
# 0.20.0
gpq convert part-00769-87dd7d19-acc8-4d4f-a5ba-20b407a79638.c000.zstd.parquet test.geo.parquet --from="parquet" --to="geoparquet"
# also tried without compression (no difference in terms of validity)
gpq convert part-00769-87dd7d19-acc8-4d4f-a5ba-20b407a79638.c000.zstd.parquet test.geo.parquet --from="parquet" --to="geoparquet" --compression="uncompressed"
gpq validate test.geo.parquet
Summary: Passed 20 checks.
✓ file must include a "geo" metadata key
✓ metadata must be a JSON object
✓ metadata must include a "version" string
✓ metadata must include a "primary_column" string
✓ metadata must include a "columns" object
✓ column metadata must include the "primary_column" name
✓ column metadata must include a valid "encoding" string
✓ column metadata must include a "geometry_types" list
✓ optional "crs" must be null or a PROJJSON object
✓ optional "orientation" must be a valid string
✓ optional "edges" must be a valid string
✓ optional "bbox" must be an array of 4 or 6 numbers
✓ optional "epoch" must be a number
✓ geometry columns must not be grouped
✓ geometry columns must be stored using the BYTE_ARRAY parquet type
✓ geometry columns must be required or optional, not repeated
✓ all geometry values match the "encoding" metadata
✓ all geometry types must be included in the "geometry_types" metadata (if not empty)
✓ all polygon geometries must follow the "orientation" metadata (if present)
✓ all geometries must fall within the "bbox" metadata (if present)
QGIS opens the file but the attribute table is empty. Testing with ogrinfo:
ogrinfo --version
# GDAL 3.7.2, released 2023/09/05
ogrinfo test.geo.parquet
Warning 1: Field brand.names.common of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.official of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.alternate of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.short of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field addresses of unhandled type list<element: struct<freeform: string, locality: string, postCode: string, region: string, country: string>> ignored
Warning 1: Field names.common of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.official of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.alternate of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.short of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field sources of unhandled type list<element: struct<property: string, dataset: string, recordId: string, confidence: double>> ignored
INFO: Open of `test.geo.parquet'
using driver `Parquet' successful.
1: test.geo
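The same warnings repeat on every GDAL call, so it can help to reduce them to a unique list of ignored fields. Below is a minimal sketch using only the Python standard library; the helper name `ignored_fields` and the regular expression are my own and simply match the warning wording shown above, they are not part of GDAL or gpq.

```python
import re

# Matches the GDAL warning lines shown above:
# "Warning 1: Field <name> of unhandled type <type> ignored"
WARNING_RE = re.compile(r"^Warning 1: Field (\S+) of unhandled type (.+) ignored$")


def ignored_fields(ogrinfo_output: str) -> dict:
    """Map each ignored field name to its reported type, dropping repeats."""
    fields = {}
    for line in ogrinfo_output.splitlines():
        match = WARNING_RE.match(line.strip())
        if match:
            # setdefault keeps the first occurrence and preserves order
            fields.setdefault(match.group(1), match.group(2))
    return fields


# A shortened copy of the output above, with one warning repeated
sample = """\
Warning 1: Field names.common of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field sources of unhandled type list<element: struct<property: string, dataset: string, recordId: string, confidence: double>> ignored
Warning 1: Field names.common of unhandled type list<element: struct<value: string, language: string>> ignored
INFO: Open of `test.geo.parquet'
"""

for name, field_type in ignored_fields(sample).items():
    print(f"{name}: {field_type}")
```

All of the ignored fields in the output above are lists of structs, which may be relevant given the error below.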
Trying to read the data gives the likely cause of the issue: ERROR 1: ReadNext() failed: Malformed levels. min: 2 max: 2 out of range. Max Level: 1.
ogrinfo test.geo.parquet -al
Warning 1: Field brand.names.common of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.official of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.alternate of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.short of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field addresses of unhandled type list<element: struct<freeform: string, locality: string, postCode: string, region: string, country: string>> ignored
Warning 1: Field names.common of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.official of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.alternate of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.short of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field sources of unhandled type list<element: struct<property: string, dataset: string, recordId: string, confidence: double>> ignored
INFO: Open of `test.geo.parquet'
using driver `Parquet' successful.
Layer name: test.geo
Geometry: Unknown (any)
Feature Count: 815104
ERROR 1: ReadNext() failed: Malformed levels. min: 2 max: 2 out of range. Max Level: 1
Layer SRS WKT:
GEOGCRS["WGS 84",
ENSEMBLE["World Geodetic System 1984 ensemble",
MEMBER["World Geodetic System 1984 (Transit)"],
MEMBER["World Geodetic System 1984 (G730)"],
MEMBER["World Geodetic System 1984 (G873)"],
MEMBER["World Geodetic System 1984 (G1150)"],
MEMBER["World Geodetic System 1984 (G1674)"],
MEMBER["World Geodetic System 1984 (G1762)"],
MEMBER["World Geodetic System 1984 (G2139)"],
ELLIPSOID["WGS 84",6378137,298.257223563,
LENGTHUNIT["metre",1]],
ENSEMBLEACCURACY[2.0]],
PRIMEM["Greenwich",0,
ANGLEUNIT["degree",0.0174532925199433]],
CS[ellipsoidal,2],
AXIS["geodetic latitude (Lat)",north,
ORDER[1],
ANGLEUNIT["degree",0.0174532925199433]],
AXIS["geodetic longitude (Lon)",east,
ORDER[2],
ANGLEUNIT["degree",0.0174532925199433]],
USAGE[
SCOPE["Horizontal component of 3D system."],
AREA["World."],
BBOX[-90,-180,90,180]],
ID["EPSG",4326]]
Data axis to CRS axis mapping: 2,1
Geometry Column = geometry
categories.main: String (0.0)
categories.alternate: StringList (0.0)
level: Integer (0.0)
socials: StringList (0.0)
subType: String (0.0)
numFloors: Integer (0.0)
entityId: String (0.0)
class: String (0.0)
sourceTags: String(JSON) (0.0)
localityType: String (0.0)
emails: StringList (0.0)
drivingSide: String (0.0)
adminLevel: Integer (0.0)
road: String (0.0)
isoCountryCodeAlpha2: String (0.0)
isoSubCountryCode: String (0.0)
updateTime: String (0.0)
wikidata: String (0.0)
confidence: Real (0.0)
defaultLanguage: String (0.0)
brand.wikidata: String (0.0)
isIntermittent: Integer(Boolean) (0.0)
connectors: StringList (0.0)
surface: String (0.0)
version: Integer (0.0)
phones: StringList (0.0)
id: String (0.0)
context: String (0.0)
height: Real (0.0)
maritime: Integer(Boolean) (0.0)
websites: StringList (0.0)
isSalt: Integer(Boolean) (0.0)
bbox.minx: Real (0.0)
bbox.maxx: Real (0.0)
bbox.miny: Real (0.0)
bbox.maxy: Real (0.0)
ERROR 1: ReadNext() failed: Malformed levels. min: 2 max: 2 out of range. Max Level: 1
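For context on the error text: in the Parquet format, every leaf column has a maximum definition level and a maximum repetition level that follow from its schema path, and the level values stored in the data pages must not exceed them. My reading of "min: 2 max: 2 out of range. Max Level: 1" is that the reader decoded a level value of 2 for a column whose schema only allows levels up to 1, though I have not confirmed which column is affected. The sketch below is not gpq or GDAL code; it is a standard-library illustration of how the maxima are derived, and the schema paths in it are hypothetical (the nested one assumes the usual three-level list layout).

```python
from typing import List, Tuple

# Each path element is (field_name, repetition), where repetition is
# "required", "optional" or "repeated", as in a Parquet schema.
Path = List[Tuple[str, str]]


def max_levels(path: Path) -> Tuple[int, int]:
    """Return (max_definition_level, max_repetition_level) for a leaf column."""
    # Every non-required field on the path adds one definition level.
    max_def = sum(1 for _, rep in path if rep != "required")
    # Every repeated field on the path adds one repetition level.
    max_rep = sum(1 for _, rep in path if rep == "repeated")
    return max_def, max_rep


def levels_in_range(levels: List[int], max_level: int) -> bool:
    """Check that decoded level values fit the maximum the schema allows."""
    return all(0 <= level <= max_level for level in levels)


# Hypothetical schema paths, for illustration only
flat_optional = [("id", "optional")]
nested_list = [
    ("names", "optional"),
    ("common", "optional"),
    ("list", "repeated"),
    ("element", "optional"),
    ("value", "optional"),
]

print(max_levels(flat_optional))   # (1, 0)
print(max_levels(nested_list))     # (5, 1)
print(levels_in_range([2, 2], 1))  # False: the situation the error describes
```

If this reading is right, the levels written for at least one column do not match the schema recorded for it in the converted file.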
Testing with the GDAL validation script, validate_geoparquet.py:
apt-get install python3-pip --fix-missing
python3 -m pip install jsonschema
python3 validate_geoparquet.py --check-data test.geo.parquet
Warning 1: Field brand.names.common of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.official of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.alternate of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field brand.names.short of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field addresses of unhandled type list<element: struct<freeform: string, locality: string, postCode: string, region: string, country: string>> ignored
Warning 1: Field names.common of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.official of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.alternate of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field names.short of unhandled type list<element: struct<value: string, language: string>> ignored
Warning 1: Field sources of unhandled type list<element: struct<property: string, dataset: string, recordId: string, confidence: double>> ignored
Segmentation fault