
Suggest running validate if metadata parsing fails in describe #90

Merged
tschaub merged 2 commits into main from more-describe-hints
Oct 4, 2023


@tschaub tschaub commented Oct 4, 2023

This adds output to the describe command that suggests running validate if the geo metadata is invalid.

Example output:

# gpq describe invalid.geoparquet
╭──────────┬────────┬────────────┬────────────┬─────────────╮
│ COLUMN   │ TYPE   │ ANNOTATION │ REPETITION │ COMPRESSION │
├──────────┼────────┼────────────┼────────────┼─────────────┤
│ geoid    │ binary │ string     │ 0..1       │ snappy      │
│ geometry │ binary │            │ 0..1       │ snappy      │
├──────────┼────────┴────────────┴────────────┴─────────────┤
│ Rows     │ 3233                                           │
╰──────────┴────────────────────────────────────────────────╯
 ⚠️  Not a valid GeoParquet file (invalid "geo" metadata). Run describe with the --metadata-only flag 
to see the "geo" metadata value. Run validate for more detail on validation issues.
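
For readers unfamiliar with how the "geo" value can be present but invalid, here is a minimal Go sketch. It is not the code from this change: the `geoMetadata` struct and `parseGeo` function are hypothetical, and the struct only models the `version`, `primary_column`, and `columns` fields of the GeoParquet metadata. The point is that a parse failure is something describe can detect cheaply, while validate is the place for detailed diagnostics.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// geoMetadata is a simplified, hypothetical model of the "geo" metadata value.
// The real metadata has more fields; only a few are modeled here.
type geoMetadata struct {
	Version       string                     `json:"version"`
	PrimaryColumn string                     `json:"primary_column"`
	Columns       map[string]json.RawMessage `json:"columns"`
}

// parseGeo decodes the raw "geo" value and reports an error if the value
// is not JSON of the expected shape or lacks a primary column.
func parseGeo(value string) (*geoMetadata, error) {
	meta := &geoMetadata{}
	if err := json.Unmarshal([]byte(value), meta); err != nil {
		return nil, fmt.Errorf("failed to parse geo metadata: %w", err)
	}
	if meta.PrimaryColumn == "" {
		return nil, errors.New("geo metadata is missing primary_column")
	}
	return meta, nil
}

func main() {
	// "columns" must be an object keyed by column name, so an array fails to parse.
	if _, err := parseGeo(`{"version": "1.0.0", "primary_column": "geometry", "columns": []}`); err != nil {
		fmt.Println("invalid:", err)
	}
}
```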

In addition, this change adds a hint to the describe output when the file is missing the geo metadata altogether. In that case the hint suggests running convert, as shown in the output below.

Example output:

# gpq describe not-geo.parquet
╭───────────────────┬────────┬────────────┬────────────┬──────────────╮
│ COLUMN            │ TYPE   │ ANNOTATION │ REPETITION │ COMPRESSION  │
├───────────────────┼────────┼────────────┼────────────┼──────────────┤
│ registration_dttm │ int96  │            │ 0..1       │ uncompressed │
│ id                │ int32  │            │ 0..1       │ uncompressed │
│ first_name        │ binary │ string     │ 0..1       │ uncompressed │
│ last_name         │ binary │ string     │ 0..1       │ uncompressed │
│ email             │ binary │ string     │ 0..1       │ uncompressed │
│ gender            │ binary │ string     │ 0..1       │ uncompressed │
│ ip_address        │ binary │ string     │ 0..1       │ uncompressed │
│ cc                │ binary │ string     │ 0..1       │ uncompressed │
│ country           │ binary │ string     │ 0..1       │ uncompressed │
│ birthdate         │ binary │ string     │ 0..1       │ uncompressed │
│ salary            │ double │            │ 0..1       │ uncompressed │
│ title             │ binary │ string     │ 0..1       │ uncompressed │
│ comments          │ binary │ string     │ 0..1       │ uncompressed │
├───────────────────┼────────┴────────────┴────────────┴──────────────┤
│ Rows              │ 1000                                            │
╰───────────────────┴─────────────────────────────────────────────────╯
 ⚠️  Not a valid GeoParquet file (missing the "geo" metadata key). Run convert to try to convert it 
to GeoParquet.
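
The choice between the two hints can be sketched roughly as follows. This is only an illustration, assuming the file's key-value metadata is available as a plain map; `describeHint` and the two constants are hypothetical names, and `json.Valid` stands in for whatever parsing the command really does. The message text is copied from the example output in this description.

```go
package main

import (
	"encoding/json"
	"fmt"
)

const (
	// Hint text taken from the example output in this description.
	missingGeoHint = `Not a valid GeoParquet file (missing the "geo" metadata key). Run convert to try to convert it to GeoParquet.`
	invalidGeoHint = `Not a valid GeoParquet file (invalid "geo" metadata). Run describe with the --metadata-only flag to see the "geo" metadata value. Run validate for more detail on validation issues.`
)

// describeHint returns the warning to print below the table, or an empty
// string when no hint is needed. kv is a hypothetical stand-in for the
// Parquet file's key-value metadata.
func describeHint(kv map[string]string) string {
	value, ok := kv["geo"]
	if !ok {
		// No "geo" key at all: the file is plain Parquet.
		return missingGeoHint
	}
	if !json.Valid([]byte(value)) {
		// Placeholder check; real parsing would be stricter than JSON validity.
		return invalidGeoHint
	}
	return ""
}

func main() {
	fmt.Println(describeHint(map[string]string{}))
	fmt.Println(describeHint(map[string]string{"geo": "{not json"}))
}
```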

Fixes #87.

@tschaub tschaub force-pushed the more-describe-hints branch from 3f545b0 to 0998be2 October 4, 2023 19:10
@tschaub tschaub merged commit a387850 into main Oct 4, 2023
@tschaub tschaub deleted the more-describe-hints branch October 4, 2023 19:12

Linked issue: Better warnings / info in describe on non-compliant GeoParquet