Support control over number of row groups as an option #65

@cholmes

When converting to GeoParquet it can be useful to set more row groups, for more efficient querying on large files. See opengeospatial/geoparquet#183

GDAL's equivalent option is documented as 'ROW_GROUP_SIZE=<integer>: Defaults to 65536. Maximum number of rows per group.'

That default seems reasonable, though I was using about 20k rows per group in my experiments, so we could consider a smaller default. I didn't see negative effects, but something I read said that if you have lots of Parquet files, a smaller row group size can slow down gathering stats across the whole set. I have around 500 individual Parquet files, so perhaps the effect only shows up with thousands or tens of thousands of files?
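To put rough numbers on that tradeoff, here is a back-of-the-envelope sketch. The 1,000,000 rows per file is a made-up assumption (I haven't measured my files), and the function names are just for illustration. It counts only how many row groups a reader would have to collect stats for across the whole set, not actual timings.

```python
import math


def row_groups_per_file(rows, row_group_size):
    """Number of row groups needed when each holds at most row_group_size rows."""
    return math.ceil(rows / row_group_size)


def total_row_groups(file_count, rows_per_file, row_group_size):
    """Row groups across a set of equally sized files (a simplifying assumption)."""
    return file_count * row_groups_per_file(rows_per_file, row_group_size)


FILES = 500                # roughly the number of files mentioned above
ROWS_PER_FILE = 1_000_000  # hypothetical, for illustration only

gdal_default = total_row_groups(FILES, ROWS_PER_FILE, 65536)
smaller = total_row_groups(FILES, ROWS_PER_FILE, 20_000)

print(gdal_default)  # 8000
print(smaller)       # 25000
```

Under those assumptions, dropping from 65536 to 20k roughly triples the number of row group entries to scan, and the count scales linearly with the number of files.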
