geoparquet-io

Fast I/O and transformation tools for GeoParquet files using PyArrow and DuckDB.

Features

  • Fast: Built on PyArrow and DuckDB for high-performance operations
  • Comprehensive: Sort, partition, enhance, and validate GeoParquet files
  • Spatial Indexing: Add bbox, H3 hexagonal cells, KD-tree partitions, and admin divisions
  • Best Practices: Automatic optimization following the GeoParquet 1.1 spec
  • Flexible: CLI and Python API for any workflow
  • Tested: Extensive test suite across Python 3.9-3.13 and all platforms

Quick Example

# Install
pip install geoparquet-io

# Inspect file structure and metadata
gpio inspect myfile.parquet

# Check file quality and best practices
gpio check all myfile.parquet

# Add bounding box column for faster queries
gpio add bbox input.parquet output.parquet

# Sort using Hilbert curve for spatial locality
gpio sort hilbert input.parquet output_sorted.parquet

# Partition into separate files by country
gpio partition admin buildings.parquet output_dir/
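
To make the idea behind the Hilbert sort step more concrete, here is a minimal pure-Python sketch of how a Hilbert curve index can order points so that rows close together in the file also tend to be close together in space. It is an illustration only and does not claim to be how gpio implements sorting; the function names `hilbert_index` and `sort_by_hilbert`, the 16-bit grid resolution, and the sample coordinates are all assumptions made for this example.

```python
def hilbert_index(order, x, y):
    """Distance along the Hilbert curve of cell (x, y) on a 2**order x 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the curve stays continuous.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def sort_by_hilbert(points, bounds, order=16):
    """Sort (x, y) points by Hilbert index.

    bounds is (xmin, ymin, xmax, ymax) and is assumed to be non-degenerate
    (xmax > xmin and ymax > ymin) and to contain every point.
    """
    xmin, ymin, xmax, ymax = bounds
    cells = (1 << order) - 1

    def key(pt):
        # Quantize floating-point coordinates onto the integer grid.
        gx = int((pt[0] - xmin) / (xmax - xmin) * cells)
        gy = int((pt[1] - ymin) / (ymax - ymin) * cells)
        return hilbert_index(order, gx, gy)

    return sorted(points, key=key)


# Hypothetical sample data: (lon, lat) pairs in arbitrary input order.
points = [(151.2, -33.9), (2.35, 48.85), (-0.13, 51.5), (150.9, -34.4)]
for pt in sort_by_hilbert(points, bounds=(-180.0, -90.0, 180.0, 90.0)):
    print(pt)
```

Sorting by a single integer key like this is what lets a columnar writer group nearby features into the same row groups, which is the "spatial locality" the sort command aims for.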

Why geoparquet-io?

GeoParquet is a cloud-native format that stores geospatial vector data in Apache Parquet, combining Parquet's columnar efficiency with standardized geometry metadata. This toolkit helps you:

  • Optimize file layout for cloud-native access patterns
  • Add spatial indices for faster queries and analysis
  • Validate compliance with GeoParquet best practices
  • Transform large datasets efficiently using columnar operations
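
As a rough illustration of why a bounding-box column makes queries faster, the sketch below computes a bbox per feature and uses it to discard features that cannot intersect a query window before any geometry is examined. Plain Python data stands in for Parquet columns here; the names `bbox_of` and `intersects` and the sample features are hypothetical and are not part of geoparquet-io's API.

```python
def bbox_of(coords):
    """Return (xmin, ymin, xmax, ymax) for a sequence of (x, y) vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return (min(xs), min(ys), max(xs), max(ys))


def intersects(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


# Hypothetical features, each given as a list of vertices.
features = {
    "park": [(0.0, 0.0), (2.0, 0.0), (2.0, 1.0), (0.0, 1.0)],
    "lake": [(10.0, 10.0), (12.0, 10.0), (11.0, 13.0)],
}

# Precompute the bbox once, as a bbox column would store it.
bboxes = {name: bbox_of(coords) for name, coords in features.items()}

# A query window only needs four number comparisons per feature.
query = (1.0, 0.5, 5.0, 5.0)
candidates = [name for name, box in bboxes.items() if intersects(box, query)]
print(candidates)  # ['park']
```

Because the comparison touches only four numbers per feature, a reader can apply the same test to column statistics and skip whole chunks of a file without decoding any geometry.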

Getting Started

New to geoparquet-io? Start here:

Command Reference

  • inspect - Examine file metadata and preview data
  • check - Validate files against best practices
  • sort - Spatially sort using Hilbert curves
  • add - Enhance files with spatial indices
  • partition - Split files into optimized partitions
  • format - Apply formatting best practices

Support

License

Apache 2.0 - See LICENSE for details.